Compare commits

...

4132 Commits

SHA1 Message Date
b62ca59527 Release: v3.0.0 2020-06-29 10:40:13 -04:00
a316a6aaa8 [seq2seq docs] Move evaluation down, fix typo (#5365) 2020-06-29 10:36:04 -04:00
4bcc35cd69 [Docs] Benchmark docs (#5360)
* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-29 16:08:57 +02:00
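As a companion to the new benchmark docs, here is a minimal sketch of driving the benchmark utilities from Python; the model name, batch size, and sequence length are illustrative.

```python
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

# measure speed/memory for one model at one batch size and sequence length
args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[128]
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()
```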
482c9178d3 Pin mecab for now (#5362) 2020-06-29 09:51:13 -04:00
2513fe0d02 added subtitle for recent contributors in readme (#5130) 2020-06-29 09:05:08 -04:00
30245c0c60 Fix table format for test results (#5357) 2020-06-29 09:02:33 -04:00
c34010551a Create model card (#5356) 2020-06-29 09:01:55 -04:00
01aa0b8527 Create README.md (#5353) 2020-06-29 08:58:30 -04:00
96907367f1 arxiv-ai-gpt2 model card (#5337)
* Add model card and generation script for model arxiv_ai_gpt2

* Update arxiv-ai-gpt2 model card

Remove unnecessary lines

* Delete code in model cards
2020-06-29 08:53:20 -04:00
3cdf8b7ec2 Create model card for asafaya/bert-mini-arabic (#5352)
* Create README.md

* Update model_cards/asafaya/bert-mini-arabic/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-29 08:41:41 -04:00
9db1f41604 Create README.md (#5351) 2020-06-29 08:36:00 -04:00
c950fef545 [docs] Small tweaks to #5323 2020-06-29 14:24:33 +02:00
4544f906e2 model cards for roberta and bert-multilingual (#5324)
* More model cards (cc @myleott)

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-29 05:06:05 -04:00
92671532e7 More model cards 2020-06-29 10:58:54 +02:00
9209d36f93 Added a model card README.md for my pretrained model. (#5325)
* Create README.md

* Removed unnecessary link from README.md

* Update README.md
2020-06-29 16:29:14 +08:00
7cb52f53ef Fix LR decay in TF Trainer (#5269)
* Recover old PR

* Apply style

* Trigger CI
2020-06-29 14:38:32 +08:00
321c05abab Model cards for finance-koelectra models (#5313)
* Add finance-koelectra readme card

* Add finance-koelectra readme card

* Add finance-koelectra readme card

* Add finance-koelectra readme card
2020-06-29 13:47:44 +08:00
28a690a80e [mBART] skip broken forward pass test, stronger integration test (#5327) 2020-06-28 15:08:28 -04:00
45e26125de save_pretrained: mkdir(exist_ok=True) (#5258)
* all save_pretrained methods mkdir if not os.path.exists
2020-06-28 14:53:47 -04:00
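A minimal sketch of what this change enables; the checkpoint path is illustrative.

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
# save_pretrained now creates the target directory (exist_ok=True)
# instead of raising when the path does not exist yet
model.save_pretrained("./checkpoints/run-1")
```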
12dfbd4f7a [examples] fix example links (#5344) 2020-06-28 12:54:54 -04:00
98109464c1 clean reformer reverse sort (#5343) 2020-06-28 14:32:25 +02:00
1af58c0706 New model sharing tutorial (#5323) 2020-06-27 11:10:02 -04:00
efae6645e2 Fix xxx_length behavior when using XLNet in pipeline (#5319) 2020-06-27 11:09:51 -04:00
393b8dc09a examples/seq2seq/run_eval.py fixes and docs (#5322) 2020-06-26 19:20:43 -04:00
5543b30aa6 [pl_examples] default warmup steps=0 (#5316) 2020-06-26 15:03:41 -04:00
bf0d12c220 CircleCI stores cleaner output at test_outputs.txt (#5291) 2020-06-26 13:59:31 -04:00
601d4d699c [tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308)
* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples
2020-06-26 19:48:14 +02:00
fd405e9a93 Add BART-base modeling and configuration (#5315) 2020-06-27 00:53:10 +08:00
798dbff6a7 [pipelines] Change summarization default to distilbart-cnn-12-6 (#5289) 2020-06-26 11:43:23 -04:00
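With this change, instantiating the pipeline without an explicit model should pull a DistilBART checkpoint; a short sketch (the input text is illustrative):

```python
from transformers import pipeline

summarizer = pipeline("summarization")  # now defaults to distilbart-cnn-12-6
text = "The tower is 324 metres tall, about the same height as an 81-storey building."
print(summarizer(text, max_length=40, min_length=10))
```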
834b6884c5 Add benchmark notebook (#5312)
* add notebook

* Created with Colaboratory

* move notebook to correct folder

* correct link

* correct filename

* correct filename

* better name
2020-06-26 17:38:13 +02:00
08c9607c3d [Generation] fix docs for decoder_input_ids (#5306)
* fix docs

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_utils.py
2020-06-26 16:58:11 +02:00
79a82cc06a [Benchmarks] improve Example Plotter (#5245)
* improve plotting

* better labels

* fix time plot
2020-06-26 15:00:14 +02:00
88d7f96e33 Gpt2 model card (#5283)
* Bert base model card

* Add metadata

* Adapt examples

* GPT2 model card

* Remove the BERT model card

* Change language code
2020-06-26 08:08:31 -04:00
fc5bce9e60 Bert base model card (#5276)
* Bert base model card

* Add metadata

* Adapt examples

* Comment on text generation

* Update model_cards/bert-base-uncased-README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-26 08:01:19 -04:00
135791e8ef Add pad_to_multiple_of on tokenizers (reimport) (#5054)
* Add new parameter `pad_to_multiple_of` on tokenizers.

* unittest for pad_to_multiple_of

* Add .name when logging enum.

* Fix missing .items() on dict in tests.

* Add special check + warning if the tokenizer doesn't have proper pad_token.

* Use the correct logger format specifier.

* Ensure tokenizer with no pad_token do not modify the underlying padding strategy.

* Skip test if tokenizer doesn't have pad_token

* Fix RobertaTokenizer on empty input

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix and updating to simpler API

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-06-26 11:55:57 +02:00
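A sketch of the new `pad_to_multiple_of` parameter, which rounds the padded length up to a multiple of a given value (useful for fp16 Tensor Cores); model and inputs are illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["short", "a somewhat longer example sentence"],
    padding=True,
    pad_to_multiple_of=8,  # padded length becomes a multiple of 8
    return_tensors="pt",
)
assert batch["input_ids"].shape[-1] % 8 == 0
```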
7cc15bdd96 Closes #5218 2020-06-25 18:19:21 -04:00
2ffef0d0c7 Training & fine-tuning quickstart (#5034)
* add initial fine-tuning guide

* split code blocks to smaller segments

* fix up trainer section of fine-tune doc

* a few last typos

* Update usage -> task summary link

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 15:11:11 -06:00
364a5ae1f0 Refactor Code samples; Test code samples (#5036)
* Refactor code samples

* Test docstrings

* Style

* Tokenization examples

* Run rest of tests

* First step to testing source docs

* Style and BART comment

* Test the remainder of the code samples

* Style

* let to const

* Formatting fixes

* Ready for merge

* Fix fixture + Style

* Fix last tests

* Update docs/source/quicktour.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Addressing @sgugger's comments + Fix MobileBERT in TF

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 16:46:00 -04:00
315f464b0a [tokenizers] Several small improvements and bug fixes (#5287)
* avoid recursion in id checks for fast tokenizers

* better typings and fix #5232

* align slow and fast tokenizers behaviors for Roberta and GPT2

* style and quality

* fix tests - improve typings
2020-06-25 22:17:14 +02:00
24f46ea3f3 Remove links for all docs (#5280) 2020-06-25 11:45:05 -04:00
27cf1d97f0 [Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252)
* fix-5181

Padding to the max sequence length while truncating to another length was broken on slow tokenizers

* clean up and fix #5155

* fix XLM test

* Fix tests for Transfo-XL

* logging only above WARNING in tests

* switch slow tokenizers tests in @slow

* fix Marian truncation tokenization test

* style and quality

* make the test a lot faster by limiting the sequence length used in tests
2020-06-25 17:24:28 +02:00
e008d520bb [examples/seq2seq] more README improvements (#5274) 2020-06-25 10:13:01 -04:00
6a495cae00 [model_cards] Example of how to specify inputs for the widget 2020-06-25 15:58:25 +02:00
0e1fce3c01 Fix convert_graph_to_onnx (#5230) 2020-06-25 08:17:02 +02:00
5543efd5cc Create README.md (#5259) 2020-06-25 01:56:07 -04:00
40457bcebb examples/seq2seq supports translation (#5202) 2020-06-24 23:58:11 -04:00
d12ceb48ba Tokenization tutorial (#5257)
* All done

* Link to the tutorial

* Typo fixes

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Add mention of the return_xxx args

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-06-24 18:43:20 -04:00
7ac9110711 Add more tests on tokenizers serialization - fix bugs (#5056)
* update tests for fast tokenizers + fix small bug in saving/loading

* better tests on serialization

* fixing serialization

* comment cleanup
2020-06-24 21:53:08 +02:00
0148c262e7 Fix first test (#5255) 2020-06-24 15:16:04 -04:00
70c1e1d2d5 Use master _static (#5253)
* Use _static from master everywhere

* Copy to existing too
2020-06-24 15:06:14 -04:00
4965aee064 [HANS] Fix label_list for RoBERTa/BART (class flipping) (#5196)
* fix weirdness in roberta/bart for mnli trained checkpoints

* black compliance

* isort code check
2020-06-24 14:38:15 -04:00
fc24a93e64 [HfApi] Add support for pipeline_tag 2020-06-24 16:54:00 +00:00
0a3d0e02c5 Replace labels with -100 to skip loss calc (#4718) 2020-06-24 12:14:50 -04:00
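The -100 convention works because PyTorch's CrossEntropyLoss ignores that index by default; a standalone illustration:

```python
import torch
from torch.nn import CrossEntropyLoss

loss_fct = CrossEntropyLoss()               # ignore_index defaults to -100
logits = torch.randn(4, 10)                 # 4 positions, 10-class output
labels = torch.tensor([2, -100, 5, -100])   # -100 positions add no loss
loss = loss_fct(logits, labels)
```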
6894b486d0 Fix version controller links (for realsies) (#5251) 2020-06-24 12:13:43 -04:00
1121ce9f98 Model cards for Hate-speech-CNERG models (#5236)
* Add dehatebert-mono-arabic readme card

* Update dehatebert-mono-arabic model card

* model cards for Hate-speech-CNERG models
2020-06-24 11:41:08 -04:00
cf10d4cfdd Cleaning TensorFlow models (#5229)
* Cleaning TensorFlow models

Update all classes


style

* Don't average loss
2020-06-24 11:37:20 -04:00
609e0c583f Fix links (#5248) 2020-06-24 11:35:55 -04:00
c9163a8d5a delay decay schedule until the end of warmup (#4940) 2020-06-24 11:18:29 -04:00
f216b60671 Fix deploy doc (#5246)
* Try with the same command

* Try like this
2020-06-24 10:59:06 -04:00
49f6e7a3c6 Add some prints to debug (#5244) 2020-06-24 10:37:01 -04:00
c2a26ec8a6 [Use cache] Align logic of use_cache with output_attentions and output_hidden_states (#5194)
* fix use cache

* add bart use cache

* fix bart

* finish bart
2020-06-24 16:09:17 +02:00
64c393ee74 Don't recreate old docs (#5243) 2020-06-24 09:59:07 -04:00
b29683736a fix print in benchmark (#5242) 2020-06-24 15:58:49 +02:00
9fe09cec76 [Benchmark] Extend Benchmark to all model type extensions (#5241)
* add benchmark for all kinds of models

* improved import

* delete bogus files

* make style
2020-06-24 15:11:42 +02:00
7c41057d50 Add hugs (#5225) 2020-06-24 07:56:14 -04:00
5e85b324ec Use the script in utils (#5224) 2020-06-24 07:55:58 -04:00
5e31a98ab7 Create README.md (#5108)
* Create README.md

* Update model_cards/a-ware/roberta-large-squad-classification/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-24 04:45:51 -04:00
033124e5f8 Update README.md (#5199)
Fix/add information in README.md
2020-06-24 04:42:46 -04:00
7ca6627ec3 Create README.md (#5217)
electra_large_discriminator_squad2_512 Question Answering LM
2020-06-24 04:40:50 -04:00
54e9ce785d Fix PABEE division by zero error (#5233)
* Fix PABEE division by zero error

* patience=0 by default
2020-06-24 16:10:36 +08:00
9022ef021a Only put tensors on a device (#5223)
* Only put tensors on a device

* Type hint and unpack list comprehension
2020-06-23 17:30:17 -04:00
173528e368 Add version control menu (#5222)
* Add version control menu

* Constify things

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-23 17:05:12 -04:00
76e5af4cfd [pl_examples] revert deletion of optimizer_step (#5227) 2020-06-23 16:40:45 -04:00
c01480bba3 [file_utils] Type user-agent 2020-06-23 18:31:13 +02:00
58918c76f4 [bart] add config.extra_pos_embeddings to facilitate reuse (#5190) 2020-06-23 11:35:42 -04:00
b28b537131 More clear error message in the use-case of #5169 (#5184) 2020-06-23 13:37:29 +02:00
11fdde0271 Tokenizers API developments (#5103)
* Add return lengths

* make pad a bit more flexible so it can be used as collate_fn

* check all kwargs sent to encoding method are known

* fixing kwargs in encodings

* New AddedToken class in python

This class lets you specify specific tokenization behaviors for some special tokens. Used in particular for GPT2 and Roberta, to control how whitespace is stripped around special tokens.

* style and quality

* switched to huggingface tokenizers library for AddedTokens

* up to tokenizer 0.8.0-rc3 - update API to use AddedToken state

* style and quality

* do not raise an error on additional or unused kwargs for tokenize() but only a warning

* transfo-xl pretrained model requires torch

* Update src/transformers/tokenization_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-23 13:36:57 +02:00
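A sketch of the AddedToken class mentioned above, assuming the tokenizers-library import path; the token string is illustrative.

```python
from tokenizers import AddedToken
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# lstrip=True controls how whitespace to the left of the special token
# is handled during tokenization
tokenizer.add_tokens([AddedToken("<special>", lstrip=True)])
```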
1ae132a07d [Reformer] Axial Pos Emb Improve mem usage reformer (#5209)
* improve mem handling

* improve mem for pos ax encodings
2020-06-23 10:49:18 +02:00
5144104070 [fix] remove unused import (#5206) 2020-06-22 23:39:04 -04:00
0d158e38c9 [fix] mobilebert had wrong path, causing slow test failure (#5205) 2020-06-22 23:31:36 -04:00
f5c2a122e3 Upgrade examples to pl=0.8.1(#5146) 2020-06-22 20:40:10 -04:00
06b60c8b05 [Modelcard] bart-squadv2 (#5011)
* [Modelcard] bart-squadv2

* Update README.md

* Update README.md
2020-06-22 18:40:19 -04:00
35e0687256 Create README.md (#5013) 2020-06-22 18:40:00 -04:00
22d2c8ea2f Create README.md for finetuned BERT model (#5009)
* Create README.md

* changes in model usage section

* minor changes in output visualization

* minor errata in readme
2020-06-22 18:39:29 -04:00
2589505693 Add model card for StackOBERTflow-comments-small (#5008)
* Create README.md

* Update README.md
2020-06-22 18:39:22 -04:00
d8c26ed139 Specify dataset used for crossvalidation (#5175) 2020-06-22 18:26:12 -04:00
a34fb91d54 Create README.md (#5149) 2020-06-22 18:00:53 -04:00
ffabcf5249 Create README.md (#5160) 2020-06-22 17:59:54 -04:00
3363a19b12 Create README.md (#5152)
* Create README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 17:59:33 -04:00
0cca61925c Add link to new community notebook (optimization) (#5195)
* Add link to new community notebook (optimization)

related to https://github.com/huggingface/transformers/issues/4842#event-3469184635

This notebook is about benchmarking model training with/without dynamic padding optimization. 
https://github.com/ELS-RD/transformers-notebook 

Using dynamic padding on MNLI provides a **4.7x training time reduction**, with max pad length set to 512. The effect is strong because few examples in this dataset are longer than 400 tokens. In practice it will depend on the dataset, but it always brings an improvement and, after more than 20 experiments listed in this [article](https://towardsdatascience.com/divide-hugging-face-transformers-training-time-by-2-or-more-21bf7129db9q-21bf7129db9e?source=friends_link&sk=10a45a0ace94b3255643d81b6475f409), it does not seem to hurt performance.

Following advice from @patrickvonplaten I do the PR myself :-)

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-22 23:47:33 +02:00
1c5cd8e5f5 Add README.md (nyu-mll) (#5174)
* nyu-mll: roberta on smaller datasets

* Update README.md

* Update README.md

Co-authored-by: Alex Warstadt <alexwarstadt@gmail.com>
2020-06-22 17:24:27 -04:00
c439752482 Switch master/stable doc and add older releases (#5193) 2020-06-22 16:38:53 -04:00
417e492f1e Quick tour (#5145)
* Quicktour part 1

* Update

* All done

* Typos

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address comments in quick tour

* Update docs/source/quicktour.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update from feedback

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 16:08:09 -04:00
75e1eed8d1 Cleaner warning when loading pretrained models (#4557)
* Cleaner warning when loading pretrained models

This makes the logging messages more explicit when using the various `from_pretrained` methods. It also emits these messages as `logging.warning` because they flag a common source of silent mistakes.

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* style and quality

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 21:58:47 +02:00
4e741efa92 Have documentation fail on warning (#5189)
* Have documentation fail on warning

* Force ci failure

* Revert "Force ci failure"

This reverts commit f0a4666ec2eb4cd00a4da48af3357defc63324a0.
2020-06-22 15:49:50 -04:00
1262495a91 Add TF auto model to the docs + fix sphinx warnings (#5187) 2020-06-22 14:43:52 -04:00
88429c57bc Create README.md (#5165) 2020-06-22 13:49:14 -04:00
76ee9c8bc9 Create README.md (#5107)
* Create README.md

@julien-c check that the dataset meta tag is right

* Fix typo

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 13:47:30 -04:00
bf493d5569 Model card for t5-base-finetuned-emotion (recognition) (#5179) 2020-06-22 13:45:45 -04:00
e9ef21175e improve doc (#5185) 2020-06-22 19:00:11 +02:00
ebc36108dc [tokenizers] Fix #5081 and improve backward compatibility (#5125)
* fix #5081 and improve backward compatibility (slightly)

* add nlp to setup.cfg - style and quality

* align default to previous default

* remove test that doesn't generalize
2020-06-22 17:25:43 +02:00
d2a7c86dc3 Check if text is set to avoid IndexError (#4209)
Fix for https://github.com/huggingface/transformers/issues/3809
2020-06-22 11:09:05 -04:00
90f4b24520 Add support for gradient checkpointing in BERT (#4659)
* add support for gradient checkpointing in BERT

* fix unit tests

* isort

* black

* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool

* Revert "workaround for `torch.utils.checkpoint.checkpoint` not accepting bool"

This reverts commit 5eb68bb804f5ffbfc7ba13c45a47717f72d04574.

* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 10:47:14 -04:00
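A sketch of enabling the new flag via the config, assuming it is exposed as a BertConfig attribute as the PR suggests:

```python
from transformers import BertConfig, BertForMaskedLM

# gradient checkpointing recomputes each layer's activations in the backward
# pass instead of storing them, trading compute for memory
config = BertConfig.from_pretrained("bert-base-uncased", gradient_checkpointing=True)
model = BertForMaskedLM.from_pretrained("bert-base-uncased", config=config)
```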
f4e1f02210 Output hidden states (#4978)
* Configure all models to use output_hidden_states as an argument passed to forward()

* Pass all tests

* Remove cast_bool_to_primitive in TF Flaubert model

* correct tf xlnet

* add pytorch test

* add tf test

* Fix broken tests

* Refactor output_hidden_states for mobilebert

* Reset and remerge to master

Co-authored-by: Joseph Liu <joseph.liu@coinflex.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-06-22 10:10:45 -04:00
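Together with the matching change for output_attentions (#4538, further down), this makes both output flags per-call forward() arguments rather than config-only settings; a sketch (the tuple positions assume both flags are on):

```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True, output_attentions=True)
hidden_states, attentions = outputs[-2], outputs[-1]  # per-layer tuples
```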
866a8ccabb Add model cards for Microsoft's MiniLM (#5178)
* Add model cards for Microsoft's MiniLM

* XLMRobertaTokenizer

* format

* Add thumbnail

* finishing up
2020-06-22 21:48:14 +08:00
b99ad457f4 Added feature to move added tokens in vocabulary for Transformer-XL (#4953)
* Fixed resize_token_embeddings for transfo_xl model

* Fixed resize_token_embeddings for transfo_xl.

Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.

* Updated docstring

* Fixed resizing cutoffs; added check for new size of embedding layer.

* Added test for resize_token_embeddings

* Fixed code quality

* Fixed unchanged cutoffs in model.config

* Added feature to move added tokens in tokenizer.

* Fixed code quality

* Added feature to move added tokens in tokenizer.

* Fixed code quality

* Fixed docstring, renamed sym to token.

Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
2020-06-22 15:40:52 +02:00
eb0ca71ef6 Update glossary (#5148)
* Update glossary

* Update docs/source/glossary.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-22 08:30:49 -04:00
fa0be6d761 Benchmarks (#4912)
* finish benchmark

* fix isort

* fix setup cfg

* retab

* fix time measuring of tf graph mode

* fix tf cuda

* clean code

* better error message
2020-06-22 12:06:56 +02:00
18a0150bfa fix bart doc (#5132)
fix bart doc
2020-06-22 10:58:28 +02:00
3fe75c7f70 Fixing docs for Encoder Decoder Config (#5171) 2020-06-22 10:51:17 +02:00
59345cc87f Typo (#5147) 2020-06-22 10:49:23 +02:00
bc3a0c0607 [examples] fixes arguments for summarization finetune scripts (#5157)
Authored-by: i.boytsov <i.boytsov@MAC867.local>
2020-06-21 11:51:21 -04:00
68e19f1c22 Fix typo in root README (#5073) 2020-06-20 23:00:04 +08:00
c0c577cf8f Fix PABEE's result table (#5158) 2020-06-20 22:56:39 +08:00
aa6a29bc25 SummarizationPipeline: init required task name (#5086)
* SummarizationPipeline: init required task name

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-06-20 03:16:30 -04:00
2fd28d4363 Add BERT Loses Patience (Patience-based Early Exit) (#5078)
* Add BERT Loses Patience (Patience-based Early Exit)

* update model archive

* update format

* sort import

* flake8

* Add results

* full results

* align the table

* refactor to inherit

* default per gpu eval = 1

* Formatting

* Formatting

* isort

* modify readme

* Add check

* Fix format

* Fix format

* Doc strings

* ALBERT & BERT for sequence classification don't inherit from the original anymore

* Remove incorrect comments

* Remove incorrect comments

* Remove incorrect comments

* Sync up with new code

* Sync up with new code

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Finishing up!
2020-06-20 13:41:46 +08:00
f1679d7c48 Fix dropout in TFMobileBert (#5150) 2020-06-20 13:21:19 +08:00
5ed94b2312 Update note to avoid confusion (#5131) 2020-06-20 10:13:34 +08:00
d97b4176e5 Correct device assignment 2020-06-19 21:58:28 -04:00
9a3f91088c Add MobileBert (#4901)
* Add MobileBert

* Quality + Conversion script

* style

* Update src/transformers/modeling_mobilebert.py

* Links to S3

* Style

* TFMobileBert

Slight fixes to the pytorch MobileBert
Style

* MobileBertForMaskedLM (PT + TF)

* MobileBertForNextSentencePrediction (PT + TF)

* MobileBertFor{MultipleChoice, TokenClassification} (PT + TF)

* Tests + Auto

* Doc

* Tests

* Addressing @sgugger's comments

* Adressing @patrickvonplaten's comments

* Style

* Style

* Integration test

* style

* Model card

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-19 16:38:36 -04:00
f45e873910 [bart-mnli] Fix class flipping bug (#5141) 2020-06-19 13:33:24 -04:00
e33929ef1e Fix in Reformer Config documentation (#5138) 2020-06-19 15:41:31 +02:00
84be482f66 AutoTokenizer supports mbart-large-en-ro (#5121) 2020-06-18 20:47:37 -04:00
2db1e2f415 [cleanup] remove redundant code in SummarizationDataset (#5119) 2020-06-18 20:34:48 -04:00
5f721ad6e4 Fix #5114 (#5122) 2020-06-18 19:20:04 -04:00
a258982af3 Add missing arg in 02-transformers notebook (#5085)
* Add missing arg when creating model

* Fix typos

* Remove from_tf flag when creating model
2020-06-18 19:04:04 -04:00
32e94cff64 tf add resize_token_embeddings method (#4351)
* resize token embeddings

* add tokens

* add tokens

* add tokens

* add t5 token method

* add t5 token method

* add t5 token method

* typo

* debugging input

* debugging input

* debug

* debug

* debug

* trying to set embedding tokens properly

* set embeddings for generation head too

* set embeddings for generation head too

* debugging

* debugging

* enable generation

* add base method

* add base method

* add base method

* return logits in the main call

* reverting to generation

* revert back

* set embeddings for the bert main layer

* description

* fix conflicts

* logging

* set base model as self

* refactor

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* v0

* v0

* finalize

* final

* black

* add tests

* revert back the emb call

* comments

* comments

* add the second test

* add vocab size config

* add tf models

* add tf models. add common tests

* remove model specific embedding tests

* stylish

* remove files

* stylez

* Update src/transformers/modeling_tf_transfo_xl.py

change the error.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* adding unchanged weight test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-18 18:41:26 -04:00
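A sketch of the TF counterpart to the existing PyTorch method; typically used after adding tokens to the tokenizer.

```python
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")
tokenizer.add_tokens(["<new_token>"])
# grow the embedding matrix to match the enlarged vocabulary
model.resize_token_embeddings(len(tokenizer))
```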
973433260e Pin sphinx-rtd-theme (#5128) 2020-06-18 18:07:59 -04:00
8a377c3d6e [fix] Move _adjust_logits above postprocess to fix Marian.generate (#5126) 2020-06-18 18:06:27 -04:00
3d3e605aff [cleanup] generate_beam_search comments (#5115) 2020-06-18 16:30:24 -04:00
ca2d0f98c4 ElectraForMultipleChoice (#4954)
* add ElectraForMultipleChoice

* add  test_for_multiple_choice

* add ElectraForMultipleChoice in auto model

* add ElectraForMultipleChoice in all_model_classes

* add SequenceSummary related parameters

* get rid pooler, use SequenceSummary instead

* add electra multiple choice test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-18 14:59:35 -04:00
279d8e24f7 support local_files_only option for tf models (#5116) 2020-06-18 13:47:05 -04:00
355954ffca Create distilbert-base-uncased-distilled-squad-README.md 2020-06-18 05:17:45 -04:00
18177a1a60 lm_labels => labels (#5080) 2020-06-18 09:16:29 +02:00
efeb75b805 Remove misleading comment
closes #4958
2020-06-17 18:24:35 -04:00
bb154ac50c Fixing TPU training by disabling wandb.watch gradients logging for TPU (#4926) 2020-06-17 18:04:11 -04:00
fb6cccb863 fix qa example (#4929) 2020-06-17 17:54:16 -04:00
38bba9cdd5 Fix deprecation warnings due to invalid escape sequences. (#4924) 2020-06-17 17:46:58 -04:00
f1a3d03741 add pandas to setup.cfg (#5093) 2020-06-17 16:39:17 -04:00
90c833870c [MarianTokenizer] Switch to sacremoses for punc normalization (#5092) 2020-06-17 16:31:05 -04:00
049e14f0e3 very minor spelling correction in script command (#5090)
actual script name - counts_parameters.py
2020-06-17 16:08:43 -04:00
20fa828984 Make default_data_collator more flexible and deprecate old behavior (#5060)
* Make default_data_collator more flexible

* Accept tensors for all features

* Document code

* Refactor

* Formatting
2020-06-17 15:24:51 -04:00
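A sketch of the more flexible collator, assuming it is exported at the top level; the feature values are illustrative.

```python
from transformers import default_data_collator

# plain dicts with python lists (or tensors) are accepted; "label" is
# renamed to "labels" for the model
features = [
    {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1], "label": 1},
    {"input_ids": [101, 2008, 102], "attention_mask": [1, 1, 1], "label": 0},
]
batch = default_data_collator(features)  # dict of stacked torch tensors
```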
5e06963394 Some changes to simplify the generation function (#5031)
* moving logits post-processing out of beam search

* moving logits post-processing out of beam search

* first step cache

* fix_Encoder_Decoder

* patrick_version_postprocess

* add_keyword_arg
2020-06-17 14:48:06 -04:00
204ebc25e6 Update installation page and add contributing to the doc (#5084)
* Update installation page and add contributing to the doc

* Remove mention of symlinks
2020-06-17 14:01:10 -04:00
043f9f51f9 [examples] SummarizationModule improvements (#4951) 2020-06-17 13:51:34 -04:00
cd40f6564e Add header and fix command (#5082) 2020-06-17 11:45:05 -04:00
70bc3ead4f [TextClassificationPipeline] Hotfix: make json serializable 2020-06-17 15:09:27 +00:00
7291ea0bff Reorganize documentation (#5064)
* Reorganize topics and add all models
2020-06-17 07:55:20 -04:00
e4aaa45805 Update pipeline examples to doctest syntax (#5030) 2020-06-16 18:14:58 -04:00
011cc0be51 Fix all sphinx warnings (#5068) 2020-06-16 16:50:02 -04:00
af497b5672 Typo (#5069) 2020-06-16 16:46:20 -04:00
49c5202522 Eli5 examples (#4968)
* add eli5 examples

* add dense query script

* query_di

* merging

* merging

* add_utils

* adds nearest neighbor wikipedia

* batch queries

* training_retriever

* new notebooks

* moved retriever training script

* finished wiki40b

* max_len_fix

* train_s2s

* retriever_batch_checkpointing

* cleanup

* merge

* dim_fix

* fix_indexer

* fix_wiki40b_snippets

* fix_embed_for_r

* fp32 index

* fix_sparse_q

* joint_training

* remove obsolete datasets

* add_passage_nn_results

* add_passage_nn_results

* add_batch_nn

* add_batch_nn

* add_data_scripts

* notebook

* notebook

* notebook

* fix_multi_gpu

* add_app

* full_caching

* full_caching

* notebook

* sparse_done

* images

* notebook

* add_image_gif

* with_Gif

* add_contr_image

* notebook

* notebook

* notebook

* train_functions

* notebook

* min_retrieval_length

* pandas_option

* notebook

* min_retrieval_length

* notebook

* notebook

* eval_Retriever

* notebook

* images

* notebook

* add_example

* add_example

* notebook

* fireworks

* notebook

* notebook

* joe's notebook comments

* app_update

* notebook

* notebook_link

* captions

* notebook

* adding RetriBert model

* add RetriBert to Auto

* change AutoLMHead to AutoSeq2Seq

* notebook downloads from hf models

* style_black

* style_black

* app_update

* app_update

* fix_app_update

* style

* style

* isort

* Delete WikiELI5training.ipynb

* Delete evaluate_eli5.py

* Delete WikiELI5explore.ipynb

* Delete ExploreWikiELI5Support.html

* Delete explainlikeimfive.py

* Delete wiki_snippets.py

* children before parent

* children before parent

* style_black

* style_black_only

* isort

* isort_new

* Update src/transformers/modeling_retribert.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* typo fixes

* app_without_asset

* cleanup

* Delete ELI5animation.gif

* Delete ELI5contrastive.svg

* Delete ELI5wiki_index.svg

* Delete choco_bis.svg

* Delete fireworks.gif

* Delete huggingface_logo.jpg

* Delete huggingface_logo.svg

* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb

* Delete eli5_app.py

* Delete eli5_utils.py

* readme

* Update README.md

* unused imports

* moved_info

* default_beam

* fine-tuned model

* disclaimer

* Update src/transformers/modeling_retribert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* black

* add_doc

* names

* isort_Examples

* isort_Examples

* Add doc to index

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-16 16:36:58 -04:00
c3e607496c [cleanup] examples test_run_squad uses tiny model (#5059) 2020-06-16 14:06:45 -04:00
439aa1d6e9 Remove old section + caching in install (#5027) 2020-06-16 13:03:41 -04:00
3d495c61ef Fix marian tokenizer save pretrained (#5043) 2020-06-16 09:48:19 -04:00
d5477baf7d Convert hans to Trainer (#5025)
* Convert hans to Trainer

* Tick box
2020-06-16 08:06:31 -04:00
c852036b4a [cleanup] Hoist ModelTester objects to top level (#4939)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-06-16 08:03:43 -04:00
0c55a384f8 Add reference to NLP dataset (#5028)
* Add reference to NLP dataset

* Update README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:19:09 -04:00
0946d1209d Add reference to NLP (package) dataset (#5029)
* Add reference to NLP (package) dataset

* Update README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:17:46 -04:00
edcb3ac59a refactor(wandb): consolidate import (#5044) 2020-06-16 03:40:43 -04:00
9e03364999 Ability to pickle/unpickle BatchEncoding pickle (reimport) (#5039)
* Added is_fast property on BatchEncoding to indicate whether the object comes from a Fast Tokenizer.

* Added __getstate__() & __setstate__() to make it picklable.

* Correct tokens() return type from List[int] to List[str]

* Added unittest for BatchEncoding pickle/unpickle

* Added unittest for BatchEncoding is_fast

* More careful checking on BatchEncoding unpickle tests.

* Formatting.

* is_fast should assertTrue on Rust tokenizers.

* Ensure tensorflow has correct way of checking array_equal

* More formatting.
2020-06-16 09:25:25 +02:00
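A sketch of the round-trip this enables:

```python
import pickle
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoding = tokenizer("Hello world")
restored = pickle.loads(pickle.dumps(encoding))  # now round-trips cleanly
assert restored["input_ids"] == encoding["input_ids"]
print(encoding.is_fast)  # False for a pure-python (slow) tokenizer
```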
f9f8a5312e Add DistilBertForMultipleChoice (#5032)
* Add `DistilBertForMultipleChoice`
2020-06-15 18:31:41 -04:00
36434220fc [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline

* failing pretokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up required tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleans up - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-15 17:12:51 -04:00
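A sketch of the refactored padding/truncation API introduced here, including the newly documented __call__ entry point:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer(
    ["first sentence", "a second, rather longer sentence"],
    padding="max_length",  # or True / "longest" / False
    truncation=True,
    max_length=32,
)
```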
ebba39e4e1 [Bart] Question Answering Model is added to tests (#5024)
* fix test

* Update tests/test_modeling_common.py

* Update tests/test_modeling_common.py
2020-06-15 22:50:09 +02:00
bbad4c6989 Add position_ids (#5021) 2020-06-15 15:50:17 -04:00
1bf4098e03 feat(TFTrainer): improve logging (#4946)
* feat(tftrainer): improve logging

* fix(trainer): consider case with evaluation only

* refactor(tftrainer): address comments

* refactor(tftrainer): move self.epoch_logging to __init__
2020-06-15 14:06:17 -04:00
7b5a1e7d51 Fix importing transformers on Windows (#4997) 2020-06-15 19:36:57 +02:00
a9f1fc6c94 Add bart-base (#5014) 2020-06-15 13:29:26 -04:00
7b685f5229 Increase pipeline support for ONNX export. (#5005)
* Increase pipeline support for ONNX export.

* Style.
2020-06-15 19:13:58 +02:00
1affde2f10 Make DataCollator a callable (#5015)
* Make DataCollator a callable

* Update src/transformers/data/data_collator.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 11:58:33 -04:00
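After this change a data collator is just a callable from a list of examples to a batch dict; a self-contained sketch with toy features:

```python
import torch

def my_collator(examples):
    # stack each feature across the batch into a tensor
    return {key: torch.tensor([ex[key] for ex in examples]) for key in examples[0]}

batch = my_collator([
    {"input_ids": [1, 2, 3], "labels": 0},
    {"input_ids": [4, 5, 6], "labels": 1},
])
```

Such a function can then presumably be passed directly as `Trainer(..., data_collator=my_collator)`.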
f7c93b3cee Possible fix to make AMP work with DDP in the trainer (#4728)
* manually set device in trainer args

* check if current device is cuda before set_device

* Explicitly set GPU ID when using single GPU

This addresses https://github.com/huggingface/transformers/issues/4657#issuecomment-642228099
2020-06-15 10:10:26 -04:00
66bcfbb130 Create README.md (#4975)
* Create README.md

* Update model_cards/ipuneetrathore/bert-base-cased-finetuned-finBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 08:43:50 -04:00
d812e6d76e NER: fix construction of input examples for RoBERTa (#4943)
* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model
2020-06-15 08:30:40 -04:00
ebab096e86 [model card] model card for bart-large-finetuned-squadv1 (#4977)
* [model card] model card for bart-large-finetuned-squadv1

* add metadata link to the dataset
2020-06-15 05:39:41 -04:00
9ad36ad57f Improve ONNX logging (#4999)
* Improve ONNX export logging to give more information about the generated graph.

* Correctly handle input and output in the logging.
2020-06-15 11:04:51 +02:00
9931f817b7 fix (#4976) 2020-06-14 21:36:14 +02:00
9208f57b16 BartTokenizerFast (#4878) 2020-06-14 13:04:49 -04:00
403d309857 Hans data (#4854)
* Update hans data to be able to use Trainer

* Fixes

* Deal with tokenizers that don't have token_ids

* Clean up things

* Simplify data use

* Fix the input dict

* Formatting + proper path in README
2020-06-13 09:35:13 -04:00
ca5e1cdf8e model_cards: we can now tag datasets
see corresponding model pages to see how it's rendered
2020-06-12 23:19:07 +02:00
e93ccb3290 BartForQuestionAnswering (#4908) 2020-06-12 15:47:57 -04:00
538531cde5 Add AlbertForMultipleChoice (#4959)
* Add AlbertForMultipleChoice

* Make up to date and add all models to common tests
2020-06-12 14:20:19 -04:00
fe24139702 Create README.md (#4865) 2020-06-12 09:03:43 -04:00
9aa219a1fe Create README.md (#4872) 2020-06-12 09:03:13 -04:00
86578bb04c [AutoModel] Split AutoModelWithLMHead into clm, mlm, encoder-decoder (#4933)
* first commit

* add new auto models

* better naming

* fix bert automodel

* fix automodel for pretraining

* add models to init

* fix name typo

* fix typo

* better naming

* future warning instead of deprecation warning
2020-06-12 10:01:49 +02:00
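A sketch of the three task-specific auto classes this split introduces:

```python
from transformers import (
    AutoModelForCausalLM,   # e.g. GPT-2
    AutoModelForMaskedLM,   # e.g. BERT
    AutoModelForSeq2SeqLM,  # e.g. T5/BART
)

gpt2 = AutoModelForCausalLM.from_pretrained("gpt2")
bert = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
t5 = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```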
5620033115 [mbart] Fix fp16 testing logic (#4949) 2020-06-11 22:11:34 -04:00
473808da0d update mvmt-pruning/saving_prunebert (updating torch to 1.5) 2020-06-11 19:42:45 +00:00
caf3746678 fix indentation issue (#4941) 2020-06-11 21:28:01 +02:00
6293eb04df [Model card] model card for electra-base QA model (#4936) 2020-06-11 13:16:34 -04:00
08b59d10e5 MBartTokenizer: add language codes (#3776) 2020-06-11 13:02:33 -04:00
20451195f0 Support multiple choice in tf common model tests (#4920)
* Support multiple choice in tf common model tests

* Add the input_embeds test
2020-06-11 10:31:26 -04:00
699541c4b3 TFTrainer: Add dataloader_drop_last (#4925) 2020-06-11 02:11:22 -04:00
e80d6c689b Fix resize_token_embeddings for Transformer-XL (#4759)
* Fixed resize_token_embeddings for transfo_xl model

* Fixed resize_token_embeddings for transfo_xl.

Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.

* Updated docstring

* Fixed resizing cutoffs; added check for new size of embedding layer.

* Added test for resize_token_embeddings

* Fixed code quality

* Fixed unchanged cutoffs in model.config

Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
2020-06-10 19:03:06 -04:00
d541938c48 Make multiple choice models work with input_embeds (#4921) 2020-06-10 18:38:34 -04:00
1e2631d6f8 Split LMBert model in two (#4874)
* Split LMBert model in two

* Fix example

* Remove lm_labels

* Adapt tests, refactor prepare_for_generation

* Fix merge

* Hide BertLMHeadModel
2020-06-10 18:26:42 -04:00
f6da8b2200 check type before logging in trainer to ensure values are scalars (#4883)
* check type before logging to ensure it's a scalar

* log when Trainer attempts to add a non-scalar value using TensorboardX's writer.add_scalar so we know what kinds of fixes are appropriate

* black it

* rephrase log message to clarify attribute was dropped

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-10 18:25:55 -04:00
1c986f42ff Create README.md (#4871) 2020-06-10 17:29:41 -04:00
3ae2e86baf Run a single wandb instance per TPU run (#4851)
* Run a single wandb instance per TPU run

* wandb: self.is_world_master

* make style

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-10 16:28:18 -04:00
466aa57a45 Don't init TPU device twice (#4916) 2020-06-10 15:53:15 -04:00
ef2dcdccaa ElectraForQuestionAnswering (#4913)
* ElectraForQuestionAnswering

* udate __init__

* add test for electra qa model

* add ElectraForQuestionAnswering in auto models

* add ElectraForQuestionAnswering in all_model_classes

* fix outputs, input_ids defaults to None

* add ElectraForQuestionAnswering in docs

* remove commented line
2020-06-10 15:17:52 -04:00
5d63ca6c38 [ctrl] fix pruning of MultiHeadAttention (#4904) 2020-06-10 14:06:55 -04:00
4e10acb3e5 Add more models to common tests (#4910) 2020-06-10 13:19:53 -04:00
3b3619a327 [All models] fix docs after adding output attentions to all forward functions (#4909)
* fix doc

* add format file

* add output attentions to all docs

* add also for bart

* fix naming

* re-add doc to config
2020-06-10 18:10:59 +02:00
ac99217e92 Fix the CI (#4903)
* Fix CI
2020-06-10 09:26:06 -04:00
0a375f5abd Deal with multiple choice in common tests (#4886)
* Deal with multiple choice in common tests
2020-06-10 08:10:20 -04:00
e8db8b845a Remove unused arguments in Multiple Choice example (#4853)
* Remove unused arguments

* Formatting

* Remove second todo comment
2020-06-09 20:05:09 -04:00
29c36e9f36 run_pplm.py bug fix (#4867)
`is_leaf` may become `False` after the `.to(device=device)` call.
2020-06-09 19:14:27 -04:00
13aa174112 uninstalled wandb raises AttributeError 2020-06-09 18:50:56 -04:00
6e603cb789 [All models] Extend config.output_attentions with output_attentions function arguments (#4538)
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* Fix further regressions in tests relating to `output_attentions`

Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses

* Fix more regressions in `test_output_attentions`

* Fix issues with BertEncoder

* Rename related variables to `output_attentions`

* fix pytorch tests

* fix bert and gpt2 tf

* Fix most TF tests for `test_output_attentions`

* Fix linter errors and more TF tests

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix pytorch tests

* fix conflicts

* fix conflicts

* Fix linter errors and more TF tests

* fix tf tests

* make style

* fix isort

* improve output_attentions

* improve tensorflow

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-09 23:39:06 +02:00
f90bc44d9a [examples] Cleanup summarization docs (#4876) 2020-06-09 17:38:28 -04:00
2cfb947f59 [Benchmark] add tpu and torchscipt for benchmark (#4850)
* add tpu and torchscipt for benchmark

* fix name in tests

* "fix email"

* make style

* better log message for tpu

* add more print and info for tpu

* allow possibility to print tpu metrics

* correct cpu usage

* fix test for non-install

* remove bogus file

* include psutil in testing

* run a couple of times before tracing in torchscript

* do not allow tpu memory tracing for now

* make style

* add torchscript to env

* better name for torch tpu

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2020-06-09 23:12:43 +02:00
f0340b3031 Removes from the of the parent of TFRobertaClassificationHead (#4884)
Co-authored-by: Hamza Harkous <harkous@google.com>
2020-06-09 16:14:01 -04:00
02e5f79662 [examples] consolidate summarization examples (#4837) 2020-06-09 11:14:12 -04:00
9f5d5a531d Fix the __getattr__ method in BatchEncoding (#4772) 2020-06-09 09:44:00 +02:00
41a1d27cde Add XLMRobertaForQuestionAnswering (#4855)
* Add XLMRobertaForQuestionAnswering

* Formatting

* Make test happy
2020-06-08 21:22:37 -04:00
a139d1a160 [cleanup] consolidate some prune_heads logic (#4799) 2020-06-08 17:08:04 -04:00
4c7f564f9a fix (#4839) 2020-06-08 18:28:50 +02:00
37be3786cf Clean documentation (#4849)
* Clean documentation
2020-06-08 11:28:19 -04:00
42860e92a4 Turn off codecov patch for now 2020-06-08 09:47:13 -04:00
36dfc317b3 TF Checkpoints (#4831)
* Align checkpoint dir with the PT trainer

* Use args for max to keep checkpoints
2020-06-08 09:45:23 -04:00
439f1cab20 [Generate] beam search should generate without replacement (#4845)
* fix flaky beam search

* fix typo
2020-06-08 15:31:32 +02:00
c0554776de fix PR (#4810) 2020-06-08 15:31:12 +02:00
e817747941 Expose classes used in documentation (#4808)
* Expose classes used in documentation

* Format code
2020-06-08 08:14:32 -04:00
b6f365a8ed Updates args in tf squad example. (#4820)
Co-authored-by: Daniel Shan <daniel.shan@workday.com>
2020-06-08 05:36:09 -04:00
e33fdc93b4 Export PretrainedBartModel from __init__ (#4819) 2020-06-07 11:55:10 -04:00
c58e6c129a [marian tests ] pass device to pipeline (#4815) 2020-06-06 00:52:17 -04:00
ddf9a3dfc7 Updated path "cd examples/text-generation/pplm" (#4778)
https://github.com/huggingface/transformers/issues/4776
2020-06-05 21:16:48 -04:00
2d372a990b Explain how to preview the docs in a PR (#4795) 2020-06-05 20:47:02 -04:00
56d5d160cd Add model and doc badges (#4811)
* Add badges for models and docs
2020-06-05 18:45:42 -04:00
4ab7424597 [cleanup/marian] pipelines test and new kwarg (#4812) 2020-06-05 18:45:19 -04:00
875288b344 [isort] add matplotlib to known 3rd party dependencies (#4800) 2020-06-05 17:27:31 -04:00
8cca875569 [EncoderDecoderConfig] automatically set decoder config to decoder (#4809)
* automatically set decoder config to decoder

* add more tests
2020-06-05 23:16:37 +02:00
f1fe18465d Use labels to remove deprecation warnings (#4807) 2020-06-05 16:41:46 -04:00
5c0cfc2cf0 Add link to community models (#4804) 2020-06-05 15:29:20 -04:00
4dd5cf2207 Fix argument label (#4792)
* Fix argument label

* Fix test
2020-06-05 15:20:29 -04:00
3723f30a18 [cleanup] MarianTokenizer: delete unused constants (#4802) 2020-06-05 14:57:24 -04:00
acaa2e6267 Clean-up code (#4790) 2020-06-05 12:36:22 -04:00
fa661ce749 Add model summary (#4789)
* Add model summary

* Add link to pretrained models
2020-06-05 12:22:50 -04:00
79ab881eb1 No silent error when d_head already in the configuration (#4747)
* No silent error when d_head already in the configuration

* Update src/transformers/configuration_xlnet.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-05 12:01:43 -04:00
b9109f2de1 [doc] Make it clearer that text-generation does not involve training 2020-06-05 14:59:22 +02:00
ceaab8dd22 Add .vs to gitignore (#4774) 2020-06-05 07:56:11 -04:00
f9414f7553 Tensorflow improvements (#4530)
* Better None gradients handling

* Apply Style

* Apply Style

* Create a loss class per task to compute its respective loss

* Add loss classes to the ALBERT TF models

* Add loss classes to the BERT TF models

* Add question answering and multiple choice to TF Camembert

* Remove prints

* Add multiple choice model to TF DistilBERT + loss computation

* Add question answering model to TF Electra + loss computation

* Add token classification, question answering and multiple choice models to TF Flaubert

* Add multiple choice model to TF Roberta + loss computation

* Add multiple choice model to TF XLM + loss computation

* Add multiple choice and question answering models to TF XLM-Roberta

* Add multiple choice model to TF XLNet + loss computation

* Remove unused parameters

* Add task loss classes

* Reorder TF imports + add new model classes

* Add new model classes

* Bugfix in TF T5 model

* Bugfix for TF T5 tests

* Bugfix in TF T5 model

* Fix TF T5 model tests

* Fix T5 tests + some renaming

* Fix inheritance issue in the AutoX tests

* Add tests for TF Flaubert and TF XLM Roberta

* Add tests for TF Flaubert and TF XLM Roberta

* Remove unused piece of code in the TF trainer

* bugfix and remove unused code

* Bugfix for TF 2.2

* Apply Style

* Divide TFSequenceClassificationAndMultipleChoiceLoss into its two respective classes

* Apply style

* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameter and better dataset handling

* Fix TF optimizations tests and apply style

* Remove useless parameter

* Bugfix and apply style

* Fix TF Trainer prediction

* Now the TF models return the loss like their PyTorch counterparts

* Apply Style

* Ignore some tests output

* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.

* Fix names for SQuAD data

* Apply Style

* Fix conflicts with 2.11 release

* Fix conflicts with 2.11

* Fix wrong name

* Add better documentation on the new create_optimizer function

* Fix isort

* logging_dir: use same default as PyTorch

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-04 19:45:53 -04:00
ccd26c2862 Create model card for tblard/allocine (#4775)
https://huggingface.co/tblard/tf-allocine
2020-06-04 19:15:07 -04:00
2a4b9e09c0 NER: Add new WNUT’17 example (#4681)
* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort
2020-06-04 19:13:17 -04:00
0e1869cc28 Add drop_last arg for data loader 2020-06-04 18:30:31 -04:00
48a05026de removed deprecated use of the Variable API from the pplm example 2020-06-04 18:07:49 -04:00
12d0eb5f3e Don't access pad_token_id if there is no pad_token (#4773) 2020-06-04 17:57:04 -04:00
17a88d3192 Create model card for T5-base fine-tuned for Sentiment Span Extraction (#4737) 2020-06-04 16:59:56 -04:00
fb52143cf6 Create README.md (#4743) 2020-06-04 16:59:37 -04:00
5f077a3445 Model Card for RoBERTa trained on Sanskrit (#4763)
* Model cad for SanBERTa

Model Card for RoBERTa trained on Sanskrit

* Model card for SanBERTa

model card for RoBERTa trained on Sanskrit
2020-06-04 16:58:40 -04:00
cd4e07a85e Add note about doc generation (#4770) 2020-06-04 13:43:14 -04:00
492b352ab6 Remove unnecessary model_type arg in example (#4771) 2020-06-04 13:41:24 -04:00
e645b9ab94 Codecov setup (#4768)
* Codecov setup

* Understanding codecov
2020-06-04 11:44:38 -04:00
2b8b6c929e [cleanup] PretrainedModel.generate: remove unused kwargs (#4761) 2020-06-04 08:13:52 -04:00
5bf9afbf35 Introduce a new tensor type for return_tensors on tokenizer for NumPy (#4585)
* Refactor tensor creation in tokenizers.

* Make sure to convert string to TensorType

* Refactor convert_to_tensors_

* Introduce numpy tensor creation

* Format

* Add unittest for TensorType creation from str

* sorting imports

* Added unittests for numpy tensor conversion.

* Do not use the in-place version of squeeze, as numpy doesn't provide such a feature.

* Added extra parameter prepend_batch_axis: bool on prepare_for_model.

* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.

* style.

* numpy tests require_torch for now while flax not merged.

* Hopefully will make flake8 happy.

* One more time 🎶
2020-06-04 06:57:01 +02:00
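A sketch of the new NumPy tensor type alongside the existing "pt" and "tf" values:

```python
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("Hello world", return_tensors="np")
assert isinstance(enc["input_ids"], np.ndarray)
```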
efae154929 never_split on slow tokenizers should not split (#4723)
* Ensure tokens in never_split are not split when using the basic tokenizer before wordpiece.

* never_split is only used for membership tests; use a set(), which is 10x faster for this operation.

* Use union to concatenate two sets.

* Updated docstring for never_split parameter.

* Avoid set.union() if never_split is None

* Added comments.

* Correct docstring format.
2020-06-03 16:48:28 -04:00
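A sketch of the behavior this fixes; the custom token is illustrative, and with this change it should survive tokenization intact rather than being split.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained(
    "bert-base-uncased", never_split=["[MYTOKEN]"]
)
# expected after the fix: ['keep', '[MYTOKEN]', 'whole']
print(tokenizer.tokenize("keep [MYTOKEN] whole"))
```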
2e4de76231 Update encode documentation (#4751) 2020-06-03 16:30:59 -04:00
ed4df85572 fix beam search bug in tf as well (#4745) 2020-06-03 12:53:23 -04:00
1b5820a565 Unify label args (#4722)
* Deprecate masked_lm_labels argument

* Apply to all models

* Better error message
2020-06-03 09:36:26 -04:00
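A sketch of the unified argument; `labels` replaces model-specific names like `masked_lm_labels`:

```python
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])  # was masked_lm_labels=...
loss = outputs[0]
```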
3e5928c57d Adding notebooks for Fine Tuning [Community Notebook] (#4732)
* Added links to more community notebooks

Added links to 3 more community notebooks from the git repo: https://github.com/abhimishra91/transformers-tutorials
Different Transformers models are fine tuned on Dataset using PyTorch

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-03 11:07:26 +02:00
99207bd112 Pipelines: miscellanea of QoL improvements and small features... (#4632)
* [hf_api] Attach all unknown attributes for future-proof compatibility

* [Pipeline] NerPipeline is really a TokenClassificationPipeline

* modelcard.py: I don't think we need to force the download

* Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer

* FillMaskPipeline: also output token in string form

* TextClassificationPipeline: option to return all scores, not just the argmax

* Update docs/source/main_classes/pipelines.rst
2020-06-03 03:51:31 -04:00
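A sketch of the return_all_scores option added here:

```python
from transformers import pipeline

# return the score of every label instead of only the argmax
classifier = pipeline("sentiment-analysis", return_all_scores=True)
print(classifier("This movie was great!"))
```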
8ed47aa10b bert-small-cord19 model cards (#4730)
* Create README.md

* Create README.md

* Create README.md
2020-06-03 03:40:14 -04:00
9ca485734a [Reformer] Improved memory if input is shorter than chunk length (#4720)
* improve handling of short inputs for reformer

* correct typo in assert statement

* fix other tests
2020-06-02 23:08:39 +02:00
b231a413f5 Add cache_dir to save features in GLUE + Differentiate match/mismatch for MNLI metrics (#4621)
* Glue task cleanup

* Enable writing cache to cache_dir in case the dataset lives in a read-only filesystem.
* Differentiate match vs mismatch for MNLI metrics.

* Style

* Fix pytype

* Fix type

* Use cache_dir in mnli mismatch eval dataset

* Small Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-02 13:40:14 -04:00
70f7423436 TFRobertaModelIntegrationTest requires tf (#4726) 2020-06-02 12:59:00 -04:00
d976ef262e Repin versions 2020-06-02 10:27:15 -04:00
b42586ea56 Fix CI after killing archive maps (#4724)
* 🐛 Fix model ids for BART and Flaubert
2020-06-02 10:21:09 -04:00
b43c78e5d3 Release: v2.11.0 2020-06-02 09:49:09 -04:00
d4c2cb402d Kill model archive maps (#4636)
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
2020-06-02 09:39:33 -04:00
47a551d17b [pipeline] Tokenizer should not add special tokens for text generation (#4686)
* allow to not add special tokens

* remove print
2020-06-02 11:03:46 +02:00
f6d5046af1 Override get_vocab for fast tokenizer. (#4717) 2020-06-02 11:02:27 +02:00
88762a2f8c Specify PyTorch versions for examples (#4710) 2020-06-02 04:29:28 -04:00
d3ef14f931 Add community notebook for sentiment span extraction (#4700) 2020-06-02 09:59:53 +02:00
7677936316 Make docstring match args (#4711) 2020-06-01 15:22:51 -04:00
6449c494d0 close #4685 2020-06-01 12:57:52 -04:00
ec8717d5d8 [config] Ensure that id2label always takes precedence over num_labels 2020-06-01 16:54:55 +02:00
751a1e0890 [config] Ensure that id2label always takes precedence over num_labels
Fixes bug reported in https://github.com/huggingface/transformers/issues/4669

See #3967 for context
2020-06-01 16:25:56 +02:00
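A sketch of the intended precedence; after the fix, num_labels should be derived from an explicit id2label rather than the default.

```python
from transformers import BertConfig

config = BertConfig(id2label={0: "negative", 1: "neutral", 2: "positive"})
print(config.num_labels)  # expected: 3, derived from id2label
```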
ec62b7d953 Fix onnx export input names order (#4641)
* pass on tokenizer to pipeline

* order input names when convert to onnx

* update style

* remove unused imports

* make the ordered inputs list mutable, as it needs to be

* add test custom bert model

* remove unused imports
2020-06-01 16:12:48 +02:00
bf760c80b5 finish README 2020-06-01 09:23:31 -04:00
9d7d9b3ae0 weird import 2020-06-01 09:23:31 -04:00
2a3c88a659 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
4ac462bfb8 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
35fa0bbca0 clarify README 2020-06-01 09:23:31 -04:00
cc746a5020 flake8 compliance 2020-06-01 09:23:31 -04:00
b11386e158 fewer prints in saving prunebert 2020-06-01 09:23:31 -04:00
8b5d4003ab complete README 2020-06-01 09:23:31 -04:00
5c8e5b3709 complying with isort 2020-06-01 09:23:31 -04:00
db2a3b2e01 space 2020-06-01 09:23:31 -04:00
5f8f2d849a add floppy bert model notebok 2020-06-01 09:23:31 -04:00
b41948f5cd add requirements 2020-06-01 09:23:31 -04:00
fb8f4277b2 add scripts 2020-06-01 09:23:31 -04:00
d489a6d3d5 add masked_run_* 2020-06-01 09:23:31 -04:00
e4c07faf0a add sparsity modules 2020-06-01 09:23:31 -04:00
667003e447 Create README.md (#4665) 2020-06-01 08:29:09 -04:00
ed23f5909e HooshvareLab readme parsbert-armananer (#4666)
Readme for HooshvareLab/bert-base-parsbert-armananer-uncased
2020-06-01 08:28:43 -04:00
3750b9b0b0 HooshvareLab readme parsbert-peymaner (#4667)
Readme for HooshvareLab/bert-base-parsbert-peymaner-uncased
2020-06-01 08:28:25 -04:00
036c2c6b02 Update HooshvareLab/bert-base-parsbert-uncased (#4687)
mBERT results added regarding NER datasets!
2020-06-01 08:27:00 -04:00
74872c19d3 Create README.md (#4684) 2020-06-01 05:45:54 -04:00
0866669e75 [EncoderDecoder] Fix initialization and save/load bug (#4680)
* fix bug

* add more tests
2020-05-30 01:25:19 +02:00
6f82aea66b Include nlp notebook for model evaluation (#4676) 2020-05-29 19:38:56 +02:00
33b7532e69 Fix longformer attention mask type casting when using apex (#4574)
* Fix longformer attention mask casting when using apex

* remove extra type casting
2020-05-29 18:13:30 +02:00
56ee2560be [Longformer] Better handling of global attention mask vs local attention mask (#4672)
* better api

* improve automatic setting of global attention mask

* fix longformer bug

* fix global attention mask in test

* fix global attn mask flatten

* fix slow tests

* update docstring

* update docs and make more robust

* improve attention mask
2020-05-29 17:58:42 +02:00
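
A minimal sketch of the resulting API, assuming the post-PR forward signature; the checkpoint is the standard AllenAI one:

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer.encode_plus("A very long document ...", return_tensors="pt")

# attention_mask covers local (sliding-window) attention; global attention
# is requested separately, here on the first token only.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
```
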
e2230ba77b Fix BERT example code for NSP and Multiple Choice (#3953)
Change the example code to use encode_plus since the token_type_id
wasn't being correctly set.
2020-05-29 11:55:55 -04:00
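
The gist of the fix, sketched: encode_plus builds the sentence-pair input and sets token_type_ids, which the old example's plain encode call did not:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# token_type_ids: 0 for the first segment (up to and including the first
# [SEP]), 1 for the second segment.
encoded = tokenizer.encode_plus("How old are you?", "I am six years old.")
print(encoded["token_type_ids"])
```
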
3a5d1ea2a5 Fix two bugs: 1. Index of test data of SST-2. 2. Label index of MNLI data. (#4546) 2020-05-29 11:12:24 -04:00
9c17256447 [Longformer] Multiple choice for longformer (#4645)
* add multiple choice for longformer

* add models to docs

* adapt docstring

* add test to longformer

* add longformer for mc in init and modeling auto

* fix tests
2020-05-29 13:46:08 +02:00
91487cbb8e [Longformer] fix model name in examples (#4653)
* fix longformer model names in examples

* a better name for the notebook
2020-05-29 13:12:35 +02:00
b5015a2a0f gpt2 typo (#4629)
* gpt2 typo

* Add files via upload
2020-05-28 16:44:43 -04:00
fe5cb1a1c8 Adding community notebook (#4642)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-28 22:35:15 +02:00
aecaaf73a4 [Community notebooks] add longformer-for-qa notebook (#4652) 2020-05-28 22:27:22 +02:00
5e737018e1 Fix add_special_tokens on fast tokenizers (#4531) 2020-05-28 10:54:45 -04:00
e444648a30 LongformerForTokenClassification (#4638) 2020-05-28 12:48:18 +02:00
3cc2c2a150 add 2 colab notebooks (#4505)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-28 11:18:16 +02:00
ef03ae874f [Longformer] more models + model cards (#4628)
* adding freeze roberta models

* model cards

* lint
2020-05-28 11:11:05 +02:00
96f57c9ccb [Benchmark] Memory benchmark utils (#4198)
* improve memory benchmarking

* correct typo

* fix current memory

* check torch memory allocated

* better pytorch function

* add total cached gpu memory

* add total gpu required

* improve torch gpu usage

* update memory usage

* finalize memory tracing

* save intermediate benchmark class

* fix conflict

* improve benchmark

* improve benchmark

* finalize

* make style

* improve benchmarking

* correct typo

* make train function more flexible

* fix csv save

* better repr of bytes

* better print

* fix __repr__ bug

* finish plot script

* rename plot file

* delete csv and small improvements

* fix in plot

* fix in plot

* correct usage of timeit

* remove redundant line

* remove redundant line

* fix bug

* add hf parser tests

* add versioning and platform info

* make style

* add gpu information

* ensure backward compatibility

* finish adding all tests

* Update src/transformers/benchmark/benchmark_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* delete csv files

* fix isort ordering

* add out of memory handling

* add better train memory handling

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-05-27 23:22:16 +02:00
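
Roughly how the new utilities are driven, following the benchmark docs (a sketch; defaults may differ slightly by version):

```python
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased"],
    batch_sizes=[1, 8],
    sequence_lengths=[128, 512],
)
benchmark = PyTorchBenchmark(args)

# Reports inference speed and peak memory per (model, batch size, seq length).
results = benchmark.run()
```
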
ec4cdfdd05 LongformerForSequenceClassification (#4580)
* LongformerForSequenceClassification

* better naming x=>hidden_states, fix typo in doc

* Update src/transformers/modeling_longformer.py

* Update src/transformers/modeling_longformer.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-27 22:30:00 +02:00
4402879ee4 [Model Card] model card for longformer-base-4096-finetuned-squadv1 (#4625) 2020-05-27 18:48:03 +02:00
6a17688021 per_device instead of per_gpu/error thrown when argument unknown (#4618)
* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-27 11:36:55 -04:00
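
In practice the rename looks like this (a minimal sketch):

```python
from transformers import TrainingArguments

# per_gpu_* is deprecated in favour of per_device_*, which also covers TPU
# cores; unknown keyword arguments now raise an error instead of being ignored.
args = TrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
)
```
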
1381b6d01d README for HooshvareLab (#4610)
HooshvareLab/bert-base-parsbert-uncased
2020-05-27 11:25:36 -04:00
5acb4edf25 Update version command when contributing (#4614) 2020-05-27 17:19:11 +02:00
842588c12f uncased readme (#4608)
Co-authored-by: kldarek <darekmail>
2020-05-27 09:50:04 -04:00
ac1a612179 Create README.md (#4607)
Model card for cased model
2020-05-27 09:36:20 -04:00
07797c4da4 [testing] LanguageModelGenerationTests require_tf or require_torch (#4616) 2020-05-27 09:10:26 -04:00
a9aa7456ac Add back --do_lower_case to uncased models (#4245)
The option `--do_lower_case` is currently required by the uncased models (i.e., bert-base-uncased, bert-large-uncased).

Results:
BERT-BASE without --do_lower_case:  'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case:  'exact': 81.02, 'f1': 88.34
2020-05-26 21:13:07 -04:00
a801c7fd74 Creating a readme for ALBERT in Mongolian (#4603)
Here I am uploading a Mongolian masked language model (ALBERT) to your platform.
https://en.wikipedia.org/wiki/Mongolia
2020-05-26 16:54:42 -04:00
6458c0e268 updated model cards for both models at aubmindlab (#4604)
* updated aubmindlab/bert-base-arabert/ Model card

* updated aubmindlab/bert-base-arabertv01 model card
2020-05-26 16:52:43 -04:00
ea4e7a53fa Improve model card for Tereveni-AI/gpt2-124M-uk-fiction (#4582)
Add language metadata, training and evaluation corpora details.
Add example output. Fix inconsistent use of quotes.
2020-05-26 16:51:40 -04:00
937930dcae Create README.md (#4591) 2020-05-26 16:50:08 -04:00
bac1cc4dc1 Remove MD emojis (#4602) 2020-05-26 16:38:39 -04:00
003c477129 [GPT2, CTRL] Allow input of input_ids and past of variable length (#4581)
* revert convenience method

* clean docs a bit
2020-05-26 19:43:58 +02:00
5ddd8d6531 Add BART fine-tuning summarization community notebook (#4539)
* adding BART summarization how-to community notebook

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-26 16:43:41 +02:00
8cc6807e89 Make transformers-cli cross-platform (#4131)
* make transformers-cli cross-platform

Using "scripts" is a useful option in setup.py particularly when you want to get access to non-python scripts. However, in this case we want to have an entry point into some of our own Python scripts. To do this in a concise, cross-platfom way, we can use entry_points.console_scripts. This change is necessary to provide the CLI on different platforms, which "scripts" does not ensure. Usage remains the same, but the "transformers-cli" script has to be moved (be part of the library) and renamed (underscore + extension)

* make style & quality
2020-05-26 10:00:51 -04:00
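
The mechanism, sketched as a setup.py excerpt (the entry-point path reflects the renamed in-library module described above):

```python
# setup.py (excerpt)
from setuptools import setup

setup(
    name="transformers",
    # console_scripts generates a native launcher on every platform,
    # unlike the POSIX-only "scripts" key.
    entry_points={
        "console_scripts": [
            "transformers-cli=transformers.commands.transformers_cli:main"
        ]
    },
)
```
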
c589eae2b8 [Longformer For Question Answering] Conversion script, doc, small fixes (#4593)
* add new longformer for question answering model

* add new config as well

* fix links

* fix links part 2
2020-05-26 14:58:47 +02:00
a163c9ca5b [T5] Fix Cross Attention position bias (#4499)
* fix

* fix1
2020-05-26 08:57:24 -04:00
1d69028989 fix (#4410) 2020-05-26 08:51:28 -04:00
b86e42e0ac [ci] fix 3 remaining slow GPU failures (#4584) 2020-05-25 19:20:50 -04:00
365d452d4d [ci] Slow GPU tests run daily (#4465) 2020-05-25 17:28:02 -04:00
3e3e552125 [Reformer] fix reformer num buckets (#4564)
* fix reformer num buckets

* fix

* adapt docs

* set num buckets in config
2020-05-25 16:04:45 -04:00
3dea40b858 fixing tokenization of extra_id symbols in T5Tokenizer. Related to issue 4021 (#4353) 2020-05-25 16:04:30 -04:00
5139733623 LongformerTokenizerFast (#4547) 2020-05-25 16:03:55 -04:00
c9c385c522 Updated the link to the paper (#4570)
It looks like the conference has changed the link to the paper.
2020-05-25 15:29:50 -04:00
adab7f8332 Add nn.Module as superclass (#4533) 2020-05-25 15:29:33 -04:00
8f7c1c7672 Create model card (#4578) 2020-05-25 15:28:30 -04:00
4c6b218056 Update README.md (#4556) 2020-05-25 15:12:23 -04:00
50d1ce411f add DistilBERT to supported models (#4558) 2020-05-25 14:50:45 -04:00
03d8527de0 Longformer for question answering (#4500)
* added LongformerForQuestionAnswering

* add LongformerForQuestionAnswering

* fix import for LongformerForMaskedLM

* add LongformerForQuestionAnswering

* hardcoded sep_token_id

* compute attention_mask if not provided

* combine global_attention_mask with attention_mask when provided

* update example in  docstring

* add assert error messages, better attention combine

* add test for longformerForQuestionAnswering

* typo

* cast global_attention_mask to long

* make style

* Update src/transformers/configuration_longformer.py

* Update src/transformers/configuration_longformer.py

* fix the code quality

* Merge branch 'longformer-for-question-answering' of https://github.com/patil-suraj/transformers into longformer-for-question-answering

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-25 18:43:36 +02:00
a34a9896ac DOC: Fix typos in modeling_auto (#4534) 2020-05-23 09:40:59 -04:00
e19b978151 Add Type Hints to modeling_utils.py Closes #3911 (#3948)
* Add Type Hints to modeling_utils.py Closes #3911

Add Type Hints to methods in `modeling_utils.py`

Note: The coverage isn't 100%. Mostly skipped internal methods.

* Reformat according to `black` and `isort`

* Use typing.Iterable instead of Sequence

* Parameterize Iterable by its generic type

* Use typing.Optional when None is the default value

* Adhere to style guideline

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-22 19:10:22 -04:00
996f393a86 Warn the user about max_len being on the path to be deprecated. (#4528)
* Warn the user about max_len being on the path to be deprecated.

* Ensure better backward compatibility when max_len is provided to a tokenizer.

* Make sure to override the parameter and not the actual instance value.

* Format & quality
2020-05-22 18:08:30 -04:00
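
The practical upshot, sketched; model_max_length is the attribute the warning points to:

```python
from transformers import BertTokenizer

# Passing max_len still works for now but emits a deprecation warning;
# model_max_length is the forward-compatible spelling.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", model_max_length=512)
print(tokenizer.model_max_length)  # 512
```
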
0f6969b7e9 Better github link for Reformer Colab Notebook 2020-05-22 23:51:36 +02:00
ab44630db2 [Summarization Pipeline]: Fix default tokenizer (#4506)
* Fix pipelines defaults bug

* one liner

* style
2020-05-22 17:49:45 -04:00
2c1ebb8b50 Re-apply #4446 + add packaging dependency
As discussed w/ @lysandrejik

packaging is maintained by PyPA (the Python Packaging Authority), and should be lightweight and stable
2020-05-22 17:29:03 -04:00
e6aeb0d3e8 Style 2020-05-22 17:20:03 -04:00
95a26fcf2d link to paper was broken (#4526)
changed from https://https://arxiv.org/abs/2001.04451.pdf to https://arxiv.org/abs/2001.04451.pdf
2020-05-22 15:17:09 -04:00
89d795f180 Added huseinzol05/t5-small-bahasa-cased README.md (#4522) 2020-05-22 15:04:06 -04:00
35df911485 Fix convert_token_type_ids_from_sequences for fast tokenizers (#4503) 2020-05-22 12:45:10 -04:00
f7677e1623 [model_cards] bart-large-cnn
cc @sshleifer
2020-05-22 12:20:54 -04:00
12e6afe900 Add Reformer colab to community noteboos 2020-05-22 17:03:34 +02:00
ef22ba4836 Re-pin versions 2020-05-22 11:03:07 -04:00
10d72390c0 Revert #4446 Since it introduces a new dependency 2020-05-22 10:49:45 -04:00
e0db6bbd65 Release: v2.10.0 2020-05-22 10:37:44 -04:00
bd6e301832 added functionality for electra classification head (#4257)
* added functionality for electra classification head

* unneeded dropout

* Test ELECTRA for sequence classification

* Style

Co-authored-by: Frankie <frankie@frase.io>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-05-22 09:48:21 -04:00
a086527727 Unused Union should not be imported 2020-05-21 09:42:47 -04:00
9d2ce253de TPU hangs when saving optimizer/scheduler (#4467)
* TPU hangs when saving optimizer/scheduler

* Style

* ParallelLoader is not a DataLoader

* Style

* Addressing @julien-c's comments
2020-05-21 09:18:27 -04:00
49296533ca Adds predict stage for glue tasks, and generate result files which can be submitted to gluebenchmark.com (#4463)
* Adds a predict stage for GLUE tasks, and generates result files which can be submitted to the gluebenchmark.com website.

* Use Split enum + always output the label name

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-21 09:17:44 -04:00
271bedb485 [examples] fix no grad in second pruning in run_bertology (#4479)
* fix no grad in second pruning and typo

* fix prune heads attention mismatch problem

* fix

* fix

* fix

* run make style

* run make style
2020-05-21 09:17:03 -04:00
865d4d595e [ci] Close #4481 2020-05-20 18:27:42 -04:00
a3af8e86cb Update test_trainer_distributed.py 2020-05-20 18:26:51 -04:00
eacea530c1 🚨 Remove warning of deprecation (#4477)
Remove warning of deprecated overload of addcdiv_

Fix #4451
2020-05-20 16:48:29 -04:00
fa2fbed3e5 Better None gradients handling in TF Trainer (#4469)
* Better None gradients handling

* Apply Style

* Apply Style
2020-05-20 16:46:21 -04:00
e708bb75bf Correct TF formatting to exclude LayerNorms from weight decay (#4448)
* Exclude LayerNorms from weight decay

* Include both formats of layer norm
2020-05-20 16:45:59 -04:00
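
The core of the fix is a name-based filter; a minimal, framework-agnostic sketch of the idea (the actual TF optimizer wiring lives in the trainer):

```python
def use_weight_decay(param_name: str) -> bool:
    # Both spellings occur across models ("LayerNorm" in BERT-style code,
    # "layer_norm" elsewhere); biases are conventionally excluded as well.
    return not any(tag in param_name for tag in ("LayerNorm", "layer_norm", "bias"))
```
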
49c06132df pass on tokenizer to pipeline (#4489) 2020-05-20 22:23:21 +02:00
cacb654c7f Add Fine-tune DialoGPT on new datasets notebook (#4473) 2020-05-20 16:17:52 -04:00
30a09f3827 Adjust german bert model card, add new model card (#4488) 2020-05-20 16:08:29 -04:00
14cb5b35fa Fix slow gpu tests lysandre (#4487)
* There is one missing key in BERT

* Correct device for CamemBERT model

* RoBERTa tokenization adding prefix space

* Style
2020-05-20 11:59:45 -04:00
6dc52c78d8 Create README.md (#4482) 2020-05-20 09:45:50 -04:00
ed5456daf4 Model card for RuPERTa-base fine-tuned for NER (#4466) 2020-05-20 09:45:24 -04:00
c76450e20c Model card for Tereveni-AI/gpt2-124M-uk-fiction (#4470)
Create model card for "Tereveni-AI/gpt2-124M-uk-fiction" model
2020-05-20 09:44:26 -04:00
9907dc523a add BERT trained from review corpus. (#4405)
* add model_cards for BERT trained on reviews.

* add link to repository.

* refine README.md for each review model
2020-05-20 09:42:35 -04:00
efbc1c5a9d [MarianTokenizer] implement save_vocabulary and other common methods (#4389) 2020-05-19 19:45:49 -04:00
956c4c4eb4 [gpu slow tests] fix mbart-large-enro gpu tests (#4472) 2020-05-19 19:45:31 -04:00
48c3a70b4e [Longformer] Docs and clean API (#4464)
* add longformer docs

* improve docs
2020-05-19 21:52:36 +02:00
aa925a52fa [Tests, GPU, SLOW] fix a bunch of GPU hardcoded tests in Pytorch (#4468)
* fix gpu slow tests in pytorch

* change model to device syntax
2020-05-19 21:35:04 +02:00
5856999a9f add T5 fine-tuning notebook [Community notebooks] (#4462)
* add T5 fine-tuning notebook [Community notebooks]

* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-19 18:26:28 +02:00
07dd7c2fd8 [cleanup] test_tokenization_common.py (#4390) 2020-05-19 10:46:55 -04:00
8f1d047148 Longformer (#4352)
* first commit

* bug fixes

* better examples

* undo padding

* remove wrong VOCAB_FILES_NAMES

* License

* make style

* make isort happy

* unit tests

* integration test

* make `black` happy by undoing `isort` changes!!

* lint

* no need for the padding value

* batch_size not bsz

* remove unused type casting

* seqlen not seq_len

* staticmethod

* `bert` selfattention instead of `n2`

* uint8 instead of bool + lints

* pad inputs_embeds using embeddings not a constant

* black

* unit test with padding

* fix unit tests

* remove redundant unit test

* upload model weights

* resolve todo

* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_

* increase unittest coverage
2020-05-19 16:04:43 +02:00
31eedff5a0 Refactored the README.md file (#4427) 2020-05-19 09:56:24 -04:00
384f0eb2f9 Map optimizer to correct device after loading from checkpoint. (#4403)
* Map optimizer to correct device after loading from checkpoint.

* Make style test pass

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 23:16:05 -04:00
bf14ef75f1 [Trainer] move model to device before setting optimizer (#4450) 2020-05-18 23:13:33 -04:00
5e7fe8b585 Distributed eval: SequentialDistributedSampler + gather all results (#4243)
* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarii
2020-05-18 22:02:39 -04:00
4c06893610 Fix nn.DataParallel compatibility in PyTorch 1.5 (#4300)
* Test case for #3936

* multigpu tests pass on pytorch 1.4.0

* Fixup

* multigpu tests pass on pytorch 1.5.0

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

* rename multigpu to require_multigpu

* mode doc
2020-05-18 20:34:50 -04:00
9de4afa897 Make get_last_lr in trainer backward compatible (#4446)
* makes fetching the last learning rate in trainer backward compatible

* split comment to multiple lines

* fixes black styling issue

* uses version to create a more explicit logic
2020-05-18 20:17:36 -04:00
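
The version gate, sketched; this is also what the packaging dependency re-applied in 2c1ebb8b50 above supports:

```python
import torch
from packaging import version

def last_learning_rate(scheduler) -> float:
    # get_last_lr() only exists from PyTorch 1.4 onwards;
    # older schedulers expose get_lr() instead.
    if version.parse(torch.__version__) >= version.parse("1.4"):
        return scheduler.get_last_lr()[0]
    return scheduler.get_lr()[0]
```
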
42e8fbfc51 Added model cards for Romanian BERT models (#4437)
* Create README.md

* Create README.md

* Update README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:48:56 -04:00
54065d68b8 added model card for german-sentiment-bert (#4435) 2020-05-18 18:44:41 -04:00
e28b7e2311 Create README.md (#4433) 2020-05-18 18:41:34 -04:00
09b933f19d Update README.md (model_card) (#4424)
- add a citation.
- modify the table of the BLUE benchmark.

The table of the first version was not displayed correctly on https://huggingface.co/seiya/oubiobert-base-uncased.
Could you please confirm that this fix will allow you to display it correctly?
2020-05-18 18:18:17 -04:00
235777ccc9 Modify example of usage (#4413)
I followed Google's usage example for its ELECTRA small model, but I found it was not meaningful, so I created a better example.
2020-05-18 18:17:33 -04:00
9ddd3a6548 add model card for t5-base-squad (#4409)
* add model card for t5-base-squad

* Update model_cards/valhalla/t5-base-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:17:14 -04:00
c5aa114392 Added README huseinzol05/t5-base-bahasa-cased (#4377)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base

* added albert tiny

* added electra model

* added gpt2 117m bahasa readme

* added gpt2 345m bahasa readme

* added t5-base-bahasa

* fix readme

* Update model_cards/huseinzol05/t5-base-bahasa-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:10:23 -04:00
ca4a3f4da9 Adding optimizations block from ONNXRuntime. (#4431)
* Adding optimizations block from ONNXRuntime.

* Turn off external data format by default for PyTorch export.

* Correct the way use_external_format is passed through the cmdline args.
2020-05-18 20:32:33 +02:00
24538df919 [Community notebooks] General notebooks (#4441)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2020-05-18 20:23:57 +02:00
a699525d25 [test_pipelines] Mark tests > 10s @slow, small speedups (#4421) 2020-05-18 12:23:21 -04:00
d9ece8233d fix(run_language_modeling): use arg overwrite_cache (#4407) 2020-05-18 11:37:35 -04:00
d39bf0ac2d better naming in tf t5 (#4401) 2020-05-18 11:34:00 -04:00
590adb130b improve docstring (#4422) 2020-05-18 11:31:35 -04:00
026a5d0888 [T5 fp16] Fix fp16 in T5 (#4436)
* fix fp16 in t5

* make style

* refactor invert_attention_mask fn

* fix typo
2020-05-18 17:25:58 +02:00
fa6113f9a0 Fixed spelling of training (#4416) 2020-05-18 11:23:29 -04:00
757baee846 Fix un-prefixed f-string
see https://github.com/huggingface/transformers/pull/4367#discussion_r426356693

Hat/tip @girishponkiya
2020-05-18 11:20:46 -04:00
a27c795908 fix (#4419) 2020-05-18 15:51:40 +02:00
31c799a0c9 Tag onnx export tests as slow (#4432) 2020-05-18 09:24:41 -04:00
8581a670e3 [MbartTokenizer] save to sentencepiece.bpe.model (#4335) 2020-05-18 08:54:04 -04:00
18d233d525 Allow the creation of "entity groups" for NerPipeline #3548 (#3957)
* Add index to be returned by NerPipeline to allow for the creation of entity groups

* Add entity groups

* Convert entity list to dict

* Add entity to entity_group_disagg after updating entity groups

* Change 'group' parameter to 'grouped_entities'

* Add unit tests for grouped NER pipeline case

* Correct variable name typo for NER_FINETUNED_MODELS

* Sync grouped tests to recent test updates
2020-05-17 09:25:17 +02:00
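
Caller-side usage after the change, sketched:

```python
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)

# Consecutive tokens sharing an entity label are merged into one group,
# e.g. "Hugging" + "Face" come back as a single ORG entity with one score.
print(ner("Hugging Face is based in New York City."))
```
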
3e0f062106 Fix addcmul_ 2020-05-15 17:44:17 -04:00
fc2a4c88ce Fix: one more try 2020-05-15 17:38:48 -04:00
55bda52555 Same fix for addcmul_ 2020-05-15 17:23:48 -04:00
ad02c961c6 Fix UserWarning: This overload of add_ is deprecated in pytorch==1.5.0 2020-05-15 17:09:11 -04:00
15550ce0d1 [skip ci] remove local rank 2020-05-15 17:08:38 -04:00
62427d0815 rerun notebook 02-transformers (#4341) 2020-05-15 10:33:08 -04:00
34706ba050 Allow for None gradients in GradientAccumulator. (#4372) 2020-05-15 09:52:00 -04:00
edf9ac11d4 Should return overflowing information for the log (#4385) 2020-05-15 09:49:11 -04:00
b908f2e9dd Attempt to unpin torch version for Github Action. (#4384) 2020-05-15 15:47:15 +02:00
af2e6bf87c [examples] Streamline doc 2020-05-14 20:34:31 -04:00
7defc6670f p_mask in SQuAD pre-processing (#4049)
* Better p_mask building

* Addressing @mfuntowicz's comments
2020-05-14 17:07:52 -04:00
84894974bd Updated ONNX notebook link in README. 2020-05-14 22:40:59 +02:00
db0076a9df Conversion script to export transformers models to ONNX IR. (#4253)
* Added generic ONNX conversion script for PyTorch model.

* WIP initial TF support.

* TensorFlow/Keras ONNX export working.

* Print framework version info

* Add possibility to check the model is correctly loading on ONNX runtime.

* Remove quantization option.

* Specify ONNX opset version when exporting.

* Formatting.

* Remove unused imports.

* Make functions more generally reusable from other part of the code.

* isort happy.

* flake happy

* Export only feature-extraction for now

* Correctly check inputs order / filter before export.

* Removed task variable

* Fix invalid args call in load_graph_from_args.

* Fix invalid args call in convert.

* Fix invalid args call in infer_shapes.

* Raise exception and catch in caller function instead of exit.

* Add 04-onnx-export.ipynb notebook

* More WIP on the notebook

* Remove unused imports

* Simplify & remove unused constants.

* Export with constant_folding in PyTorch

* Let's try to put function args in the right order this time ...

* Disable external_data_format temporarily

* ONNX notebook draft ready.

* Updated notebooks charts + wording

* Correct error while exporting last chart in notebook.

* Addressing @LysandreJik's comment.

* Set ONNX opset to 11 as default value.

* Set opset param mandatory

* Added ONNX export unittests

* Quality.

* flake8 happy

* Add keras2onnx dependency on extras["tf"]

* Pin keras2onnx on github master to v1.6.5

* Second attempt.

* Third attempt.

* Use the right repo URL this time ...

* Do the same for onnxconverter-common

* Added keras2onnx and onnxconverter-common to 1.7.0 to support TF 2.2

* Correct commit hash.

* Addressing PR review: Optimization are enabled by default.

* Addressing PR review: small changes in the notebook

* setup.py comment about keras2onnx versioning.
2020-05-14 16:35:52 -04:00
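
The exporter can also be driven from Python; a sketch of the feature-extraction path (the output location is an arbitrary choice):

```python
from pathlib import Path
from transformers.convert_graph_to_onnx import convert

# Exports the feature-extraction graph of a PyTorch checkpoint to ONNX;
# opset 11 is the default the script settled on.
convert(
    framework="pt",
    model="bert-base-cased",
    output=Path("onnx/bert-base-cased.onnx"),
    opset=11,
)
```
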
2d05480174 Fix trainer evaluation (#4363)
* fix loss calculation in evaluation

* fix evaluation on TPU when prediction_loss_only is True
2020-05-14 14:39:44 -04:00
035678efdb Create README.md (#4359)
* Create README.md

* Update model_cards/savasy/bert-base-turkish-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-14 14:07:32 -04:00
b9c9e05381 Create README.md (#4357) 2020-05-14 14:06:10 -04:00
9535bf1977 Tokenizer.batch_decode convenience method (#4159) 2020-05-14 13:50:47 -04:00
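
A short illustration of the convenience method (gpt2 is just an example tokenizer):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
batch_ids = [tokenizer.encode(t) for t in ["hello world", "goodbye"]]

# decode, applied to a whole batch of id sequences in one call
print(tokenizer.batch_decode(batch_ids))
```
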
7822cd38a0 [tests] make pipelines tests faster with smaller models (#4238)
covers torch and tf. Also fixes a failing @slow test
2020-05-14 13:36:02 -04:00
448c467256 Fix: unpin flake8 and fix cs errors (#4367)
* Fix: unpin flake8 and fix cs errors

* Ok we still need to quote those
2020-05-14 13:14:26 -04:00
c547f15a17 Use Filelock to ensure distributed barriers
see context in https://github.com/huggingface/transformers/pull/4223
2020-05-14 11:58:32 -04:00
015f7812ed [ci skip] Pin isort 2020-05-14 10:12:18 -04:00
ef46ccb05c TPU needs a rendezvous (#4339) 2020-05-14 08:59:52 -04:00
94cb73c2d2 Add image and metadata (#4345)
Unfortunately I accidentally orphaned my other PR
2020-05-13 20:05:15 -04:00
a0eebdc404 Add link to W&B to see whole training logs (#4348) 2020-05-13 20:04:57 -04:00
7cb203fae4 Release: v2.9.1 2020-05-13 17:38:50 -04:00
9a687ebb77 [Marian Fixes] prevent predicting pad_token_id before softmax, support language codes, name multilingual models (#4290) 2020-05-13 17:29:41 -04:00
839bfaedb2 [Docs, Notebook] Include generation pipeline (#4295)
* add first text for generation

* add generation pipeline to usage

* Created using Colaboratory

* correct docstring

* finish
2020-05-13 14:24:08 -04:00
2d184cb553 wrong variable name used (#4328) 2020-05-13 10:22:03 -04:00
ca13618681 Question Answering for TF trainer (#4320)
* Add QA trainer example for TF

* Make data_dir optional

* Fix parameter logic

* Fix feature convert

* Update the READMEs to add the question-answering task

* Apply style

* Change 'sequence-classification' to 'text-classification' and prefix with 'eval' all the metric names

* Apply style

* Apply style
2020-05-13 09:22:31 -04:00
1e51bb717c Fix for #3865. PretrainedTokenizer mapped " do not" into " don't" when .decode(...) is called. Removed the " do not" --> " don't" mapping from clean_up_tokenization(...). (#4024) 2020-05-13 14:32:57 +02:00
241759101e (v2) Improvements to the wandb integration (#4324)
* Improvements to the wandb integration

* small reorg + no global necessary

* feat(trainer): log epoch and final metrics

* Simplify logging a bit

* Fixup

* Fix crash when just running eval

Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Boris Dayma <boris.dayma@gmail.com>
2020-05-12 21:52:01 -04:00
7d7fe4997f Allow BatchEncoding to be initialized empty. (#4316)
* Allow BatchEncoding to be initialized empty.

This is required by recent changes introduced in TF 2.2.

* Attempt to unpin Tensorflow to 2.2 with the previous commit.
2020-05-12 15:02:46 -04:00
0a97f6312a Update README.md (#4313) 2020-05-12 15:01:45 -04:00
15a121fec5 Update README.md (#4315) 2020-05-12 15:01:34 -04:00
15d45211f7 [model_cards]: 🇹🇷 Add new ELECTRA small and base models for Turkish (#4318) 2020-05-12 15:01:17 -04:00
8a017cbb5a Add modelcard with acknowledgements (#4321) 2020-05-12 15:00:56 -04:00
4bf5042240 Fix BART tests on GPU (#4298) 2020-05-12 09:11:50 -04:00
e4512aab3b Add MultipleChoice to TFTrainer [WIP] (#4270)
* catch gpu len 1 set to gpu0

* Add mpc to trainer

* Add MPC for TF

* fix TF automodel for MPC and add Albert

* Apply style

* Fix import

* Note to self: double check

* Make shape None, None for datasetgenerator output shapes

* Add from_pt bool which doesn't seem to work

* Original checkpoint dir

* Fix docstrings for automodel

* Update readme and apply style

* Colab should probably not be from users

* Colabs should probably not be from users

* Add colab

* Update README.md

* Update README.md

* Clean up __init__

* Cleanup flake8 trailing comma

* Update src/transformers/training_args_tf.py

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Viktor Alm <viktoralm@pop-os.localdomain>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-12 08:48:48 -04:00
65be574aec fixed missing torch module import (#4305)
fixed missing torch module import in example usage code
2020-05-12 08:34:17 -04:00
31e67dd19f Remove hard-coded pad token id in distilbert and albert (#3965) 2020-05-12 08:32:44 -04:00
30e343862f pin TF to 2.1 (#4297)
* pin TF to 2.1

* Pin flake8 as well
2020-05-11 21:03:30 -04:00
56e8ef632f [ci] Restrict GPU tests to actual code commits 2020-05-11 20:40:41 -04:00
ba6f6e44a8 [ci] Re-enable torch GPU tests 2020-05-12 00:05:36 +00:00
9524956819 Documentation specification (#4294) 2020-05-11 16:43:57 -04:00
61d22f9cc7 Simplify cache vars and allow for TRANSFORMERS_CACHE env (#4226)
* simplify cache vars and allow for TRANSFORMERS_CACHE env

As it currently stands, "TRANSFORMERS_CACHE" is not an accepted variable. It seems that these variables were not updated when moving from pytorch_transformers to transformers. In addition, the fallback procedure could be improved and simplified. Pathlib seems redundant here.

* Update file_utils.py
2020-05-11 15:24:02 -04:00
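
After this change the cache directory can be redirected via the environment; a sketch (the path is arbitrary, and the variable must be set before transformers is imported):

```python
import os

# Falls back to PYTORCH_TRANSFORMERS_CACHE, then PYTORCH_PRETRAINED_BERT_CACHE,
# then the built-in default when unset.
os.environ["TRANSFORMERS_CACHE"] = "/mnt/models/hf-cache"

from transformers import AutoModel  # imported after the env var is set

model = AutoModel.from_pretrained("bert-base-uncased")
```
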
cd40cb8879 Fix special token doc (#4292) 2020-05-11 15:05:36 -04:00
82601f4c1a Allow gpt2 to be exported to valid ONNX (#4244)
* allow gpt2 to be exported to valid ONNX model

* cast size from int to float explictly
2020-05-11 14:55:55 -04:00
39994051e4 Add migrating from pytorch-transformers (#4273)
"Migrating from pytorch-transformers to transformers" is missing in the main document. It is available in the main `readme` thought. Just move it to the document.
2020-05-11 13:35:13 -04:00
051dcb2a07 CamemBERT does not make use of Token Type IDs (#4289) 2020-05-11 13:31:03 -04:00
41e8291217 Add ALBERT to the Tensorflow to Pytorch model conversion cli (#3933)
* Add ALBERT to convert command of transformers-cli

* Document ALBERT tf to pytorch model conversion
2020-05-11 13:10:00 -04:00
3f42eb979f Documentation: fix links to NER examples (#4279)
* docs: fix link to token classification (NER) example

* examples: fix links to NER scripts
2020-05-11 12:48:21 -04:00
8fdb7997c6 Align sentiment-analysis' tokenizer (currently uncased) to the model (uncased). (#4264) 2020-05-11 12:45:53 -04:00
4658896ee1 [Marian] Fix typo in docstring (#4284) 2020-05-11 11:47:51 -04:00
bf64b8cf09 Model card for bert-turkish-question-answering question-answering model (#4281)
* Create README.md

* Update model_cards/lserinol/bert-turkish-question-answering/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-11 11:32:25 -04:00
94b57bf796 [TF 2.2 compat] use tf.VariableAggregation.ONLY_FIRST_REPLICA (#4283)
* Fix the issue to properly run the accumulator with TF 2.2

* Apply style

* Fix training_args_tf for TF 2.2

* Fix the TF training args when only one GPU is available

* Remove the fixed version of TF in setup.py
2020-05-11 11:28:37 -04:00
cffbb3d8ed Update README.md (#4276) 2020-05-11 11:24:41 -04:00
5f50d619dd Fix XTREME link + add number of eval documents + fix usage code (#4280) 2020-05-11 11:24:10 -04:00
7751be7cee fix reformer apex scaling issue (#4242) 2020-05-11 16:53:42 +02:00
ac7d5f67a2 [Reformer] Add Enwiki8 Reformer Model - Adapt convert script (#4282)
* adapt convert script

* update convert script

* finish

* fix marian pretrained docs
2020-05-11 16:38:07 +02:00
336116d960 Reformer enwik8 - Model card (#4286) 2020-05-11 16:22:08 +02:00
b290c32e16 [docs] fix typo (#4249) 2020-05-10 14:07:08 -04:00
3487be75ef [Marian] documentation and AutoModel support (#4152)
- MarianSentencepieceTokenizer - > MarianTokenizer
- Start using unk token.
- add docs page
- add better generation params to MarianConfig
- more conversion utilities
2020-05-10 13:54:57 -04:00
9d2f467bfb [README] Corrected some grammatical mistakes (#4199) 2020-05-10 09:02:36 -04:00
7b75aa9fa5 [TPU] Doc, fix xla_spawn.py, only preprocess dataset once (#4223)
* [TPU] Doc, fix xla_spawn.py, only preprocess dataset once

* Update examples/README.md

* [xla_spawn] Add `_mp_fn` to other Trainer scripts

* [TPU] Fix: eval dataloader was None
2020-05-08 14:10:05 -04:00
274d850d34 Fix #4098 2020-05-08 12:39:46 -04:00
26dad0a9fa example updated to use generation pipeline (#4230)
* example updated to use generation pipeline

* Update model_cards/LorenzoDeMattei/GePpeTto/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-08 09:45:10 -04:00
9ebb5b2a54 Model card for allegro/herbert-klej-cased-tokenizer-v1 (#4184) 2020-05-08 09:42:43 -04:00
9e54efd004 Model card for allegro/herbert-klej-cased-v1 (#4183) 2020-05-08 09:42:28 -04:00
a8b798e6c4 Model card for spanish electra small (#4196) 2020-05-08 09:30:15 -04:00
242005d762 Create README.md (#4132)
* Create README.md

* Adding code fence around code block
2020-05-08 09:27:29 -04:00
5940c73bbb Create README.md (#4179)
model card for my De Novo Drug discovery model using MLM
2020-05-08 09:25:36 -04:00
cf08830c28 [Pipeline, Generation] tf generation pipeline bug (#4217)
* fix PR

* move tests to correct place
2020-05-08 08:30:05 -04:00
8bf7312654 Add AlbertForPreTraining and TFAlbertForPreTraining models. (#4057)
* Add AlbertForPreTraining and TFAlbertForPreTraining models.

* PyTorch conversion

* TensorFlow conversion

* style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-05-07 19:44:51 -04:00
c99fe0386b [doc] Fix broken links + remove crazy big notebook 2020-05-07 18:44:18 -04:00
66113bd626 Create README.md (#4202) 2020-05-07 18:31:22 -04:00
6669915b65 [examples] Add column for pytorch-lightning support 2020-05-07 15:26:58 -04:00
612fa1b10b Examples readme.md (#4215)
* README

* Update README.md
2020-05-07 15:00:06 -04:00
2e57824374 Pin isort and tf <= 2.1.0 2020-05-07 14:42:00 -04:00
e7cfc1a313 Release: v2.9.0 2020-05-07 14:15:20 -04:00
0ae96ff8a7 BIG Reorganize examples (#4213)
* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around
2020-05-07 13:48:44 -04:00
cafa6a9e29 [Trainer] Ability to specify optimizer/scheduler at init
cc @patrickvonplaten @thomwolf
2020-05-07 11:25:26 -04:00
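
A sketch of the new hook; `model` and `train_dataset` are placeholders for whatever a normal Trainer run would define:

```python
from transformers import AdamW, Trainer, TrainingArguments, get_linear_schedule_with_warmup

# `model` and `train_dataset` come from the usual Trainer setup (placeholders here).
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=1000
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./out"),
    train_dataset=train_dataset,
    # Omit `optimizers` to keep the default AdamW + linear warmup schedule.
    optimizers=(optimizer, scheduler),
)
```
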
e4fd5e3999 Use with_extension to change the extension (#4203)
As per https://github.com/huggingface/transformers/pull/3934#discussion_r421307659
2020-05-07 11:14:56 -04:00
ebf80e2e70 Tpu trainer (#4146)
* wip

* wip

* a last wip

* Better logging when using TPUs

* Correct argument name

* Tests

* fix

* Metrics in evaluation

* Update src/transformers/training_args.py

* [tpu] Use launcher script instead

* [tpu] lots of tweaks

* Fix formatting

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-07 10:34:04 -04:00
026097b9ee Ensure fast tokenizer can construct tensor without pad token if only one sample is provided. (#4201) 2020-05-07 10:02:53 -04:00
0a6cbea0a5 Rewritten batch support in pipelines. (#4154)
* Rewritten batch support in pipelines.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix imports sorting 🔧

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Set pad_to_max_length=True by default on Pipeline.

* Set pad_to_max_length=False for generation pipelines.

Most generation models don't have a padding token.

* Address @joeddav review comment: Uniformized *args.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address @joeddav review comment: Uniformized *args (second).

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-05-07 09:52:40 -04:00
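
What the rewritten batching looks like from the caller's side (a sketch):

```python
from transformers import pipeline

nlp = pipeline("sentiment-analysis")

# A list input is tokenized and padded as one batch
# (pad_to_max_length=True by default, except for generation pipelines).
print(nlp(["I love this library!", "This is terrible."]))
```
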
99d1a69444 fix examples (#4192) 2020-05-07 10:54:48 +02:00
74ffc9ea6b [Reformer] Fix example and error message (#4191)
* fix example reformer

* fix error message and example docstring

* improved error message
2020-05-07 10:50:11 +02:00
96c78396ce fix docstring reformer (#4190) 2020-05-07 10:28:31 +02:00
dca34695d0 Reformer (#3351)
* first copy & past commit from Bert and morgans LSH code

* add easy way to compare to trax original code

* translate most of function

* make trax lsh self attention deterministic with numpy seed + copy paste code

* add same config

* add same config

* make layer init work

* implemented hash_vectors function for lsh attention

* continue reformer translation

* hf LSHSelfAttentionLayer gives same output as trax layer

* refactor code

* refactor code

* refactor code

* refactor

* refactor + add reformer config

* delete bogus file

* split reformer attention layer into two layers

* save intermediate step

* save intermediate step

* make test work

* add complete reformer block layer

* finish reformer layer

* implement causal and self mask

* clean reformer test and refactor code

* fix merge conflicts

* fix merge conflicts

* update init

* fix device for GPU

* fix chunk length init for tests

* include morgans optimization

* improve memory a bit

* improve comment

* factorize num_buckets

* better testing parameters

* make whole model work

* make lm model work

* add t5 copy paste tokenizer

* add chunking feed forward

* clean config

* add improved assert statements

* make tokenizer work

* improve test

* correct typo

* extend config

* add complexer test

* add new axial position embeddings

* add local block attention layer

* clean tests

* refactor

* better testing

* save intermediate progress

* clean test file

* make shorter input length work for model

* allow variable input length

* refactor

* make forward pass for pretrained model work

* add generation possibility

* finish dropout and init

* make style

* refactor

* add first version of RevNet Layers

* make forward pass work and add convert file

* make uploaded model forward pass work

* make uploaded model forward pass work

* refactor code

* add namedtuples and cache buckets

* correct head masks

* refactor

* made reformer more flexible

* make style

* remove set max length

* add attention masks

* fix up tests

* fix lsh attention mask

* make random seed optional for the moment

* improve memory in reformer

* add tests

* make style

* make sure masks work correctly

* detach gradients

* save intermediate

* correct backprop through gather

* make style

* change back num hashes

* rename to labels

* fix rotation shape

* fix detach

* update

* fix trainer

* fix backward dropout

* make reformer more flexible

* fix conflict

* fix

* fix

* add tests for fixed seed in reformer layer

* fix trainer typo

* fix typo in activations

* add fp16 tests

* add fp16 training

* support fp16

* correct gradient bug in reformer

* add fast gelu

* re-add dropout for embedding dropout

* better naming

* better naming

* renaming

* finalize test branch

* finalize tests

* add more tests

* finish tests

* fix

* fix type trainer

* fix fp16 tests

* fix tests

* fix tests

* fix tests

* fix issue with dropout

* fix dropout seeds

* correct random seed on gpu

* finalize random seed for dropout

* finalize random seed for dropout

* remove duplicate line

* correct half precision bug

* make style

* refactor

* refactor

* docstring

* remove sinusoidal position encodings for reformer

* move chunking to modeling_utils

* make style

* clean config

* make style

* fix tests

* fix auto tests

* pretrained models

* fix docstring

* update conversion file

* Update pretrained_models.rst

* fix rst

* fix rst

* update copyright

* fix test path

* fix test path

* fix small issue in test

* include reformer in generation tests

* add docs for axial position encoding

* finish docs

* Update convert_reformer_trax_checkpoint_to_pytorch.py

* remove isort

* include Sam's comments

* remove wrong comment in utils

* correct typos

* fix typo

* Update reformer.rst

* applied morgans optimization

* make style

* make gpu compatible

* remove bogus file

* big test refactor

* add example for chunking

* fix typo

* add to README
2020-05-07 10:17:01 +02:00
877fc56410 change order pytorch/tf in readme (#4167) 2020-05-06 16:31:07 -04:00
aad50151f3 TF version of the trainer (#4017)
* First commit to add a TF version of the trainer.

* Make the TF trainer closer to what the PT trainer looks like

* Refactoring common code between the PT and TF trainer into an util file.

* Some bugfix + better similarity with the PT trainer

* Add missing class in transformers init

* Bugfix over prediction + use classification report instead of simple metrics

* Fix name error

* Fix optimization tests + style

* Apply style

* Several bugfix for multi-gpu training

* Apply style

* Apply style

* Add glue example for the TF trainer

* Several bugfixes + address the reviews

* Fix on the TF training args file

* Add a debug mode

* Bugfix in utils_ner.py when segment_ids is None

* Apply style

* Apply style

* Add TPU strategy

* Fix selection strategy
2020-05-06 12:56:52 -04:00
25296b12aa Fix overwrite_cache behaviour for pytorch lightning examples (#4093) 2020-05-06 12:24:49 -04:00
9972562d33 Include ElectraPreTrainedModel into __init__ (#4173) 2020-05-06 12:00:23 -04:00
ff8ed52dd8 Camembert-large-fquad model card (#4143)
Description for the model card describing the camembert-large-fquad model.
2020-05-06 10:41:07 -04:00
4c3be2e718 Add model card for the NER model (#4162) 2020-05-06 10:40:55 -04:00
17ae0363db Fix markdown to show the results table properly (#4119) 2020-05-06 10:38:29 -04:00
a638e986f4 fix hard wired pad token id (#4138) 2020-05-06 00:42:34 +02:00
fd2174664c [Trainer] W&B: Enable model watch
See https://github.com/huggingface/transformers/pull/3916
2020-05-05 10:59:23 -04:00
79b1c6966b Pytorch 1.5.0 (#3973)
* Standard deviation can no longer be set to 0

* Remove torch pinned version

* 9th instead of 10th, silly me
2020-05-05 10:23:01 -04:00
818463ee8e Trainer: add logging through Weights & Biases (#3916)
* feat: add logging through Weights & Biases

* feat(wandb): make logging compatible with all scripts

* style(trainer.py): fix formatting

* [Trainer] Tweak wandb integration

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-04 22:42:27 -04:00
858b1d1e5a allow an already created tensorboard SummaryWriter be passed to Trainer 2020-05-04 19:58:24 -04:00
8e67573a64 [EncoderDecoder Tests] Improve tests (#4046)
* Hoist bert model tester for Patrick

* indent

* make tests work

* Update tests/test_modeling_bert.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: sshleifer <sshleifer@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-04 02:18:36 +02:00
6af3306a1d Add decoder specific error message for T5Stack.forward (#4128) 2020-05-03 12:40:08 +02:00
1cdd2ad2af Fix #2941 (#4109)
* Fix of issue #2941

Reshaped score array to avoid `numpy` ValueError.

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-02 11:20:30 -04:00
5f4f6b65b3 distilroberta-base-finetuned-sentiment (#4115)
* Create model card

Create Model card for distilroberta-base-finetuned-sentiment

* Update model_cards/mrm8488/distilroberta-base-finetuned-sentiment/README.md

* Update model_cards/mrm8488/distilroberta-base-finetuned-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-02 11:19:31 -04:00
7da051f135 model card for surajp/albert-base-sanskrit (#4114)
* Create README.md

* Update model_cards/surajp/albert-base-sanskrit/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-02 11:15:39 -04:00
14911e2e12 Create README.md (#4112) 2020-05-02 10:52:12 -04:00
9e97c87539 Added huseinzol05/gpt2-345M-bahasa-cased (#4102) 2020-05-02 10:51:15 -04:00
4c5bd92183 Update run_pl_glue.py (#4117) 2020-05-02 10:38:30 -04:00
5282b31df4 Update run_pl_ner.py (#4118) 2020-05-02 10:38:21 -04:00
1e616c0af3 NER: parse args from .args file or JSON (#4110)
* ner: parse args from .args file or JSON

* examples: mention json-based configuration file support for run_ner script
2020-05-02 10:29:17 -04:00
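
A sketch of the JSON route ("run_ner.json" is a hypothetical file name):

```python
from dataclasses import dataclass, field

from transformers import HfArgumentParser

@dataclass
class ModelArguments:
    model_name_or_path: str = field(default="bert-base-cased")

parser = HfArgumentParser(ModelArguments)

# Arguments can come from a JSON file instead of CLI flags; an .args file
# passed as the sole command-line argument works similarly.
(model_args,) = parser.parse_json_file(json_file="run_ner.json")
```
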
abb1fa3f37 Update README.md 2020-05-02 10:32:00 +02:00
0ccbfd2868 Update Reformer ReadME 2020-05-02 10:31:00 +02:00
2d8340a91f [Reformer] Move model card to google model (#4113)
* correct model card

* remove model card from patrick von platen
2020-05-02 10:25:22 +02:00
d713cfc5eb GePpeTto 🇮🇹: Fixpath to model card 2020-05-01 11:48:58 -04:00
f3d44301cc GePpeTto model 🇮🇹 (#4099)
* Create GePpeTto.md

* Update model_cards/LorenzoDeMattei/GePpeTto.md

* Update model_cards/LorenzoDeMattei/GePpeTto.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-01 11:46:42 -04:00
27d55125e6 Configs: saner num_labels in configs. (#3967) 2020-05-01 11:28:55 -04:00
e80be7f1d0 docs: add xlm-roberta section to multi-lingual section (#4101) 2020-05-01 11:06:58 -04:00
18db92dd9a [testing] add timeout_decorator (#3543) 2020-05-01 09:05:47 -04:00
b8686174be Merge pull request #3934 from huggingface/examples_args_from_files
[qol] example scripts: parse args from .args file or JSON
2020-04-30 22:40:13 -04:00
f39217a5ec [tests] Light cleanup of tempfile in tests/ 2020-04-30 22:30:15 -04:00
f54dc3f4d5 [ci] Load pretrained models into the default (long-lived) cache
There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models

When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.

I'd rather always use the default cache
2020-04-30 22:30:15 -04:00
6b410bedfc Model Card: gaochangkuan README.md (#4033)
* Create README.md

* Update README.md

* tweak

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-30 22:26:58 -04:00
8829ace4aa added gpt2 117m bahasa readme
(cherry picked from commit a4a673a1d0bec0bf4085eef021acb788ca1f5eb5)
2020-04-30 22:20:00 -04:00
1851a64b6f create model_card camembert-base-wikipedia-4gb 2020-04-30 22:16:12 -04:00
443e5e34af Create README.md 2020-04-30 22:16:00 -04:00
60e1556a44 Create model_card camembert-base-ccnet-4gb 2020-04-30 22:15:47 -04:00
fa9365eca5 Create README.md 2020-04-30 22:15:38 -04:00
afe002b04c Create README.md 2020-04-30 22:15:23 -04:00
8b5e5ebcf9 Continue training args and tqdm in notebooks (#3939)
* Continue training args

* Continue training args

* added explanation

* added explanation

* added explanation

* Fixed tqdm auto

* Update src/transformers/training_args.py

Co-Authored-By: Julien Chaumond <chaumond@gmail.com>

* Update src/transformers/training_args.py

* Update src/transformers/training_args.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-30 22:14:08 -04:00
ab90353f1a [cli] {login, upload, s3} display more helpful error messages 2020-04-30 12:51:06 -04:00
452dd0e4d9 [ci] Align test_hf_api.py with API change 2020-04-30 12:06:01 -04:00
7f9193ef09 Fixed Style Inconsistency (#3976) 2020-04-30 14:33:09 +02:00
64070cbb88 Fix TF input docstrings to refer to tf.Tensor rather than torch.FloatTensor. (#4051) 2020-04-30 14:28:56 +02:00
e73595bd64 Remove jitted method so that our models are pickable. (#4050) 2020-04-29 09:53:19 -04:00
2c77842887 [Fix common tests on GPU] send model, ids to torch_device (#4014) 2020-04-29 09:47:20 -04:00
6faca88ee0 Align MarianMT with #4030
cc @sshleifer
2020-04-28 20:35:20 -04:00
211e130811 [github] Issue templates: populate some labels
cc @bramvanroy @stefan-it
2020-04-28 20:34:34 -04:00
455c639093 CDN urls (#4030)
* [file_utils] use_cdn + documentation

* Move to cdn. urls for weights

* [urls] Hotfix for bert-base-japanese
2020-04-28 20:27:14 -04:00
8ba4c5885f Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair (#3994)
* Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair and

* The style and quality are now top-notch
2020-04-29 01:13:59 +02:00
847e7f3379 MarianMTModel.from_pretrained('Helsinki-NLP/opus-marian-en-de') (#3908)
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
2020-04-28 18:22:37 -04:00
d714dfeaa8 [isort] add known 3rd party to setup.cfg (#4053)
* add known 3rd party to setup.cfg

* comment

* Update CONTRIBUTING.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-28 17:12:00 -04:00
d52b0e294a Minor Readme Fixes (#4056)
Added contact info and fixed typos.
2020-04-28 16:42:15 -04:00
55adefe428 Add license information to model cards (#3864)
Close #3357
2020-04-28 16:40:21 -04:00
0ac6d0bf33 Create README.md
I created a Japanese binary classification model.
2020-04-28 15:35:30 -04:00
c73c83b0e6 Small cosmetic changes to CamemBERT model card 2020-04-28 15:32:55 -04:00
4a94c062a4 Provide model card for roberta-base-squad2-covid 2020-04-28 15:29:30 -04:00
c7d06b79ae Fix #3954 - GPT2 is not traceable (#3955)
* Update sqrt computation so it can survive a torch.jit.trace

* Update modeling_gpt2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-04-28 21:18:56 +02:00
9a0a8c1c6f add examples to doc (#4045) 2020-04-28 16:33:23 +02:00
fa49b9afea Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (#3383)
* change encoder decoder style to bart & t5 style

* make encoder decoder generation dummy work for bert

* make style

* clean init config in encoder decoder

* add tests for encoder decoder models

* refactor and add last tests

* refactor and add last tests

* fix attn masks for bert encoder decoder

* make style

* refactor prepare inputs for Bert

* refactor

* finish encoder decoder

* correct typo

* add docstring to config

* finish

* add tests

* better naming

* make style

* fix flake8

* clean docstring

* make style

* rename
2020-04-28 15:11:09 +02:00
180585741c [Generation] Generation should allow to start with empty prompt (#3993)
* fix empty prompt

* fix length in generation pipeline
2020-04-28 14:33:15 +02:00
52679fbc2e add dialogpt training tips (#3996) 2020-04-28 14:32:31 +02:00
b5c6d3d4c7 notebooks: minor fix for community provided models example (#4025) 2020-04-28 09:12:25 +02:00
2fade302ac camembert-base-fquad
Model card for illuin release of camembert-base-fquad
2020-04-27 18:29:55 -04:00
20c3b8cab4 Create model card 2020-04-27 18:27:46 -04:00
b3f272ffcb Create model card 2020-04-27 18:27:04 -04:00
518f291eef add model card for Hindi-BERT 2020-04-27 18:25:16 -04:00
d7b3bf547c Model cards for KoELECTRA 2020-04-27 18:21:01 -04:00
db9d56c08a Add modelcard for Hate-speech-CNERG/dehatebert-mono-arabic model (#3979)
* Add dehatebert-mono-arabic readme card

* Update dehatebert-mono-arabic model card
2020-04-27 18:18:54 -04:00
41750a6cff Fix typos 2020-04-27 13:25:53 -04:00
12bb7fe770 Fix t5 doc typos (#3978)
* Fix typo in into and add line under

* Add missing blank line under

* Correct types under
2020-04-27 18:27:15 +02:00
97a375484c rm boto3 dependency 2020-04-27 11:17:14 -04:00
4e817ff418 Create README.md (#3966) 2020-04-25 09:16:40 -04:00
73d6a2f901 [model_cards] xlnet_chinese_large & roberta_chinese_large 2020-04-24 16:12:42 -04:00
623ba0236d Create README.md (#3882) 2020-04-24 15:57:01 -04:00
f4078e0db6 Feat/add model card (#3923)
* add model card for gpt2-imdb-ctrl

* fix title

* add sentiment control description
2020-04-24 10:24:28 -04:00
03322b4261 Create README.md (#3917) 2020-04-24 10:24:00 -04:00
c811526004 [examples] For convenience, also save the tokenizer
Close #3921
2020-04-24 09:52:42 -04:00
b0167632ce Shuffle train subset for summarization example (#3909)
* Shuffle train subset

* Cleaner shuffle
2020-04-24 07:55:34 -04:00
c53cc018de [Trainer] Fix _rotate_checkpoints
Close #3920
2020-04-23 23:59:43 +00:00
cbbb3c43c5 [hubconf] Modify pythonpath to get canonical imports to work
See https://github.com/huggingface/transformers/pull/3881/files#r412292660

Should we remove SRC_DIR from sys.path right after the imports, @aaugustin?
2020-04-23 16:27:43 -04:00
77b75d2c78 Fix for #3873 to change type of exponent parameter for torch.pow() call from int to float (#3924) 2020-04-23 14:25:31 -04:00
6ba254ee54 quick fix wording readme for community models (#3900) 2020-04-23 14:19:45 -04:00
a79a9e1241 Fix TFAlbertForSequenceClassification classifier dropout probability. It was set to config.hidden_dropout_prob, but should be config.classifier_dropout_prob. (#3928) 2020-04-23 13:18:16 -04:00
8e093e5981 Remove 50k limits bug 2020-04-23 11:15:09 -04:00
6af5a54c28 [Trainer] reuse constant 2020-04-23 11:02:05 -04:00
7c2a32ff88 [housekeeping] super() 2020-04-23 10:43:22 -04:00
a946b6b51b [housekeeping] Upgrade # type Python 2 syntax
cc @sshleifer
2020-04-23 10:39:24 -04:00
cb3c2212c7 Create model card (#3890)
Model: TinyBERT-spanish-uncased-finetuned-ner
2020-04-22 14:56:43 -04:00
d698b87f20 Update comparison table (#3889) 2020-04-22 14:54:17 -04:00
13dd2acca4 Bump tokenizers version to final 0.7.0 (#3898) 2020-04-22 11:02:29 -04:00
f16540fcba Pipeline for Text Generation: GenerationPipeline (#3758)
* Add GenerationPipeline

* Fix parameter names

* Correct parameter __call__ parameters

* Add model type attribute and correct function calls for prepare_input

* Take out trailing commas from init attributes

* Remove unnecessary tokenization line

* Implement support for multiple text inputs

* Apply generation support for multiple input text prompts

* Take out tensor coercion

* Take out batch index

* Add text prompt to return sequence

* Squeeze token tensor before decoding

* Return only a single list of sequences if only one prompt was used

* Correct results variable name

* Add GenerationPipeline to SUPPORTED_TASKS with the alias , initialized with GPT2

* Registered AutoModelWithLMHead for both pt and tf

* Update docstring for GenerationPipeline

* Add kwargs parameter to mode.generate

* Take out kwargs parameter after all

* Add generation pipeline example in pipeline docstring

* Fix max length by squeezing tokens tensor

* Apply ensure_tensor_on_device to pytorch tensor

* Include generation step in torch.no_grad

* Take out input from prepare_xlm_input and set 'en' as default xlm_language

* Apply framework specific encoding during prepare_input

* Format w make style

* Move GenerationPipeline import to follow proper import sorting

* Take out training comma from generation dict

* Apply requested changes

* Change name to TextGenerationPipeline

* Apply TextGenerationPipeline rename to __init___

* Changing alias to

* Set input mapping as input to ensure_tensor_on_device

* Fix assertion placement

* Add test_text_generation

* Add TextGenerationPipeline to PipelineCommonTests

* Take out whitespace

* Format __init__ w black

* Fix __init__ style

* Format __init__

* Add line to end of __init__

* Correct model tokenizer set for test_text_generation

* Ensure we return a list of lists, not a list of strings (to pass the test)

* Limit test models to only 3 to limit runtime to address circleCI timeout error

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict

* Fix blank result list

* Add TextGenerationPipeline to pipelines.rst

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix typos from adding PADDING_TEXT_TOKEN_LENGTH

* Fix incorrectly moved result list

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Add back generation line and make style

* Take out blank whitespace

* Apply new alis, text-generation, to test_pipelines

* Fix text generation alias in test

* Update src/transformers/pipelines.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-22 09:37:03 -04:00
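
The end-user view of the new pipeline, sketched:

```python
from transformers import pipeline

generator = pipeline("text-generation")  # defaults to GPT-2

# Returns a list of dicts with a "generated_text" key; generate() kwargs
# such as max_length are passed straight through.
print(generator("Once upon a time,", max_length=30))
```
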
1dc9b3c784 Fixes #3877 2020-04-22 01:15:10 +00:00
dd9d483d03 Trainer (#3800)
* doc

* [tests] Add sample files for a regression task

* [HUGE] Trainer

* Feedback from @sshleifer

* Feedback from @thomwolf + logging tweak

* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes

* [glue] Use default max_seq_length of 128 like before

* [glue] move DataTrainingArguments around

* [ner] Change interface of InputExample, and align run_{tf,pl}

* Re-align the pl scripts a little bit

* ner

* [ner] Add integration test

* Fix language_modeling with API tweak

* [ci] Tweak loss target

* Don't break console output

* amp.initialize: model must be on right device before

* [multiple-choice] update for Trainer

* Re-align to 827d6d6ef071029cfe82838a18dab046b5813976
2020-04-21 20:11:56 -04:00
eb5601b0a5 [ci] Pin torch version while we update 2020-04-21 15:46:18 -04:00
53f5ef6df5 create readme for spentaur/yelp model (#3874)
* create readme for spentaur/yelp model

* update spentaur/yelp/README.md

* remove typo
2020-04-21 15:31:36 -04:00
d32585a304 Fix Torch.hub + Integration test 2020-04-21 14:13:30 -04:00
7d40901ce3 Fix Documentation issue in BertForMaskedLM forward (#3855) 2020-04-21 09:08:20 +02:00
b1ff0b2ae7 Fix bug in examples: double wrap into DataParallel during eval 2020-04-20 19:37:44 -04:00
7f23af1684 added electra model
(cherry picked from commit b5f2dc5d627d44b8cbb0ccf8ad2b46bea211a236)
2020-04-20 17:17:58 -04:00
03121deba3 New model added
The first model added to the repo
2020-04-20 17:10:01 -04:00
15b9868f8b Create model card 2020-04-20 17:07:34 -04:00
2c05b8a56c Remove tqdm logging when using pipelines. (#3833)
Introduce a tqdm_enabled parameter on squad_convert_examples_to_features(), defaulting to True and set to False in QA pipelines.
2020-04-20 22:58:52 +02:00
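For illustration, a minimal sketch of the new flag, assuming the signature of squad_convert_examples_to_features as of this release (the data path is a placeholder):

```python
from transformers import AutoTokenizer, SquadV2Processor, squad_convert_examples_to_features

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
examples = SquadV2Processor().get_dev_examples("path/to/squad")  # placeholder path

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=384,
    doc_stride=128,
    max_query_length=64,
    is_training=False,
    tqdm_enabled=False,  # the new flag: silence per-example progress bars, as QA pipelines now do
)
```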
c79b550dd0 Add qas_id to SquadResult and SquadExample (#3745)
* Add qas_id

* Fix incorrect name in squad.py

* Make output files optional for squad eval
2020-04-20 16:08:57 -04:00
c4158a6314 [Pipelines] Encode to max length of input not max length of tokenizer for batch input (#3857)
* remove max_length = tokenizer.max_length when encoding

* make style
2020-04-20 14:39:16 -04:00
857ccdb259 exbert links for my albert model cards (#3729)
* exbert links for my albert model cards

* Added exbert tag to the metadata block

* Adding "how to cite"
2020-04-20 10:54:39 -04:00
a504cb49ec [examples] fix summarization do_predict (#3866) 2020-04-20 10:49:56 -04:00
52c85f847a Update README.md 2020-04-20 10:10:56 -04:00
a21d4fa410 add "by" to ReadMe 2020-04-18 18:07:17 +02:00
827d6d6ef0 Cleanup fast tokenizers integration (#3706)
* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length in BatchEncoding

* add alignment methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 and RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately TensorFlow does not like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy

Co-authored-by: Stefan Schweter <stefan@schweter.it>
2020-04-18 13:43:57 +02:00
60a42ef1c0 [model_cards] Fix CamemBERT table markdown
see https://github.com/huggingface/transformers/pull/3836
2020-04-17 20:21:15 -04:00
88aecee6a2 [ci] GitHub-hosted runner has no space left on device 2020-04-17 20:16:00 -04:00
73efa694e6 Update camembert-base-README.md (#3836) 2020-04-17 20:08:13 -04:00
e9d0bc027a [Config, Serialization] more readable config serialization (#3797)
* better config serialization

* finish configuration utils
2020-04-17 20:07:18 -04:00
8b63a01d95 XLM tokenizer should encode with bos token (#3791)
* XLM tokenizer should encode with bos token

* Update tests
2020-04-17 11:28:55 -04:00
1d4a35b396 Higher tolerance for past testing in TF T5 (#3844) 2020-04-17 11:26:16 -04:00
d13eca11e2 Higher tolerance for past testing in T5 (#3843) 2020-04-17 11:25:14 -04:00
b0c9fbb293 Add workflow to build docs (#3763) 2020-04-17 11:23:18 -04:00
c19727fd38 Add support for the null answer in QuestionAnsweringPipeline (#3441)
* Add support for the null answer in `QuestionAnsweringPipeline`

* black

* Fix min null score computation

* Fix a PR comment
2020-04-17 11:17:21 -04:00
edf0582c0b Fix token_type_id in BERT question-answering example (#3790)
token_type_id is converted into the segment embedding. For question answering,
this needs to highlight whether a token belongs to sequence 0 or 1.
encode_plus takes care of correctly setting this parameter automatically.
2020-04-17 11:14:12 -04:00
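A small example of the point above — encode_plus assigns the segment ids itself, with the question as segment 0 and the context as segment 1 (model name is illustrative):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

question = "Who wrote Hamlet?"
context = "Hamlet is a tragedy written by William Shakespeare."

# Builds [CLS] question [SEP] context [SEP]; token_type_ids mark the two segments.
encoded = tokenizer.encode_plus(question, context)
print(encoded["token_type_ids"])  # 0s over the question, 1s over the context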
6d00033e97 Question Answering support for Albert and Roberta in TF (#3812)
* Add TFAlbertForQuestionAnswering

* Add TFRobertaForQuestionAnswering

* Update TFAutoModel with Roberta/Albert for QA

* Clean `super` TF Albert calls
2020-04-17 10:45:30 -04:00
f399c00610 Update README 2020-04-17 09:42:22 +02:00
f0c96fafd1 [examples] summarization/bart/finetune.py supports t5 (#3824)
renames `run_bart_sum.py` to `finetune.py`
2020-04-16 15:15:19 -04:00
0cec4fab7d typo: fine-grained token-leven
Changing from "fine-grained token-leven" to "fine-grained token-level"
2020-04-16 15:11:23 -04:00
14cdeee75a Tanh torch warnings 2020-04-16 15:10:35 -04:00
16469fedbd [PretrainedTokenizer] Factor out tensor conversion method (#3777) 2020-04-16 15:02:43 -04:00
80a1694514 [Examples, T5] Change newstest2013 to newstest2014 and clean up (#3817)
* Refactored use of newstest2013 to newstest2014. Fixed bug where argparse consumed first command line argument as model_size argument rather than using default model_size by forcing explicit --model_size flag inclusion

* More pythonic file handling through 'with' context

* COSMETIC - ran Black and isort

* Fixed reference to number of lines in newstest2014

* Fixed failing test. More pythonic file handling

* finish PR from tholiao

* remove commented-out lines

* make style

* make isort happy

Co-authored-by: Thomas Liao <tholiao@gmail.com>
2020-04-16 20:00:41 +02:00
d486795158 JIT not compatible with PyTorch/XLA (#3743) 2020-04-16 11:19:24 -04:00
b1e2368b32 Typo fix (#3821) 2020-04-16 11:04:32 -04:00
baca8fa8e6 clean pipelines (#3795) 2020-04-16 10:21:34 -04:00
38f7461df3 [TFT5, Cache] Add cache to TFT5 (#3772)
* correct gpt2 test inputs

* make style

* delete modeling_gpt2 change in test file

* translate from pytorch

* correct tests

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* make tensorflow t5 caching work

* make style

* clean reorder cache

* remove unnecessary spaces

* fix test
2020-04-16 16:14:52 +02:00
a5b249472e change pad token id to config pad token id (#3793) 2020-04-16 15:58:57 +02:00
dbd041243d [cleanup] factor out get_head_mask, invert_attn_mask, get_exten… (#3806)
* Delete some copy pasted code
2020-04-16 09:55:25 -04:00
d22894dfd4 [Docs] Add DialoGPT (#3755)
* add dialoGPT

* update README.md

* fix conflict

* update readme

* add code links to docs

* Update README.md

* Update dialo_gpt2.rst

* Update pretrained_models.rst

* Update docs/source/model_doc/dialo_gpt2.rst

Co-Authored-By: Julien Chaumond <chaumond@gmail.com>

* change filename of dialogpt

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-16 09:04:32 +02:00
c59b1e682d [examples] unit test for run_bart_sum (#3544)
- adds pytorch-lightning dependency
2020-04-15 18:35:01 -04:00
301bf8d1b4 Create Modelcard for Reformer Model 2020-04-15 16:26:24 +02:00
01c37dcdb5 [Config, Caching] Remove output_past everywhere and replace by use_cache argument (#3734)
* remove output_past from pt

* make style

* add optional input length for gpt2

* add use cache to prepare input

* save memory in gpt2

* correct gpt2 test inputs

* make past input optional for gpt2

* finish use_cache for all models

* make style

* delete modeling_gpt2 change in test file

* correct docstring

* correct `is True` statements for gpt2
2020-04-14 14:40:28 -04:00
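For illustration, a minimal incremental-decoding sketch with the new use_cache kwarg, assuming the tuple-returning GPT-2 API of this era (prompt and model size are placeholders):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer.encode("Hello, my dog", return_tensors="pt")

with torch.no_grad():
    # use_cache=True makes the model return `past` key/value states ...
    logits, past = model(input_ids, use_cache=True)
    next_token = torch.argmax(logits[:, -1, :], dim=-1, keepdim=True)
    # ... so the next step only needs to encode the newest token.
    logits, past = model(next_token, past=past, use_cache=True)
```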
092cf881a5 [Generation, EncoderDecoder] Apply Encoder Decoder 1.5GB memory… (#3778) 2020-04-13 22:29:28 -04:00
352d5472b0 Shift labels internally within TransfoXLLMHeadModel when called with labels (#3716)
* Shifting labels inside TransfoXLLMHead

* Changed doc to reflect change

* Updated pytorch test

* removed IDE whitespace changes

* black reformat

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-04-13 18:11:23 +02:00
5ebd898953 fix dataset shuffling for Distributed training (#huggingface#3721) (#3766) 2020-04-13 10:11:18 -04:00
7972a4019f updated dutch squad model card (#3736)
* added model_cards for Polish squad models

* corrected mistake in Polish design cards

* updated model_cards for squad2_dutch model

* added links to benchmark models

Co-authored-by: Henryk Borzymowski <henryk.borzymowski@pwc.com>
2020-04-11 06:44:59 -04:00
f8c1071c51 Added README huseinzol05/albert-tiny-bahasa-cased (#3746)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base

* added albert tiny
2020-04-11 06:42:06 -04:00
700ccf6e35 Fix glue_convert_examples_to_features API breakage (#3742) 2020-04-10 16:03:27 -04:00
b7cf9f43d2 Update tokenizers to 0.7.0-rc5 (#3705) 2020-04-10 14:23:49 -04:00
551b450527 Add run_glue_tpu.py that trains models on TPUs (#3702)
* Initial commit to get BERT + run_glue.py on TPU

* Add README section for TPU and address comments.

* Cleanup TPU bits from run_glue.py (#3)

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* Cleanup TPU bits from run_glue.py

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* No need to call `xm.mark_step()` explicitly (#4)

Since for gradient accumulation we're accumulating on batches from
`ParallelLoader` instance which on next() marks the step itself.

* Resolve R/W conflicts from multiprocessing (#5)

* Add XLNet in list of models for `run_glue_tpu.py` (#6)

* Add RoBERTa to list of models in TPU GLUE (#7)

* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)

* Use barriers to reduce duplicate work/resources (#9)

* Shard eval dataset and aggregate eval metrics (#10)

* Shard eval dataset and aggregate eval metrics

Also, instead of calling `eval_loss.item()` every time do summation with
tensors on device.

* Change defaultdict to float

* Reduce the pred, label tensors instead of metrics

As brought up during review some metrics like f1 cannot be aggregated
via averaging. GLUE task metrics depends largely on the dataset, so
instead we sync the prediction and label tensors so that the metrics can
be computed accurately on those instead.

* Only use tb_writer from master (#11)

* Apply huggingface black code formatting

* Style

* Remove `--do_lower_case` as example uses cased

* Add option to specify tensorboard logdir

This is needed for our testing framework which checks regressions
against key metrics written by the summary writer.

* Using configuration for `xla_device`

* Prefix TPU specific comments.

* num_cores clarification and namespace eval metrics

* Cache features file under `args.cache_dir`

Instead of under `args.data_dir`. This is needed as our test infra uses
data_dir with a read-only filesystem.

* Rename `run_glue_tpu` to `run_tpu_glue`

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-04-10 12:53:54 -04:00
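A heavily hedged sketch of the training-loop shape this commit describes; it assumes a TPU runtime with torch_xla installed, and `model`, `optimizer`, `train_loader` stand in for the objects the script actually builds:

```python
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl

device = xm.xla_device()
model.to(device)  # `model`, `optimizer`, `train_loader` assumed to exist

# ParallelLoader marks the XLA step on next(), hence no explicit xm.mark_step().
loader = pl.ParallelLoader(train_loader, [device]).per_device_loader(device)
for batch in loader:
    optimizer.zero_grad()
    loss = model(**batch)[0]  # assumes batches are dicts of tensors
    loss.backward()
    xm.optimizer_step(optimizer)  # also syncs gradients across TPU cores
```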
cbad305ce6 [docs] The use of do_lower_case in scripts is on its way to deprecation (#3738) 2020-04-10 12:34:04 -04:00
b169ac9c2b [examples] Generate argparsers from type hints on dataclasses (#3669)
* [examples] Generate argparsers from type hints on dataclasses

* [HfArgumentParser] way simpler API

* Restore run_language_modeling.py for easier diff

* [HfArgumentParser] final tweaks from code review
2020-04-10 12:21:58 -04:00
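The idea in miniature — type hints on a dataclass drive the generated argparse options (the dataclass below is an illustrative stand-in, not the one shipped in the examples):

```python
from dataclasses import dataclass, field
from transformers import HfArgumentParser

@dataclass
class DataTrainingArguments:
    task_name: str = field(metadata={"help": "GLUE task to run."})
    max_seq_length: int = field(default=128, metadata={"help": "Max input length."})

parser = HfArgumentParser(DataTrainingArguments)
(data_args,) = parser.parse_args_into_dataclasses()
# e.g. `python run.py --task_name mrpc --max_seq_length 256`
print(data_args.task_name, data_args.max_seq_length)
```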
7a7fdf71f8 Multilingual BART - (#3602)
- support mbart-en-ro weights
- add MBartTokenizer
2020-04-10 11:25:39 -04:00
f98d0ef2a2 Big cleanup of glue_convert_examples_to_features (#3688)
* Big cleanup of `glue_convert_examples_to_features`

* Use batch_encode_plus

* Cleaner wrapping of glue_convert_examples_to_features for TF

@lysandrejik

* Cleanup syntax, thanks to @mfuntowicz

* Raise explicit error in case of user error
2020-04-10 10:20:18 -04:00
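For reference, the kind of call the cleanup delegates to — batch_encode_plus tokenizing and padding sentence pairs in one shot (argument names per the 2.x API; sentences are placeholders):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

pairs = [
    ("A man is eating.", "Someone is having a meal."),
    ("A plane takes off.", "A bird is flying."),
]

# Tokenize all pairs at once and pad them to a common length.
batch = tokenizer.batch_encode_plus(pairs, max_length=128, pad_to_max_length=True)
print(len(batch["input_ids"]), len(batch["input_ids"][0]))  # 2 128
```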
ce2298fb5f [T5, generation] Add decoder caching for T5 (#3682)
* initial commit to add decoder caching for T5

* better naming for caching

* finish T5 decoder caching

* correct test

* added extensive past testing for T5

* clean files

* make tests cleaner

* improve docstring

* improve docstring

* better reorder cache

* make style

* Update src/transformers/modeling_t5.py

Co-Authored-By: Yacine Jernite <yjernite@users.noreply.github.com>

* make set output past work for all layers

* improve docstring

* improve docstring

Co-authored-by: Yacine Jernite <yjernite@users.noreply.github.com>
2020-04-10 01:02:50 +02:00
9384e5f6de Fix force_download of files on Windows (#3697) 2020-04-09 14:44:57 -04:00
bc65afc4df [Exbert] Change style of button 2020-04-09 10:44:42 -04:00
31baeed614 Update quotes
cc @julien-c
2020-04-09 09:09:00 -04:00
f8208fa456 Correct transformers-cli env call 2020-04-09 09:03:19 +02:00
6435b9f908 Updating the TensorFlow models to work as expected with tokenizers v3.0.0 (#3684)
* Updating modeling tf files; adding tests

* Merge `encode_plus` and `batch_encode_plus`
2020-04-08 16:22:44 -04:00
500aa12318 close #3699 2020-04-08 14:32:47 -04:00
a594ee9c84 More doc for model cards (#3698)
see https://github.com/huggingface/transformers/pull/3679#pullrequestreview-389368270
2020-04-08 12:12:52 -04:00
83703cd077 Update doc for {Summarization,Translation}Pipeline and other tweaks 2020-04-08 09:45:00 -04:00
a1b3b4167e Created README.md for model card ChemBERTa (#3666)
* created readme.md

* update readme with fixes

Fixes from PR comments
2020-04-08 09:10:20 -04:00
747907dc5e Fix typo in FeatureExtractionPipeline docstring 2020-04-08 09:08:56 -04:00
715aa5b135 [Bart] Replace config.output_past with use_cache kwarg (#3632) 2020-04-07 19:08:26 -04:00
e344e3d402 [examples] SummarizationDataset cleanup (#3451) 2020-04-07 19:05:58 -04:00
b0ad069517 [Tokenization] fix edge case for bert tokenization (#3517)
* fix edge case for bert tokenization

* add Lysandre's comments for improvement

* use new is_pretokenized_flag
2020-04-07 16:26:31 -04:00
80fa0f7812 [Examples, Benchmark] Improve benchmark utils (#3674)
* improve and add features to benchmark utils

* update benchmark style

* remove output files
2020-04-07 16:25:57 -04:00
05deb52dc1 Optimize causal mask using torch.where (#2715)
* Optimize causal mask using torch.where

Instead of multiplying by 1.0 float mask, use torch.where with a bool mask for increased performance.

* Maintain compatibility with torch 1.0.0 - thanks for PR feedback

* Fix typo

* reformat line for CI
2020-04-07 22:19:18 +02:00
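The pattern in isolation — selecting with a bool mask via torch.where instead of multiplying by a 1.0/0.0 float mask (a standalone sketch, not the exact model code):

```python
import torch

seq_len = 5
scores = torch.randn(seq_len, seq_len)  # attention scores

# Lower-triangular bool mask: position i may attend to positions <= i.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

# Old: scores * mask.float() - 1e4 * (1 - mask.float())
# New: a single select, no float multiply.
masked_scores = torch.where(causal_mask, scores, torch.tensor(-1e4))
```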
0a4b1068e1 Speedup torch summarization tests (#3663) 2020-04-07 14:01:30 -04:00
5aa8a278a3 Fix roberta checkpoint conversion script (#3642) 2020-04-07 12:03:23 -04:00
11cc1e168b [model_cards] Turn down spurious warnings
Close #3639 + spurious warning mentioned in #3227

cc @lysandrejik @thomwolf
2020-04-07 10:20:19 -04:00
0a9d09b42a fixed TransfoXLLMHeadModel documentation (#3661)
Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-04-07 00:47:51 +02:00
96ab75b8dd Tokenizers v3.0.0 (#3185)
* Renamed num_added_tokens to num_special_tokens_to_add

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Cherry-Pick: Partially fix space only input without special tokens added to the output #3091

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make fast tokenizers unittests work on Windows.

* Entirely refactored unittest for tokenizers fast.

* Remove ABC class for CommonFastTokenizerTest

* Added embeded_special_tokens tests from allenai @dirkgr

* Make embeded_special_tokens tests from allenai more generic

* Uniformize vocab_size as a property for both Fast and normal tokenizers

* Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)

* Ensure providing None input raises the same ValueError as the Python tokenizer + tests.

* Fix invalid input for assert_padding when testing batch_encode_plus

* Move add_special_tokens from constructor to tokenize/encode/[batch_]encode_plus methods parameter.

* Ensure tokenize() correctly forwards add_special_tokens to rust.

* Adding None checking on top of encode / encode_batch for TransfoXLTokenizerFast.
Avoid stripping on None values.

* unittests ensure tokenize() also throws a ValueError if provided None

* Added add_special_tokens unittest for all supported models.

* Style

* Make sure TransfoXL test run only if PyTorch is provided.

* Split up tokenizers tests for each model type.

* Fix invalid unittest with new tokenizers API.

* Filter out Roberta openai detector models from unittests.

* Introduce BatchEncoding on fast tokenizers path.

This new structure exposes all the mappings retrieved from Rust.
It also keeps the current behavior with model forward.

* Introduce BatchEncoding on slow tokenizers path.

Backward compatibility.

* Improve error message on BatchEncoding for slow path

* Make add_prefix_space True by default on Roberta fast to match Python in majority of cases.

* Style and format.

* Added typing on all methods for PretrainedTokenizerFast

* Style and format

* Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast.

* Style and format

* encode_plus now supports pretokenized inputs.

* Remove user warning about add_special_tokens when working on pretokenized inputs.

* Always go through the post processor.

* Added support for pretokenized input pairs on encode_plus

* Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError.

* Added pretokenized inputs support on batch_encode_plus

* Update BatchEncoding methods name to match Encoding.

* Bump setup.py tokenizers dependency to 0.7.0rc1

* Remove unused parameters in BertTokenizerFast

* Make sure Roberta returns token_type_ids for unittests.

* Added missing typings

* Update add_tokens prototype to match tokenizers side and allow AddedToken

* Bumping tokenizers to 0.7.0rc2

* Added documentation for BatchEncoding

* Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods.

* Added higher-level typing for tokenize / encode_plus / batch_encode_plus.

* Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers.

* Fix text-classification pipeline using the wrong tokenizer

* Make pipelines works with BatchEncoding

* Turn off add_special_tokens on tokenize by default.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove add_prefix_space from tokenize call in unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style and quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Correct message for batch_encode_plus none input exception.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid list comprehension for offset_mapping overriding content every iteration.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* TransfoXL uses Strip normalizer.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizers dependency to 0.7.0rc3

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Support AddedTokens for special_tokens and use left stripping on mask for Roberta.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* SpecialTokensMixin can use slots for faster access to underlying attributes.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove update_special_tokens from fast tokenizers.

* Ensure TransfoXL unittests are run only when torch is available.

* Style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style

* Style 🙏🙏

* Remove slots on SpecialTokensMixin, need deep dive into pickle protocol.

* Remove Roberta warning on __init__.

* Move documentation to Google style.

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-04-07 00:29:15 +02:00
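Two of the new surfaces in one short example, assuming a fast tokenizer checkpoint — the pretokenized (List[str]) input path via is_pretokenized, and the BatchEncoding wrapper:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Pre-tokenized input: is_pretokenized tells encode_plus not to re-split the words.
encoding = tokenizer.encode_plus(["Hello", "world", "!"], is_pretokenized=True)

# BatchEncoding keeps plain dict-style access for model inputs ...
print(encoding["input_ids"])
# ... while also carrying the extra alignment info computed by the Rust
# tokenizers (e.g. char offsets when return_offsets_mapping=True is passed).
```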
e52d1258e0 Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (#3631)
* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py

`convert_examples_to_features` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it might be helpful if someone more familiar with this part of the codebase checked.

* Simplifying change to match recent commits
2020-04-06 16:52:22 -04:00
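The point of the fix, verified in a few lines — the pad token id is model-specific, so hard-coding 0 only works for BERT:

```python
from transformers import AutoTokenizer

for name in ["bert-base-uncased", "roberta-base", "xlnet-base-cased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(name, tokenizer.pad_token_id)  # expected: 0, 1 and 5 respectively
```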
0ac33ddd8d Create README.md 2020-04-06 16:35:29 -04:00
326e6ebae7 Add model card 2020-04-06 16:30:01 -04:00
43eca3f878 Add model card 2020-04-06 16:29:51 -04:00
6bec88ca42 Create README.md 2020-04-06 16:29:44 -04:00
769b60f935 Add model card (#3655)
* Add model card

* Fix model name in fine-tuning script
2020-04-06 16:29:36 -04:00
c4bcb01906 Create model card (#3654)
* Create model card

* Fix model name in fine-tuning script
2020-04-06 16:29:25 -04:00
6903a987b8 Create README.md 2020-04-06 16:29:02 -04:00
760872dbde Create README.md (#3662) 2020-04-06 16:27:50 -04:00
47e1334c0b Add model card for BERTeus (#3649)
* Add model card for BERTeus

* Update README
2020-04-06 16:21:25 -04:00
529534dc2f BioMed Roberta-Base (AllenAI) (#3643)
* added model card

* updated README

* updated README

* updated README

* added evals

* removed pico eval

* Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-06 16:12:09 -04:00
261c4ff4e2 Update notebooks (#3620)
* Update notebooks

* From local to global link

* from local links to *actual* global links
2020-04-06 14:32:39 -04:00
39a34cc375 [model_cards] ELECTRA (w/ examples of usage)
Co-Authored-By: Kevin Clark <clarkkev@users.noreply.github.com>
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-04-06 11:43:33 -04:00
ea6dba2787 Re-pin isort 2020-04-06 10:09:54 -04:00
11c3257a18 unpin isort for pypi 2020-04-06 10:06:41 -04:00
36bffc81b3 Release: v2.8.0 2020-04-06 10:03:53 -04:00
2ee410560e [Generate, Test] Split generate test function into beam search, no beam search (#3601)
* split beam search and no beam search test

* fix test

* clean generate tests
2020-04-06 10:37:05 +02:00
1789c7daf1 fix argument order (#3637) 2020-04-05 12:33:41 +02:00
b809d2f073 Fix TF T5 docstring (#3636) 2020-04-05 12:23:09 +02:00
4ab8ab4f50 Adjust model card to reflect changes to vocabulary
(cherry picked from commit 8e25c4bf2838211378db4d93e7f9722386cc1a04)
2020-04-04 15:27:41 -04:00
ac40eed1a5 Create README.md
adding readme for ktrapeznikov/albert-xlarge-v2-squad-v2
2020-04-04 15:18:54 -04:00
fd9995ebc5 Create README.md 2020-04-04 15:18:31 -04:00
5d912e7ed4 Tweak typing for #3566 2020-04-04 15:04:03 -04:00
94eb68d742 weigths -> weights 2020-04-04 15:03:26 -04:00
243e687be6 Create model card 2020-04-04 08:20:34 -04:00
3e4b4dd190 [model_cards] Link to ExBERT visualisation
Hat/tip @bhoov @HendrikStrobelt @sebastianGehrmann

Also cc @srush and @thomwolf
2020-04-03 20:03:29 -04:00
c6acd246ec Speed up GELU computation with torch.jit (#2988)
* Compile gelu_new with torchscript

* Compile _gelu_python with torchscript

* Wrap gelu_new with torch.jit for torch>=1.4
2020-04-03 15:20:21 -04:00
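For illustration, the GPT-2-style tanh approximation compiled with TorchScript, roughly as the PR describes (a sketch, not the library source):

```python
import math
import torch

@torch.jit.script  # TorchScript-compile the activation (torch >= 1.4 per the PR)
def gelu_new(x):
    # GPT-2-style tanh approximation of GELU.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

print(gelu_new(torch.randn(4)))
```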
d5d7d88612 ELECTRA (#3257)
* Electra wip

* helpers

* Electra wip

* Electra v1

* ELECTRA may be saved/loaded

* Generator & Discriminator

* Embedding size instead of halving the hidden size

* ELECTRA Tokenizer

* Revert BERT helpers

* ELECTRA Conversion script

* Archive maps

* PyTorch tests

* Start fixing tests

* Tests pass

* Same configuration for both models

* Compatible with base + large

* Simplification + weight tying

* Archives

* Auto + Renaming to standard names

* ELECTRA is uncased

* Tests

* Slight API changes

* Update tests

* wip

* ElectraForTokenClassification

* temp

* Simpler arch + tests

Removed ElectraForPreTraining which will be in a script

* Conversion script

* Auto model

* Update links to S3

* Split ElectraForPreTraining and ElectraForTokenClassification

* Actually test PreTraining model

* Remove num_labels from configuration

* wip

* wip

* From discriminator and generator to electra

* Slight API changes

* Better naming

* TensorFlow ELECTRA tests

* Accurate conversion script

* Added to conversion script

* Fast ELECTRA tokenizer

* Style

* Add ELECTRA to README

* Modeling Pytorch Doc + Real style

* TF Docs

* Docs

* Correct links

* Correct model initialized

* random fixes

* style

* Addressing Patrick's and Sam's comments

* Correct links in docs
2020-04-03 14:10:54 -04:00
8594dd80dd BertJapaneseTokenizer accepts options for mecab (#3566)
* BertJapaneseTokenizer accepts options for mecab

* black

* fix mecab_option to Option[str]
2020-04-03 11:12:19 -04:00
216e167ce6 Added albert-base-bahasa-cased README and fixed tiny-bert-bahasa-cased README (#3613)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base
2020-04-03 09:28:43 -04:00
1ac6a246d8 Update README.md (#3604)
Update AutoModel & AutoTokenizer loading.
2020-04-03 09:28:25 -04:00
e91692f4a3 Update README.md (#3603) 2020-04-03 09:27:57 -04:00
8e287d507d corrected mistake in Polish model cards (#3611)
* added model_cards for Polish squad models

* corrected mistake in Polish design cards

Co-authored-by: Henryk Borzymowski <henryk.borzymowski@pwc.com>
2020-04-03 09:07:15 -04:00
81484b447b Create README.md (#3568)
* Create README.md

* added meta block (language: german)

* Added additional information about test data
2020-04-02 21:48:31 -04:00
9f6349aba9 Create README.md 2020-04-02 21:43:12 -04:00
ddb1ce7418 added model_cards for Polish squad models 2020-04-02 21:40:16 -04:00
f68d22850c delete bogus print statement (#3595) 2020-04-02 21:49:34 +02:00
c50aa67bff Resizing embedding matrix before sending it to the optimizer. (#3532)
* Resizing the embedding matrix after sending it to the optimizer prevents the optimizer from updating the newly resized matrix.

* Remove space for style matter
2020-04-02 15:00:05 -04:00
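The ordering the fix enforces, in a minimal sketch — resize before constructing the optimizer, otherwise the optimizer keeps a reference to the old embedding tensor (model and token are placeholders):

```python
from transformers import AdamW, BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

tokenizer.add_tokens(["<new_token>"])

# Resize FIRST: resize_token_embeddings swaps in a new embedding matrix, and an
# optimizer built beforehand would keep updating the old, orphaned tensor.
model.resize_token_embeddings(len(tokenizer))
optimizer = AdamW(model.parameters(), lr=5e-5)
```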
1b10159950 Adding should_continue check for retraining (#3509) 2020-04-02 14:07:08 -04:00
390c128592 [Encoder-Decoder] Force models outputs to always have batch_size as their first dim (#3536)
* solve conflicts

* improve comments
2020-04-02 15:18:33 +02:00
ab5d06a094 [T5, examples] replace heavy t5 models with tiny random models (#3556)
* replace heavy t5 models with tiny random models as was done by sshleifer

* fix isort
2020-04-02 12:34:05 +02:00
a4ee4da18a [T5, TF 2.2] change tf t5 argument naming (#3547)
* change tf t5 argument naming for TF 2.2

* correct bug in testing
2020-04-01 22:04:20 +02:00
06dd597552 fix bug in warnings T5 pipelines (#3545) 2020-04-01 21:59:12 +02:00
9de9ceb6c5 Correct output shape for Bert NSP models in docs (#3482) 2020-04-01 15:04:38 -04:00
b815edf69f [T5, Testst] Add extensive hard-coded integration tests and make sure PT and TF give equal results (#3550)
* add some t5 integration tests

* finish summarization and translation integration tests for T5 - results loook good

* add tf test

* fix == vs is bug

* fix tf beam search error and make tf t5 tests pass
2020-04-01 18:01:33 +02:00
8538ce9044 Add tiny-bert-bahasa-cased model card (#3567)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme
2020-04-01 07:15:00 -04:00
c1a6252be1 Create model card (#3557)
Create model card for: distilbert-multi-finetuned-for-xqua-on-tydiqa
2020-04-01 07:14:23 -04:00
50e15c825c Tokenizers: Start cleaning examples a little (#3455)
* Start cleaning examples

* Fixup
2020-04-01 07:13:40 -04:00
b38d552a92 [Generate] Add bad words list argument to the generate function (#3367)
* add bad words list

* make style

* add bad_words_tokens

* make style

* better naming

* make style

* fix typo
2020-03-31 18:42:31 +02:00
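A minimal usage sketch of the new argument — bad_words_ids takes a list of token-id sequences that generate() must never produce (model, prompt and banned words are placeholders):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("The weather today is", return_tensors="pt")

# Each banned word is encoded with a leading space so the ids match mid-sentence use.
bad_words_ids = [tokenizer.encode(w, add_prefix_space=True) for w in ["terrible", "awful"]]

output = model.generate(input_ids, max_length=20, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output[0]))
```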
ae6834e028 [Examples] Clean summarization and translation example testing files for T5 and Bart (#3514)
* fix conflicts

* add model size argument to summarization

* correct wrong import

* fix isort

* correct imports

* other isort make style

* make style
2020-03-31 17:54:13 +02:00
0373b60c4c Update README.md (#3552)
- Show that the last uploaded version was trained on more data (custom_license files)
2020-03-31 10:40:34 -04:00
83d1fbcff6 [Docs] Add usage examples for translation and summarization (#3538) 2020-03-31 09:36:03 -04:00
55bcae7f25 remove useless and confusing lm_labels line (#3531) 2020-03-31 09:32:25 -04:00
42e1e3c67f Update usage doc regarding generate fn (#3504) 2020-03-31 09:31:46 -04:00
57b0fab692 Add better explanation to check docs locally. (#3459) 2020-03-31 09:30:17 -04:00
a8d4dff0a1 Update README.md (#3470)
Fix typo
2020-03-31 08:01:09 -04:00
4a5663568f Create card for the model: GPT-2-finetuned-covid-bio-medrxiv (#3453) 2020-03-31 08:01:03 -04:00
bbedb59675 Create README.md (#3393)
* Create README.md

* Update README.md
2020-03-31 08:00:35 -04:00
c2cf192943 Add link to 16 POS tags model (#3465) 2020-03-31 08:00:00 -04:00
c82ef72158 Added CovidBERT-NLI model card (#3477) 2020-03-31 07:59:49 -04:00
b48a1f08c1 Add text shown in example of usage (#3464) 2020-03-31 07:59:36 -04:00
99833a9cbf Create model card (#3487) 2020-03-31 07:59:22 -04:00
ebceeeacda Add electra and alectra model cards (#3524) 2020-03-31 07:58:48 -04:00
a6c4ee27fd Add model cards (#3537)
* feat: add model card bert-imdb

* feat: add model card gpt2-imdb-pos

* feat: add model card gpt2-imdb
2020-03-31 07:54:45 -04:00
e5c393dceb [Bug fix] Using loaded checkpoint with --do_predict (instead of… (#3437)
* Using loaded checkpoint with --do_predict

Without this fix, I'm getting near-random validation performance for a trained model, and the validation performance differs per validation run. I think this happens since the `model` variable isn't set with the loaded checkpoint, so I'm using a randomly initialized model. Looking at the model activations, they differ each time I run evaluation (but they don't with this fix).

* Update checkpoint loading

* Fixing model loading
2020-03-30 17:06:08 -04:00
8deff3acf2 [bart-tiny-random] Put a 5MB model on S3 to allow faster exampl… (#3488) 2020-03-30 12:28:27 -04:00
1f72865726 [BART] Update encoder and decoder on set_input_embedding (#3501)
Co-authored-by: Ioannis Douratsos <ioannisd@amazon.com>
2020-03-30 12:20:37 -04:00
cc598b312b [InputExample] Unfreeze for now, cf. #3423 2020-03-30 10:41:49 -04:00
d38bbb225f Update the NER TF script (#3511)
* Update the NER TF script to remove the softmax and set the pad token label id to -1

* Reformat the quality and style

Co-authored-by: Julien Plu <julien.plu@adevinta.com>
2020-03-30 09:50:12 -04:00
eff757f2e3 Re-pin isort version 2020-03-30 09:00:47 -04:00
a009d751c2 Un-pin isort for v2.7.0 pypi 2020-03-30 08:55:10 -04:00
6f5a12a583 Release: v2.7.0 2020-03-30 08:49:24 -04:00
296252c49e fix lm lables in docstring (#3529) 2020-03-30 14:26:24 +02:00
75ec6c9e3a [T5] make decoder input ids optional for t5 training (#3521)
* make decoder input ids optional for t5 training

* lm_lables should not be shifted in t5

* add tests

* finish shift right functionality for PT T5

* move shift right to correct class

* cleaner code

* replace -100 values with pad token id

* add assert statement

* remove unnecessary for loop

* make style
2020-03-30 13:45:26 +02:00
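What the change buys in practice — a T5 training step without hand-built decoder inputs. A sketch, not the examples' exact code; note the target argument was named lm_labels in this era of the API:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

input_ids = tokenizer.encode("translate English to German: The house is wonderful.", return_tensors="pt")
labels = tokenizer.encode("Das Haus ist wunderbar.", return_tensors="pt")

# decoder_input_ids may now be omitted: the model shifts the labels right
# internally (replacing any -100 with the pad token id) to build them.
loss = model(input_ids=input_ids, lm_labels=labels)[0]
loss.backward()
```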
5b44e0a31b [T5] Add training documenation (#3507)
* Add clear description of how to train T5

* correct docstring in T5

* correct typo

* correct docstring format

* update t5 model docs

* implement collins feedback

* fix typo and add more explanation for sentinal tokens

* delete unnecessary todos
2020-03-30 13:35:53 +02:00
33ef7002e1 [Docs] examples/summarization/bart: Simplify CNN/DM preprocessi… (#3516) 2020-03-29 13:25:42 -04:00
f6a23d1911 [BART] add bart-large-xsum weights (#3422) 2020-03-29 10:51:13 -04:00
601ac5b1dc [model_cards]: use MIT license for all dbmdz models 2020-03-27 18:06:25 -04:00
17dceae7a1 Fix circle ci flaky fail of wmt example (#3485)
* force bleu

* fix wrong file name

* rename file

* different filenames for each example test

* test files should clean up after themselves

* test files should clean up after themselves

* do not force bleu

* correct typo

* fix isort
2020-03-27 13:01:28 -04:00
00ea100e96 add summarization and translation to notebook (#3478) 2020-03-27 11:05:37 -04:00
b08259a120 run_ner.py / bert-base-multilingual-cased can output empty tokens (#2991)
* Use tokenizer.num_added_tokens to count number of added special_tokens instead of hardcoded numbers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty.

This can happen when using bert-base-multilingual-cased with an input containing a lone space.
In this case, the tokenizer will output an empty word_tokens list, leading to inconsistent behavior:
the labels_ids vector ends up with one more entry than the tokens vector.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-27 10:59:55 -04:00
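A self-contained sketch of the guard described above (variable names are illustrative; pad_token_label_id mirrors the script's use of the CrossEntropyLoss ignore index):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

words = ["Hello", " ", "world"]  # a lone space can tokenize to []
labels = ["O", "O", "O"]
label_map = {"O": 0}
pad_token_label_id = -100

tokens, label_ids = [], []
for word, label in zip(words, labels):
    word_tokens = tokenizer.tokenize(word)
    if len(word_tokens) > 0:  # the guard added here: skip empty tokenizations
        tokens.extend(word_tokens)
        label_ids.extend([label_map[label]] + [pad_token_label_id] * (len(word_tokens) - 1))

assert len(tokens) == len(label_ids)
```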
f4f4946836 Rename t5-large to t5-base in README.md 2020-03-27 15:57:58 +01:00
fa9af2468a Add T5 to docs (#3461)
* add t5 docs basis

* improve docs

* add t5 docs

* improve t5 docstring

* add t5 tokenizer docstring

* finish docstring

* make style

* add pretrained models

* correct typo

* make examples work

* finalize docs
2020-03-27 10:57:16 -04:00
ff80b73157 Add option to choose T5 model size. (#3480)
T5-small in test


isort
2020-03-27 15:56:59 +01:00
e2c05f06ef Correct indentation in docstring
For some reason Sphinx extremely dislikes this and crashes.
2020-03-27 09:28:52 -04:00
3ee431dd4c [Bart/Memory] Two separate, smaller decoder attention masks (#3371) 2020-03-26 21:34:15 -04:00
53fe733805 Model Cards: Fix grammar error (#3467) 2020-03-26 21:33:33 -04:00
c10decf7a0 [Bart: example] drop columns that are exclusively pad_token_id… (#3400)
* trim seq_len below 1024 if there are columns full of pad_token_id
* Centralize trim_batch so SummarizationDataset can use it too
2020-03-26 19:33:54 -04:00
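A sketch of the centralized helper, matching the behavior described (though not necessarily the exact library code): drop every column populated exclusively by pad_token_id.

```python
import torch

def trim_batch(input_ids, pad_token_id, attention_mask=None):
    # Keep a column iff at least one row holds a non-pad token there.
    keep = input_ids.ne(pad_token_id).any(dim=0)
    if attention_mask is None:
        return input_ids[:, keep]
    return input_ids[:, keep], attention_mask[:, keep]

batch = torch.tensor([[5, 8, 1, 1], [7, 1, 1, 1]])  # 1 = pad
print(trim_batch(batch, pad_token_id=1))  # tensor([[5, 8], [7, 1]])
```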
63f4d8cad0 [Bart/Memory] SelfAttention only returns weights if config.outp… (#3369) 2020-03-26 18:42:39 -04:00
2b2a2f8df2 [Bart] Fix: put dummy_inputs on correct device (#3398)
* Dummy inputs to model.device

* Move self.device to ModuleUtilsMixin
2020-03-26 18:42:09 -04:00
1a5aefc95c [Seq2Seq Generation] Call encoder before expanding input_ids (#3370) 2020-03-26 18:41:19 -04:00
39371ee454 [Bart/Memory] don't create lm_head (#3323)
* delete lm_head, skips weight tying
* Fixed s3
2020-03-26 18:40:39 -04:00
5ad2ea06af Add wmt translation example (#3428)
* add translation example

* make style

* adapt docstring

* add gpu device as input for example

* small renaming

* better README
2020-03-26 19:07:59 +01:00
b4fb94fe6d revert unpin isort commit 2020-03-26 13:19:18 -04:00
e703e923ca Add t5 summarization example (#3411)
* rebase to master

* change tf to pytorch

* change to pytorch

* small fix

* renaming

* add gpu training possibility

* renaming

* improve README

* incorporate collins feedback

* better Readme

* better README.md
2020-03-26 18:17:55 +01:00
1a6c546c6f Add missing token classification for XLM (#3277)
* Add the missing token classification for XLM

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add the missing token classification for XLM

* fix styling

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add missing description for AlbertForTokenClassification

* fix styling

* Add missing docstring for AlBert

* Slow tests should be slow

Co-authored-by: Sakares Saengkaew <s.sakares@gmail.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-03-26 10:22:13 -04:00
311970546f rename string in pipeline 2020-03-26 14:59:49 +01:00
7420a6a9cc Create card for model GPT-2-finetuned-CORD19 2020-03-26 09:10:09 -04:00
022e8fab97 Adds translation pipeline (#3419)
* fix merge conflicts

* add t5 summarization example

* change parameters for t5 summarization

* make style

* add first code snippet for translation

* only add prefixes

* add prefix patterns

* make style

* renaming

* fix conflicts

* remove unused patterns

* solve conflicts

* fix merge conflicts

* remove translation example

* remove summarization example

* make sure tensors are in numpy for float comparison

* re-add t5 config

* fix t5 import config typo

* make style

* remove unused numpy statements

* update docstring

* import translation pipeline
2020-03-26 13:50:58 +01:00
3c5c567507 Update model card huseinzol05/bert-base-bahasa-cased (#3425)
* add bert bahasa readme

* update readme

* update readme

* added xlnet
2020-03-26 07:50:27 -04:00
9c683ef01e Add t5 to pipeline(task='summarization') (#3413)
* solve conflicts

* move warnings below

* incorporate changes

* add pad_to_max_length to pipelines

* add bug fix for T5 beam search

* add prefix patterns

* make style

* fix conflicts

* adapt pipelines for task specific parameters

* improve docstring

* remove unused patterns
2020-03-26 11:03:13 +01:00
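Together with the translation pipeline added above, T5 can now back both tasks. A minimal usage sketch (input texts are placeholders; task prefixes such as "summarize: " are applied for the model automatically):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")
translator = pipeline("translation_en_to_de", model="t5-small", tokenizer="t5-small")

print(summarizer("A very long article to condense goes here ...", max_length=40))
print(translator("The house is wonderful.", max_length=40))
```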
ffcffebe85 Force the return of token type IDs (#3439) 2020-03-26 09:41:36 +01:00
010e0460b2 Updated/added model cards (#3435) 2020-03-25 16:40:03 -04:00
ffa17fe322 Extend config with task specific configs. (#3433)
* add new default configs

* change prefix default to None
2020-03-25 21:32:04 +01:00
83272a3853 Experiment w/ dataclasses (including Py36) (#3423)
* [ci] Also run test_examples in py37

(will revert at the end of the experiment)

* InputExample: use immutable dataclass

* [deps] Install dataclasses for Py<3.7

* [skip ci] Revert "[ci] Also run test_examples in py37"

This reverts commit d29afd9959786b77759b0b8fa4e6b4335b952015.
2020-03-25 11:10:20 -04:00
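The immutable InputExample in miniature — a frozen dataclass is hashable and safe to share, and the dataclasses backport covers Py3.6 (field names follow the existing processors):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class InputExample:
    guid: str
    text_a: str
    text_b: Optional[str] = None
    label: Optional[str] = None

ex = InputExample(guid="train-1", text_a="A sentence.", label="positive")
# ex.label = "negative"  # would raise dataclasses.FrozenInstanceError
```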
ccbe839ee0 Added BioBERT-NLI model card (#3421) 2020-03-24 21:15:55 -04:00
3d76df3a12 BART for summarization training with CNN/DM using pytorch-lightning 2020-03-24 21:00:24 -04:00
eaabaaf750 [run_language_modeling] Fix: initialize a new model from a config object 2020-03-24 17:56:40 -04:00
f8823bad9a Expose missing mappings (see #3415) 2020-03-24 17:46:25 -04:00
d0c36a7b72 [ci] Partial revert of 18eec3a9847 due to fbc5bf10cfe 2020-03-24 12:10:43 -04:00
fbc5bf10cf v2.6.0 release: isort un-pinned 2020-03-24 11:52:02 -04:00
b88bda6af3 Add right model and tokenizer path in example 2020-03-24 11:30:12 -04:00
b31ef225cf [model_cards] 🇹🇷 Add new (uncased, 128k) BERTurk model 2020-03-24 11:29:06 -04:00
b4009cb001 [model_cards] 🇹🇷 Add new (cased, 128k) BERTurk model 2020-03-24 11:29:06 -04:00
d3283490ef [model_cards] 🇹🇷 Add new (uncased) BERTurk model 2020-03-24 11:29:06 -04:00
e279a312d6 Model cards for CS224n SQuAD2.0 models (#3406)
* Model cards for CS224n SQuAD2.0 models

* consistent spacing
2020-03-24 11:28:33 -04:00
7372e62b2c Added precisions in SciBERT-NLI model card (#3410) 2020-03-24 11:01:56 -04:00
471cce24b3 Release: v2.6.0 2020-03-24 10:37:32 -04:00
e392ba6938 Add camembert integration tests (#3375)
* add integration tests for camembert

* use jplu/tf-camembert for the moment

* make style
2020-03-24 10:18:37 +01:00
a8e3336a85 [examples] Use AutoModels in more examples 2020-03-23 20:11:14 -04:00
ec6766a363 [deps] scikit-learn's transient issue was fixed 2020-03-23 18:38:09 -04:00
f7dcf8fcea [BertAbs] Move files around for more consistent naming 2020-03-23 13:58:49 -04:00
e25c4f4027 [ALBERT] move things around for more consistent naming
see #3359

cc @lysandrejik
2020-03-23 13:58:21 -04:00
85b324bee5 Add comparison table with older brother in family 2020-03-23 12:11:20 -04:00
b7aa077a63 Create card for the model 2020-03-23 12:10:41 -04:00
f740177c87 Add comparison table with new models 2020-03-23 12:10:23 -04:00
e52482909b Correct order for dev/quality dependencies
cc @julien-c
2020-03-23 12:01:23 -04:00
28424906c2 Added scibert-nli model card 2020-03-23 11:55:41 -04:00
18eec3a984 [ci] simpler way to load correct version of isort
hat/tip @bramvanroy
2020-03-23 10:03:22 -04:00
cf72479bf1 One last reorder of {scheduler,optimizer}.step() 2020-03-20 18:05:50 -04:00
634bf6cf7e fixes lr_scheduler warning
For more details, see https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
2020-03-20 18:03:50 -04:00
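The ordering the warning is about, shown on a toy loop — since PyTorch 1.1, optimizer.step() must precede scheduler.step():

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 0.95 ** step)

for _ in range(3):
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    optimizer.step()   # must come first since torch 1.1 ...
    scheduler.step()   # ... otherwise PyTorch emits the lr_scheduler warning
    optimizer.zero_grad()
```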
265709f5cd New model, new model cards 2020-03-20 18:01:01 -04:00
115abd2166 Handle pinned version of isort
The CONTRIBUTING file pins to a specific version of isort, so we might as well install that in `dev`. This makes it easier for contributors so they don't have to manually install the specific commit.
2020-03-20 18:00:04 -04:00
95e00d0808 Clean special token init in modeling_....py (#3264)
* make style

* fix conflicts
2020-03-20 21:41:04 +01:00
8becb73293 removing torch.cuda.empty_cache() from TF function (#3267)
torch.cuda.empty_cache() was being called from a TF function (even when torch is unavailable)
not sure any replacement is needed if TF OOMs
2020-03-19 23:25:30 +01:00
ecfd336318 Simpler Error message when loading config/model with .from_pretrained() (#3341) 2020-03-19 23:23:03 +01:00
8eeefcb576 Update 01-training-tokenizers.ipynb (typo issue) (#3343)
I found there are two grammar errors or typo issues in the explanation of the encoding properties.

The original sentences:
If your was made of multiple "parts" such as (question, context), then this would be a vector with for each token the segment it belongs to
If your has been truncated into multiple subparts because of a length limit (for BERT for example the sequence length is limited to 512), this will contain all the remaining overflowing parts.

I think "input" should be inserted after the phrase "If your".
2020-03-19 23:21:49 +01:00
bbf26c4e61 Support T5 Generation (#3228)
* fix conflicts

* update bart max length test

* correct spelling mistakes

* implemented model specific encode function

* fix merge conflicts

* better naming

* save intermediate state -> need to rethink structure a bit

* leave tf problem as it is for now

* current version

* add layers.pop

* remove ipdb

* make style

* clean return cut decoding

* remove ipdbs

* Fix restoring layers in the decoder that don't exist.

* push good intermediate solution for now

* fix conflicts

* always good to refuse to merge conflicts when rebasing

* fix small bug

* improve function calls

* remove unused file

* add correct scope behavior for t5_generate

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-03-19 23:18:23 +01:00
656e1386a2 Fix #3305: run_ner only possible on ModelForTokenClassification models 2020-03-19 16:41:28 -04:00
0c44b11917 add bert bahasa readme 2020-03-19 15:08:19 -04:00
e99af3b17b Create model card for bert-small-finetuned-squadv2 2020-03-19 15:07:55 -04:00
39db055268 Merge pull request #3348 from mrm8488/patch-28
Create card for BERT-Mini finetuned on SQuAD v2
2020-03-19 15:07:39 -04:00
dedc7a8fdb Create card for BERT-Tiny fine-tuned on SQuAD v2
- Only 17MB of Model weights!!
2020-03-19 15:07:22 -04:00
676adf8625 Created card for spanbert-finetuned-squadv1 2020-03-19 15:06:35 -04:00
11d8bcc9d7 Add model cards for FinBERT. (#3331)
* Add a model card for FinBERT

This is a copy of https://github.com/TurkuNLP/FinBERT/blob/master/README.md.

* Added a file for uncased.

* Add metadata for cased.

* Added metadata for uncased.
2020-03-19 15:06:01 -04:00
f049be7ad4 Export ALBERT main layer in TensorFlow (#3354) 2020-03-19 13:53:05 -04:00
3bedfd3347 Fix wrong link for the notebook file (#3344)
For the tutorial of "How to generate text", the URL link was wrong (it was linked to the tutorial of "How to train a language model").

I fixed the URL.
2020-03-19 17:22:47 +01:00
b2c2c31c60 Minor Bug Fix for Running Roberta on Glue (#3240)
* added return_token_type_ids argument for tokenizers which do not generate return_type_ids by default

* fixed styling

* Style

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-03-19 12:08:31 -04:00
4e4403c9b4 [BART] torch 1.0 compatibility (#3322)
* config.activation_function
2020-03-19 11:56:54 -04:00
c44a17db1b [FIX] not training when epoch is small (#3006)
* solving bug where for small epochs and large gradient_accumulation_steps we never train

* black formatting

* no need to change these files
2020-03-19 11:21:21 -04:00
ad7233fc01 [BART] cleanup: remove redundant kwargs, improve docstrings (#3319) 2020-03-19 11:16:51 -04:00
cd21d8bc00 Typo in warning message (#3219)
`T5Tokenizer` instead of `XLNetTokenizer`
2020-03-19 09:49:25 -04:00
8d3e218ea6 fix typo in docstring demonstrating usage (#3213) 2020-03-19 09:47:54 -04:00
cec3cdda15 Fix input ids can be none attn mask (#3345)
* fix issue 3289

* fix attention mask if input_ids None behavior
2020-03-19 09:55:17 +01:00
f6d813aaaa Create README.md 2020-03-18 23:45:02 -04:00
939328111b Create README.md
roberta_chinese_base card
2020-03-18 23:44:12 -04:00
29442d2edf Create README.md
albert_chinese_tiny card
2020-03-18 23:43:49 -04:00
20139b7c8d Added model cards for SciBERT models uploaded under AllenAI org (#3330)
* Create README.md

* model card

* add model card for cased
2020-03-18 15:45:11 -04:00
cae334c43c Improve fill-mask pipeline example in 03-pipelines notebook.
Remove hardcoded mask_token and use the value provided by the tokenizer.
2020-03-18 17:11:42 +01:00
4b1970bb4c Create README.md 2020-03-18 11:37:17 -04:00
d6afbd323d XLM-R Tokenizer now passes common tests + Integration tests (#3198)
* XLM-R now passes common tests + Integration tests

* Correct mask index

* Model input names

* Style

* Remove text preprocessing

* Unneccessary import
2020-03-18 09:52:49 -04:00
292186a3e7 Adding LM Head to Transfo-XL and a first step toward fixing the problem with Adaptive Embeddings in TransfoXL (#3286)
* first commit

* work in progress

* make language generation task pass

* update to working version for LM

* delete print

* remove dead code

* make style
2020-03-18 09:24:27 -04:00
efdb46b6e2 add link to blog post (#3326) 2020-03-18 13:24:28 +01:00
ddb10c6447 improve doctstring (#3327) 2020-03-18 13:24:09 +01:00
d7f98cd3ef Init card for model 2020-03-18 07:55:27 -04:00
38a555a83c Add Summarization to Pipelines (#3128)
* passing

* Undo stupid chg

* docs

* undo rename

* delete-cruft

* only import if you have torch

* Dont rely on dict ordering

* Fix dict ordering upstream

* docstring link

* docstring link

* remove trailing comma for 3.5 compat

* new name

* delegate kwarging

* Update kwargs
2020-03-17 18:04:21 -04:00
2b60a26b46 Update examples/ner/run_ner.py to use AutoModel (#3305)
* Update examples/ner/run_ner.py to use AutoModel

* Fix missing code and apply `make style` command
2020-03-17 12:30:10 -04:00
e41212c715 Create model card for CodeBERTaPy (#3309) 2020-03-17 12:29:11 -04:00
0f1bc0d68e [model_cards] Add google thumbnail 2020-03-17 12:02:51 -04:00
930c9412b4 [WIP] Lightning glue example (#3290)
*  Alter base pl transformer to use automodels

* 🐛 Add batch size env variable to function call

* 💄 Apply black code style from Makefile

* 🚚 Move lightning base out of ner directory

*  Add lightning glue example

* 💄 self

* move _feature_file to base class

*  Move eval logging to custom callback

* 💄 Apply black code style

* 🐛 Add parent to pythonpath, remove copy command

* 🐛 Add missing max_length kwarg
2020-03-17 11:46:42 -04:00
e8f44af5bf [generate] do_sample default back to False (#3298)
* change do_samples back

* None better default as boolean

* adapt do_sample to True in test example

* make style
2020-03-17 10:52:37 -04:00
2187c49f5c CPU/GPU memory benchmarking utilities - Remove support for python 3.5 (now only 3.6+) (#3186)
* memory benchmark rss

* have both forward pass and line-by-line mem tracing

* cleaned up tracing

* refactored and cleaning up API

* no f-strings yet...

* add GPU mem logging

* fix GPU memory monitoring

* style and quality

* clean up and doc

* update with comments

* Switching to python 3.6+

* fix quality
2020-03-17 10:17:11 -04:00
bd3feddf67 Create README.md (#3306)
* Create README.md

* Updated README.md
2020-03-17 09:05:11 -04:00
68ef0a111f [model_cards] Symlink all Google AI's BERT Miniatures to source model card 2020-03-16 23:37:42 -04:00
b2c1a447fe [BART] Delete redundant unit test (#3302) 2020-03-16 23:09:10 -04:00
b2028cc26b Add model card for Google AI's BERT Miniatures (#3301)
This model card is intended to be shared among all models under google/bert_uncased_*
(We'll need some support from HuggingFace to get this card cross-linked from all models)
2020-03-16 21:51:46 -04:00
4759176313 add camembert for Question answering for examples 2020-03-16 14:42:11 -04:00
11573231c6 [BART] generation_mode as a kwarg not a class attribute (#3278) 2020-03-16 12:47:53 -04:00
de697935a2 Create model card for spanbert-finetuned-squadv2 2020-03-16 12:32:46 -04:00
3ddd2029bc Create CodeBERTaJS model card 2020-03-16 12:23:01 -04:00
879e1d3234 Add TF2 version of FlauBERT (#2700)
* Add TF2 version of FlauBERT

* Add TF2 version of FlauBERT

* Add documentation

* Apply style and quality

* Apply style once again

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-03-16 09:29:21 -04:00
af471ce5e8 Improved Error message when loading config/model with .from_pretrained() (#3247)
* better error message

* better error message

* update to model identifier instead of url

* update to model identifier instead of url
2020-03-16 09:48:30 +01:00
5ea8ba67b4 [BART] Remove unused kwargs (#3279)
* Remove unused kwargs
* dont call forward in tests
2020-03-15 23:00:44 -04:00
3814e167d9 Merge pull request #3225 from patrickvonplaten/finalize_merge_bart_generate_into_default_generate
Complete merge Seq-2-Seq generation into default generation
2020-03-14 15:08:59 +01:00
2bd79e23de [BART] FP16 testing fixes (#3266) 2020-03-13 19:48:26 -04:00
8320feec09 [model_cards] CodeBERTa 2020-03-13 18:28:09 -04:00
ab756f713c add gpt2-xl for tf 2020-03-13 16:40:35 -04:00
4f75d380a4 make style 2020-03-13 16:35:52 +01:00
c2ee3840ae update file to new starting token logic 2020-03-13 16:34:44 +01:00
cc4c37952a Create camembert-base-README.md 2020-03-13 09:35:53 -04:00
afea70c01c Bump psutil from 5.6.3 to 5.6.6 in /examples/distillation
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.6.3 to 5.6.6.
- [Release notes](https://github.com/giampaolo/psutil/releases)
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.6.3...release-5.6.6)

Signed-off-by: dependabot[bot] <support@github.com>
2020-03-12 21:14:56 -04:00
087465b943 add BART to README (#3255) 2020-03-12 19:38:05 -04:00
6a82f774f2 fix typo 2020-03-12 21:10:51 +01:00
f1c71da115 fix eos_token_ids in test 2020-03-12 21:00:54 +01:00
6047f46b19 re-add eos token to get good bart results 2020-03-12 20:17:50 +01:00
c11160114a small clean-up 2020-03-12 20:02:35 +01:00
2e81b9d8d7 Bart: update example for #3140 compatibility (#3233)
* Update bart example docs
2020-03-12 10:36:37 -04:00
72768b6b9c [model_cards] polbert: simplify usage example with pipelines
Co-Authored-By: Darek Kłeczek <darek.kleczek@gmail.com>
2020-03-12 10:05:40 -04:00
a4c75f1492 [ci] last resort 2020-03-11 19:11:19 -04:00
824e320d96 [ci] Fixup c6cf925 2020-03-11 18:52:10 -04:00
c6cf925ff8 [ci] last resort
while looking for fix to https://twitter.com/julien_c/status/1237864185821708291
2020-03-11 18:49:19 -04:00
14e455b716 [model_cards] 🇹🇷 Add new (cased) DistilBERTurk model 2020-03-11 18:40:38 -04:00
f65f74bbce Create README.md (#3230) 2020-03-11 12:37:00 -04:00
324292cfc7 Add Bio+ Clinical BERT model card (#3229)
* Create README.md

* Update README.md
2020-03-11 12:36:33 -04:00
e43afb1bb8 [model_cards] DialoGPT: How to use + thumbnail + conversational tag
cc @dreasysnail

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
2020-03-11 11:36:47 -04:00
5085df995f [model_cards] PolBERT tweaks 2020-03-11 09:29:22 -04:00
19a63d8245 Create Readme.md model card (#3221) 2020-03-11 09:12:48 -04:00
dc848c2994 Create README.md 2020-03-11 09:10:57 -04:00
6ad221daf3 Create README.md 2020-03-11 09:10:23 -04:00
735180aa14 Create README.md 2020-03-11 09:10:13 -04:00
6c61c0801e Create README.md 2020-03-11 09:03:30 -04:00
235616686a Update README.md
- Update title
- Remove metrics
2020-03-11 09:03:20 -04:00
5bb00c817f Update README.md
Change title to clarify the model description
2020-03-11 09:03:07 -04:00
601e424750 Update README.md 2020-03-11 09:02:56 -04:00
1b9e765b21 Update README.md
- Remove metrics until tested on other xquad benchmarks
2020-03-11 09:02:18 -04:00
db29ffc978 Merge pull request #3140 from patrickvonplaten/merge_bart_generate_into_default_generate
Merge bart generate into default generate
2020-03-11 13:21:53 +01:00
ac303eae46 fix problem with half 2020-03-11 12:24:30 +01:00
bc9d5d917c make all tensors half precision 2020-03-11 12:15:38 +01:00
a332cc9f7f finalize generation merge 2020-03-11 11:53:36 +01:00
1ba21f96ca fix bug in tf no_repeat_ngram_size 2020-03-11 11:06:56 +01:00
d997ac7810 fix typo 2020-03-11 11:06:56 +01:00
7351a8dbaf re-add scoring filtering 2020-03-11 11:06:56 +01:00
9b8ee8cea0 delete print and make style 2020-03-11 11:06:56 +01:00
ca1330f0b2 do not mess with the negative sign 2020-03-11 11:06:56 +01:00
10989715d0 rename variable 2020-03-11 11:06:56 +01:00
cf06290565 remove ipdb 2020-03-11 11:06:56 +01:00
374deef48d fixed typo 2020-03-11 11:06:56 +01:00
a2c8e516c2 fix torch to tf translation 2020-03-11 11:06:56 +01:00
ca2047bc35 refactor variable naming and improve tf generate in line with torch generate 2020-03-11 11:06:56 +01:00
41b437ea3a add draft version of proposed changes for ROUGE score 2020-03-11 11:06:56 +01:00
a5751f7578 fix bug with attention_mask as optional input argument 2020-03-11 11:06:56 +01:00
629aac92ec do not allow do_sample and weird force bos token things 2020-03-11 11:06:56 +01:00
d880a5fbde finalized PR 2020-03-11 11:06:56 +01:00
2acfe63964 best current version and make style 2020-03-11 11:06:56 +01:00
c62444da39 fix conflicts 2020-03-11 11:06:56 +01:00
77e6775065 add current changes 2020-03-11 11:06:56 +01:00
333affcb81 add current changes 2020-03-11 11:06:56 +01:00
421216997b comment out stuff 2020-03-11 11:06:56 +01:00
7a11e925cf work in progress 2020-03-11 11:06:56 +01:00
5b3000d933 renamed min_len to min_length 2020-03-11 11:06:56 +01:00
aceb3fbaf4 only do output_past=True for language generation in bart 2020-03-11 11:06:56 +01:00
7cba11fb9b better naming 2020-03-11 11:06:56 +01:00
ff648221bd fix conflicts 2020-03-11 11:06:56 +01:00
c0d9dd3ba9 refactored code a bit and made more generic 2020-03-11 11:06:56 +01:00
d8e2b3c547 fix conflicts 2020-03-11 11:06:56 +01:00
d6de6423ba [doc] --organization tweak
Co-Authored-By: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-03-10 16:52:44 -04:00
0e56dc3078 [doc] Document the new --organization flag of CLI 2020-03-10 16:42:01 -04:00
270dfa1c8e [dialogpt] conversion script
Reference: https://github.com/huggingface/transformers/pull/1778#issuecomment-567675530

cc @patrickvonplaten and @dreasysnail
2020-03-10 15:09:29 -04:00
2661d80687 Update README.md
- Clarify that the model is not trained on the evaluation dataset
2020-03-10 10:59:34 -04:00
6a13448ad2 Update README.md
- Fix path of tokenizer
- Clarify that the model is not trained on the evaluation set
2020-03-10 10:59:16 -04:00
e57533cca5 Create README.md 2020-03-10 10:58:13 -04:00
31f2437f07 Merge pull request #3191 from patrickvonplaten/add_integration_tests_lm_generate_torch_tf
Add integration tests lm generate torch tf
2020-03-10 11:29:17 +01:00
5ca356a464 NER - pl example (#3180)
* 1. seqeval required by ner pl example. install from examples/requirements. 2. unrecognized arguments: save_steps

* pl checkpoint callback filenotfound error: make directory and pass

* #3159 pl checkpoint path difference

* 1. Updated Readme for pl 2. pl script now also correctly displays logs 3. pass gpu ids compared to number of gpus

* Updated results in readme

* 1. updated readme 2. removing deprecated pl methods 3. finalizing scripts

* comment length check

* using deprecated validation_end for stable results

* style related changes
2020-03-09 20:43:38 -04:00
f51ba059b9 Model card for albert-base-v2-squad2 2020-03-09 19:37:15 -04:00
cbf8f5d32b [model upload] Support for organizations 2020-03-09 17:33:57 -04:00
525b6b1c54 TFQA pipeline marked as slow test 2020-03-09 16:52:30 -04:00
3aca02efb3 Bart example: model.to(device) (#3194) 2020-03-09 15:09:35 -04:00
5164ea91a7 Skipping outputs (#3116)
* Minimal example

* Proposal 2

* Proposal 2 for fast tokenizers

* Typings

* Docs

* Revert "Docs" for easier review

This reverts commit eaf0f97062e809887704a542144c537f769d5223.

* Remove unnecessary assignments

* Tests

* Fix faulty type

* Remove prints

* return_outputs -> model_input_names

* Revert "Revert "Docs" for easier review"

This reverts commit 6fdc69408102bf695797f2dfddbb6350c6b9e722.

* code quality
2020-03-09 13:48:58 -04:00
49debe62fd Merge pull request #3190 from patrickvonplaten/fix_repetition_penalty_in_tf_generate
fix repetition penalty mask in tf
2020-03-09 16:29:57 +01:00
847d370301 fix typo 2020-03-09 16:18:29 +01:00
eb3e6cb04f cased -> uncased in BERT SQuAD example
closes #3183
2020-03-09 10:54:18 -04:00
9050ffe035 delete w! -> need to be more careful with vim 2020-03-09 15:43:12 +01:00
efb619235c add print statement to avoid code quality problem 2020-03-09 15:31:21 +01:00
b12541c4dc test ctrl 2020-03-09 13:58:01 +00:00
3e624c64ca fix repetition penalty mask in tf 2020-03-09 14:55:11 +01:00
b73dd1a0e4 fix typo in test xlm tf 2020-03-09 11:34:31 +01:00
4620caa864 fix if use lang embeddings in tf xlm 2020-03-09 11:18:54 +01:00
fbd02d4693 fixed all tests, still need to check ctrl tf and pt and xlm tf 2020-03-08 21:45:55 +01:00
b4a3a64744 fix xlnet & transfo-xl tests 2020-03-08 16:25:03 +01:00
b29fed790b Updated Tokenw ise in print statement to Token wise 2020-03-08 10:55:30 -04:00
66c827656f fix typo in test gpt2 2020-03-08 15:35:08 +01:00
314bdc7c14 fix typo in test 2020-03-08 15:34:20 +01:00
575976144a updated all tests 2020-03-08 15:29:10 +01:00
e03129ad44 [model_cards] Small formatting fix
cc @mrm8488
2020-03-06 18:07:44 -05:00
08a70fb392 [model_cards] Fixup d6df9a8f 2020-03-06 17:50:38 -05:00
0ae91c80aa Change back pipeline signatures (#3105)
* Change back pipeline signatures

* String types for non-imported objects
2020-03-06 17:26:18 -05:00
d6df9a8ffe [model_cards]Add albert chinese model(tiny,small,base,large,xlarge,xxlarge) 2020-03-06 17:23:31 -05:00
c52716d46c Create README.md 2020-03-06 17:20:19 -05:00
73a0c25376 remove excess line breaks in DeepPavlov model cards 2020-03-06 17:19:35 -05:00
ed37f9fa4f [Bart] _prepare_decoder_inputs should use large negative (#3158) 2020-03-06 16:06:36 -05:00
0416d437fb Merge pull request #3148 from patrickvonplaten/refactoring_beam_search_for_tf_2
refactored beam search according to torch implementation
2020-03-06 22:01:46 +01:00
db9279dedb Fix QA models binding for Flaubert, XLNet and XLM. (#3100)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Format & quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Again.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-06 13:04:29 -05:00
e58b3ec5df add imports to examples (#3160) 2020-03-06 11:15:33 -05:00
6ffe03a0a1 Merge pull request #3137 from tomhosking/bart-refactor
Refactor BartModel so that input checks are handled within enc/dec
2020-03-06 13:06:34 +01:00
3e5da38dae Merge pull request #3132 from huggingface/hf_api_model_list
[hf_api] Get the public list of all the models on huggingface
2020-03-06 13:05:52 +01:00
9499a3778e Merge pull request #3103 from gthb/keras-serialization
Support keras JSON/HDF5 serialization of main layers
2020-03-06 12:59:13 +01:00
9362eb4a07 refactored beam search according to torch implementation 2020-03-06 00:46:29 +01:00
c8035e11e8 Merge pull request #3149 from patrickvonplaten/fix_renaming_error
fix missed BartForMaskedLM renaming
2020-03-06 00:45:08 +01:00
58fc8f97a3 fix renaming problem 2020-03-06 00:35:47 +01:00
857e0a0d3b Rename BartForMaskedLM -> BartForConditionalGeneration (#3114)
* improved documentation
2020-03-05 17:41:18 -05:00
fa2aa699da Merge pull request #3011 from patrickvonplaten/add_models_special_tokens_to_specific_configs
Add models special tokens to its pretrained configs
2020-03-05 17:26:48 -05:00
146c521235 Merge branch 'master' into add_models_special_tokens_to_specific_configs 2020-03-05 17:24:42 -05:00
b623ddc000 Pass kwargs to configuration (#3147)
* Pass kwargs to configuration

* Setter

* test
2020-03-05 17:16:57 -05:00
0001d05686 Correct missing keys + test (#3143) 2020-03-05 17:01:54 -05:00
1741d740f2 Merge pull request #3145 from sshleifer/bartfp16
[Bart] FP16 Support
2020-03-05 22:14:35 +01:00
bbabbc1613 Merge pull request #3135 from patrickvonplaten/refactor_beam_search_generate
Refactoring and bug fixing beam search generate
2020-03-05 22:12:56 +01:00
14d40584b2 remove newline 2020-03-05 13:06:35 -05:00
1360dacaa3 cleanup deltas 2020-03-05 12:57:42 -05:00
810079de1f no ipdb 2020-03-05 12:48:14 -05:00
c203509d5b undo chg 2020-03-05 12:34:08 -05:00
c36fdc88d4 tests pass 2020-03-05 12:33:08 -05:00
7ac47bfe69 Updated notebook dependencies for Colab.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 16:07:51 +01:00
be02176a4b Fixing sentiment pipeline in 03-pipelines notebook.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 16:07:51 +01:00
8a2d9bc9ef Add model cards for DeepPavlov models (#3138)
* add empty model cards for every current DeepPavlov model

* fix: replace cyrillic `с` with `c`

* docs: add model cards for current DeepPavlov BERT models

* docs: add links for arXiv preprints
2020-03-05 09:34:43 -05:00
012cbdb0f5 Updating colab links in notebooks README.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 15:34:15 +01:00
31acb8dc52 Remove rogue .DS_Store 2020-03-05 13:51:30 +00:00
06a6cb6f36 Refactor BartModel so that input checks are handled within BartEncoder and BartDecoder 2020-03-05 13:45:41 +00:00
e33ed12c3b uncomment expression 2020-03-05 13:41:04 +01:00
4220fd52b9 remove ipdb 2020-03-05 13:36:21 +01:00
c47394b0c9 refactoring and bug fixing beam search generate 2020-03-05 13:12:50 +01:00
4c91a3af94 Document keras_serializable decorator 2020-03-05 11:48:10 +00:00
4be01e5cbf Use name transformers_config in Keras serialization
Be explicit that this is the config for the transformers package (as these
layers may coexist with other custom stuff in a Keras model, plus the
Keras container itself is called config, and config["config"] is not
great)

Add explicit error handling for initializer calls that have neither
the `config` nor the `transformers_config` argument, or have both.
2020-03-05 11:47:35 +00:00
a355f4f0fc Add functools.wraps for wrapper initializer
Preserve the original initializer function's metadata. See
https://docs.python.org/3/library/functools.html#functools.update_wrapper
2020-03-05 11:18:50 +00:00
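For illustration, the pattern this commit describes is roughly the following; a minimal sketch, not the library's actual code (`wrap_init` is a name we made up):

```python
import functools


def wrap_init(cls):
    """Minimal sketch: wrap a class's __init__ while preserving its metadata."""
    original_init = cls.__init__

    @functools.wraps(original_init)  # keeps __name__, __doc__, and other metadata
    def wrapped_init(self, *args, **kwargs):
        # any interception logic would go here, before delegating
        original_init(self, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls
```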
d262a5d48e fix: remove unused import 2020-03-05 11:05:29 +00:00
30624f7056 Fix Colab links + install dependencies first.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 11:40:15 +01:00
3f067f4409 [hf_api] slightly more doc 2020-03-04 23:55:46 -05:00
f564f93c84 [hf_api] Get the public list of all the models on huggingface 2020-03-04 23:33:09 -05:00
ff9e79ba3a make style 2020-03-04 20:18:07 -05:00
07a79db505 Fix failing doc samples 2020-03-04 19:11:31 -05:00
4f338ed407 Explicit config_class instead of module inspection 2020-03-04 23:45:29 +00:00
6fe1cc0874 fix: clean up inadvertent change in tf_t5
This was the beginnings of an attempt to address the test failure on
this layer, and instead I backed out of making this layer
keras-serializable at all ... so it was a mistake to commit this.
2020-03-04 23:24:15 +00:00
bdd3d0c76d Merge pull request #3118 from patrickvonplaten/add_beam_search_to_generation_tf_2_0
Add beam search to generation tf 2 0
2020-03-04 23:28:00 +01:00
c440030e99 [model_cards] Tag AR model languages 2020-03-04 16:33:10 -05:00
3b7f95a506 Merge pull request #3115 from gthb/fix-bogus-param-to-layer-init
fix: passing config as Layer trainable param
2020-03-04 21:59:09 +01:00
1bca97ec7f Update notebook link and fix few working issues.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-04 21:19:33 +01:00
189113d891 Create README.md 2020-03-04 13:57:23 -05:00
76111a3d3a [model_cards] Add card by @lvwerra
(the current way to submit a model card to have it displayed on the website is to open a PR on the `transformers` repo itself)

Thanks for sharing!
2020-03-04 12:55:20 -05:00
a43c388abb [model_cards] Add card by @djstrong
(the current way to submit a model card to have it displayed on the website is to open a PR on the `transformers` repo itself)

Thanks for sharing!
2020-03-04 12:53:02 -05:00
ec60e0ae7a Create README.md 2020-03-04 12:06:05 -05:00
6a143bf282 model cards for both aubmindlab/bert-base-arabert models (#3113)
* Added readme for AraBERTv0.1

* Added readme to AraBERT
2020-03-04 12:04:39 -05:00
932eab943d include tf gpt2 tests for attn mask and past variable (#3122) 2020-03-04 12:03:46 -05:00
256cbbc4a2 [doc] Fix link to how-to-train Colab 2020-03-04 12:01:45 -05:00
006097f8ad rename variables named 'word' to 'token' in generate fn (#3119)
* fix conflits

* fixed naming bug

* make style
2020-03-04 12:01:17 -05:00
18f4b9274f fix: work with Tensorflow < 2.1.0
tf.keras.utils.register_keras_serializable was added in TF 2.1.0, so
don't rely on it being there; just decorate the class with it if it
exists.
2020-03-04 16:57:29 +00:00
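A minimal sketch of that guard, assuming only that the decorator may be absent on older TF (`maybe_register` is our name for it):

```python
import tensorflow as tf


def maybe_register(cls):
    # tf.keras.utils.register_keras_serializable only exists on TF >= 2.1.0,
    # so apply it conditionally instead of assuming it is there.
    if hasattr(tf.keras.utils, "register_keras_serializable"):
        cls = tf.keras.utils.register_keras_serializable()(cls)
    return cls
```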
71c8711970 Adding Docker images for transformers + notebooks (#3051)
* Added transformers-pytorch-cpu and gpu Docker images

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added automatic jupyter launch for Docker image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Move image from alpine to Ubuntu to align with NVidia container images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added TRANSFORMERS_VERSION argument to Dockerfile.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Pytorch-GPU based Docker image

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Tensorflow images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use Python 3.7, as TensorFlow doesn't provide a 3.8-compatible wheel.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove double FROM instructions on transformers-pytorch-cpu image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added transformers-tensorflow-gpu Docker image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* use the correct ubuntu version for tensorflow-gpu

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added pipelines example notebook

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added transformers-cpu and transformers-gpu (including both PyTorch and TensorFlow) images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Docker images don't start Jupyter notebook by default.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Tokenizers notebook

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update images links

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update Docker images to python 3.7.6 and transformers 2.5.1

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added 02-transformers notebook.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Trying to realign 02-transformers notebook?

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Transformer image schema

* Some tweaks on tokenizers notebook

* Removed old notebooks.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Attempt to provide table of content for each notebooks

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Second attempt.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reintroduce transformer image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Keep trying

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* It's going to fly!

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remaining of the Table of Content

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix inlined elements for the table of content

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed anaconda dependencies for Docker images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removing notebooks ToC

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added LABEL to each docker image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed old Dockerfile

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Directly use the context and include transformers from here.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reduce overall size of compiled Docker images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Install jupyter by default and use CMD for easier launching of the images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reduce number of layers in the images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added README.md for notebooks.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix notebooks link in README

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix some wording issues.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added blog notebooks too.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing spelling errors in review comments.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
2020-03-04 11:45:57 -05:00
7a89a3e493 correct beam search sampling 2020-03-04 17:27:47 +01:00
c4c4c9998a make GPT2 and CTRL shape consistent between torch and TF 2020-03-04 17:27:47 +01:00
2529b2d37e set reorder past sort dimension to its default 2020-03-04 17:27:47 +01:00
61fef6e957 added beam_search generation for tf 2.0 2020-03-04 17:27:47 +01:00
34de670dbe fix sklearn release circle ci [temporary] (#3123) 2020-03-04 11:25:23 -05:00
6701fb7859 fix beam_search behavior when sampling (#3106)
* fix beam_search behavior when sampling

* delete print

* make correct style
2020-03-04 09:30:51 -05:00
b1116fd673 fix: passing config as Layer trainable param
Lurking bugs discovered while working on other stuff.
2020-03-03 23:05:40 +00:00
96c4990165 fix unused imports and style 2020-03-03 22:57:05 +00:00
470753bcf5 Put @keras_serializable only on layers it works on
And only run the test on TF*MainLayer classes so marked.
2020-03-03 22:44:45 +00:00
0c716ede8c Use class decorator instead of superclass
When supplied by Keras deserialization, the config parameter to initializers
will be a dict. So intercept it and convert to PretrainedConfig object (and
store in instance attribute for get_config to get at it) before passing to the
actual initializer. To accomplish this, and repeat as little code as possible,
use a class decorator on TF*MainLayer classes.
2020-03-03 22:31:42 +00:00
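A self-contained sketch of the decorator idea described above. `SimpleConfig` is a stand-in for the real PretrainedConfig, and the decorator body is illustrative, not the library's exact implementation:

```python
class SimpleConfig:
    """Stand-in for PretrainedConfig in this sketch."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)


def keras_serializable(cls):
    """Intercept a dict `config` coming from Keras deserialization, convert
    it to a config object, and stash it so get_config() can return it later."""
    original_init = cls.__init__

    def wrapped_init(self, config, *args, **kwargs):
        if isinstance(config, dict):
            config = SimpleConfig.from_dict(config)
        self._transformers_config = config  # kept around for get_config()
        original_init(self, config, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls
```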
e9e6efdc45 BartForSequenceClassification: fix num_labels, add test (#3110) 2020-03-03 15:54:29 -05:00
f631e01d2c [ci] Re-run integration ground truth from fairseq
Adopted best practice set by @patrickvonplaten of commenting lines run on fairseq, for easy comparison

also see #3020
2020-03-03 15:31:40 -05:00
5b396457e5 Summarization Examples: add Bart CNN Evaluation (#3082)
* Rename and improve example

* Add test

* slightly faster test

* style

* This breaks remy prolly

* shorter test string

* no slow

* newdir structure

* New tree

* Style

* shorter

* docs

* clean

* Attempt future import

* more import hax
2020-03-03 15:29:59 -05:00
5c5af879b6 [Bart] dont call .forward (#3094) 2020-03-03 15:14:12 -05:00
b8da16f390 Add (failing) tests for Keras save/load 2020-03-03 15:22:34 +00:00
ba28170717 Support keras JSON/HDF5 serialization of main layers
Fixes #3101
2020-03-03 15:21:41 +00:00
a088d75e51 [model_cards] Fix incorrect path 2020-03-03 09:52:32 -05:00
4134100363 Add generate() functionality to TF 2.0 (#3063)
* add first copy past test to tf 2 generate

* add tf top_k_top_p_filter fn

* add generate function for TF

* add generate function for TF

* implemented generate for all models except transfoXL

* implemented generate for all models except transfoXL

* implemented generate for all models except transfoXL

* make style

* change permission of test file to correct ones

* delete ipdb

* delete ipdb

* fix bug and finish simple gpt2 integration test

* clean test file

* clean test file

* make style

* make style

* make style

* make style

* change import style

* change import style

* make style

* make style

* add decorators

* add decorators

* fix tf ctrl bug dim => axis in TF

* make style

* make style

* refactored test file

* refactored test file

* take out test_torch_tf_conversion if nothing is defined

* take out test_torch_tf_conversion if nothing is defined

* remove useless files

* remove useless files

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* solve conflicts

* solve conflicts

* fix conflicts

* fix conflicts

* merge conflicts

* delete ipdb

* exposed top_k_top_p_filtering fns

* delete weirdly created w! file

* add comment to test tf common modeling

* fix conflicts

* fix conflicts

* make style

* merge conflicts

* make style

* change tf.tensor.shape to shape_list(tensor)
2020-03-03 09:42:15 -05:00
b31f715019 bert-base-arabic model card 2020-03-03 09:29:28 -05:00
c0c7ec3458 Don't crash if fine-tuned model doesn't end with a number (#3099)
That's the same fix applied in https://github.com/huggingface/transformers/issues/2258, but for the GLUE example
2020-03-03 08:59:47 -05:00
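The underlying pattern is to parse a step number from the checkpoint path only when it actually ends in digits. A hedged sketch (the function name is ours, not the example script's):

```python
def parse_global_step(checkpoint_path: str) -> int:
    """Extract a trailing step number such as 'checkpoint-500';
    fall back to 0 when the path has no numeric suffix."""
    suffix = checkpoint_path.rstrip("/").split("-")[-1]
    return int(suffix) if suffix.isdigit() else 0


assert parse_global_step("output/checkpoint-500") == 500
assert parse_global_step("my-finetuned-model") == 0  # no crash, just 0
```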
eec5ec8071 [BART] to each its own config + make BART compatible w/ Pipelines
cc @sshleifer
2020-03-02 18:56:17 -05:00
6b1558bad8 add models cards for camembert-base-fquad camembert-base-squad (#3089)
* add models cards for camembert-base-fquad camembert-base-squad

* typo fix
2020-03-02 17:07:13 -05:00
f169957d0c TF GPU CI (#3085)
* debug env

* Restrict TF GPU memory

* Fixup

* One more test

* rm debug logs

* Fixup
2020-03-02 15:45:25 -05:00
d3eb7d23a4 Pipeline doc (#3055)
* Pipeline doc initial commit

* pipeline abstraction

* Remove modelcard argument from pipeline

* Task-specific pipelines can be instantiated with no model or tokenizer

* All pipelines doc
2020-03-02 14:07:10 -05:00
2c7749784c Update README.md
- Add example of usage
- Update metrics
2020-03-02 13:35:34 -05:00
0e56b37e80 rm bogus file
cc @patrickvonplaten
2020-03-02 12:27:12 -05:00
2fdc7f6ce8 correct greedy generation when doing beam search (#3078)
* correct greedy generation when doing beam search

* improve comment
2020-03-02 12:00:09 -05:00
13afb71208 [ci] Ensure that TF does not preempt all GPU memory for itself
see https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth

Co-Authored-By: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-03-02 11:56:45 -05:00
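Per the linked guide, the approach is to enable memory growth so TF allocates GPU memory on demand; a minimal sketch:

```python
import tensorflow as tf

# Enable memory growth so TensorFlow allocates GPU memory as needed
# instead of preempting all of it at startup.
for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```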
c0135194eb Force pad_token_id to be set before padding for standard tokenizer (#3035)
* force pad_token_id to be set before padding

* fix tests and forbid padding without having a padding_token_id set
2020-03-02 10:53:55 -05:00
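For models such as GPT-2 that ship without a padding token, the usual workaround is to assign one explicitly before padding. A sketch against the API of this era (pad_to_max_length was later superseded; the sentences are just examples):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 defines no pad token; reuse EOS so pad_token_id is set before padding.
tokenizer.pad_token = tokenizer.eos_token
batch = tokenizer.batch_encode_plus(
    ["a short sentence", "a somewhat longer example sentence"],
    pad_to_max_length=True,
)
```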
b54ef78d0c Bart-CNN (#3059)
`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
2020-03-02 10:35:53 -05:00
6b1ff25084 fix n_gpu count when no_cuda flag is activated (#3077)
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
2020-03-02 10:20:21 -05:00
298bed16a8 make style 2020-03-01 14:08:01 -05:00
852e032ca6 include roberta in run_squad_w_distillation - cc @graviraja 2020-03-01 01:56:50 +00:00
b5509abb36 --do_lower_case will always trick me... 2020-03-01 01:39:24 +00:00
d6ef587a10 [ci] Fixup e36bd94345af6045108a391f9ac7f4dc557548de 2020-02-28 23:19:17 -05:00
e36bd94345 [ci] Run all tests on (self-hosted) GPU (#3020)
* Create self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* do not run slow tests, for now

* [ci] For comparison with circleci, let's also run CPU-tests

* [ci] reorganize

* clearer filenames

* [ci] Final tweaks before merging

* rm slow tests on circle ci

* Trigger CI

* On GPU this concurrency was way too high
2020-02-28 21:11:08 -05:00
908fa43b54 Changes to NER examples for PLT and TPU (#3053)
* changes to allow for tpu training

* black

* tpu

* tpu
2020-02-27 16:45:32 -05:00
8bcb37bfb8 NER support for Albert in run_ner.py and NerPipeline (#2983)
* Added support for Albert when fine-tuning for NER

* Added support for Albert in NER pipeline

* Added command-line options to examples/ner/run_ner.py to better control tokenization

* Added class AlbertForTokenClassification

* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens

* Added ,

* Now passes style guide enforcement

* Changes from reviews.

* Code now passes style enforcement

* Added test for AlbertForTokenClassification

* Added test for AlbertForTokenClassification
2020-02-27 10:22:55 -05:00
6a37588041 spelling: strictly (#3042) 2020-02-27 10:22:35 -05:00
f4ff44a6d9 Fix batch_encode_plus (#3041) 2020-02-27 09:56:47 -05:00
f71157529e Added test for AlbertForTokenClassification 2020-02-27 12:24:20 +01:00
aceb6a0907 Added test for AlbertForTokenClassification 2020-02-27 11:52:46 +01:00
d762d4289c Code now passes style enforcement 2020-02-26 23:50:40 +01:00
9495d38b0d Changes from reviews. 2020-02-26 23:36:39 +01:00
b370cc7e99 [gpu] Fixup fdd61b19928e87a5354c36923182e801bfedb31b 2020-02-26 21:48:49 +00:00
f5516805c2 Fix bart slow test 2020-02-26 20:47:49 +00:00
5bc99e7f33 fix several typos in Distil* readme (#3034) 2020-02-26 12:39:54 -05:00
fdd61b1992 Fix attn mask gpt2 when using past (#3033)
* fix issue and add some tests

* fix issue and add some tests

* updated doc string gpt2
2020-02-26 12:04:37 -05:00
9cda3620b6 Fix (non-slow) tests on GPU (torch) (#3024)
* Fix tests on GPU (torch)

* Fix bart slow tests

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-02-26 11:59:25 -05:00
9df74b8bc4 Delete all mentions of Model2Model (#3019) 2020-02-26 11:36:27 -05:00
bb7c468520 Documentation (#2989)
* All Tokenizers

BertTokenizer + few fixes
RobertaTokenizer
OpenAIGPTTokenizer + Fixes
GPT2Tokenizer + fixes
TransfoXLTokenizer
Correct rst for TransformerXL
XLMTokenizer + fixes
XLNet Tokenizer + Style
DistilBERT + Fix XLNet RST
CTRLTokenizer
CamemBERT Tokenizer
FlaubertTokenizer
XLMRobertaTokenizer
cleanup

* cleanup
2020-02-25 18:43:36 -05:00
c913eb9c38 Add integration tests for xlm roberta modelling and xlm roberta tokenzier (#3014)
* add first files

* add xlm roberta integration tests

* make style

* flake 8 issues solved
2020-02-25 16:51:25 -05:00
e8ce63ff21 Change masking to direct labeling for TPU support. (#2982)
* change masking to direct labelings

* fix black

* switch to ignore index

* .

* fix black
2020-02-25 14:47:43 -05:00
7a7ee28cb9 missing ner link (#2967) 2020-02-25 14:06:57 -05:00
65e7c90a77 Adding usage examples for common tasks (#2850)
* Usage: Sequence Classification & Question Answering

* Pipeline example

* Language modeling

* TensorFlow code for Sequence classification

* Custom TF/PT toggler in docs

* QA + LM for TensorFlow

* Finish Usage for both PyTorch and TensorFlow

* Addressing Julien's comments

* More assertive

* cleanup

* Favicon
- added favicon option in conf.py along with the favicon image
- updated 🤗 logo: slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)

Co-authored-by: joshchagani <joshua@joshuachagani.com>
2020-02-25 13:48:24 -05:00
f5b50c6b8e make style 2020-02-25 16:41:54 +01:00
ec16142ee5 add special tokens to pretrain configs of respective lm head models 2020-02-25 16:37:59 +01:00
e645dcbb70 add special tokens to pretrain configs of respective lm head models 2020-02-25 16:37:56 +01:00
e693cd1e87 [ci] Run slow tests every day 2020-02-24 19:54:47 -05:00
4fc63151af [ci] Attempt to fix #2844 2020-02-24 19:51:34 -05:00
b90745c590 Test correct tokenizers after default switch (#3003) 2020-02-24 18:45:53 -05:00
3716c3d8af False by default (#3002) 2020-02-24 18:30:57 -05:00
f9ec5ca90b Release: v2.5.1 2020-02-24 18:22:54 -05:00
4cd9c0971c Fix for fast tokenizers save_pretrained compatibility with Python. (#2933)
* Renamed file generated by tokenizers when calling save_pretrained to match Python.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added save_vocabulary tests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove python quick and dirty fix for clean Rust impl.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizers dependency to 0.5.1

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* TransfoXLTokenizerFast uses a json vocabulary file + warning about incompatibility between Python and Rust

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added some save_pretrained / from_pretrained unittests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update tokenizers to 0.5.2

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Quality and format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* flake8

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Making sure there is really a bug in unittest

* Fix TransfoXL constructor vocab_file / pretrained_vocab_file mixin.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-24 18:20:42 -05:00
ee60840ee6 fix _update_memory fn call in transformer-xl (#2971) 2020-02-24 17:50:24 -05:00
6a50d501ec add explaining example to XLNet LM modeling (#2997)
* add explaining example to XLNet LM modeling

* improve docstring for xlnet
2020-02-24 15:42:38 -05:00
65d74c4965 Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punction to <unk> (#2987)
* add preprocessing to add space before punctuation for transfo_xl

* improve warning messages

* make style

* compile regex at instantiation of tokenizer object
2020-02-24 15:11:10 -05:00
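The idea is to insert a space before punctuation so a word-level vocabulary does not map tokens like "word," to <unk>. An illustrative sketch; the library's actual regex may differ:

```python
import re

# Compiled once, as the commit suggests, rather than on every call.
PUNCT_AFTER_WORD = re.compile(r"(?<=\w)([,.!?;:])")


def add_space_before_punct(text: str) -> str:
    return PUNCT_AFTER_WORD.sub(r" \1", text)


print(add_space_before_punct("Hello, world!"))  # -> "Hello , world !"
```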
a143d9479e Add local_files_only parameter to pretrained items (#2930)
* Add disable_outgoing to pretrained items

Setting disable_outgoing=True disables outgoing traffic:
- etags are not looked up
- models are not downloaded

* parameter name change

* Remove forgotten print
2020-02-24 14:58:15 -05:00
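Usage looks like this (the parameter name is per the commit title; the model name is just an example):

```python
from transformers import AutoModel, AutoTokenizer

# With local_files_only=True, etags are not looked up and nothing is
# downloaded; loading succeeds only from the local cache.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)
```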
286d1ec746 Create README.md 2020-02-24 14:33:49 -05:00
7984a70ee4 kwargs are passed to both model and configuration in AutoModels (#2998) 2020-02-24 14:19:39 -05:00
21d8b6a33e Testing that batch_encode_plus is the same as encode_plus (#2973)
* Testing that encode_plus and batch_encode_plus behave the same way

Spoiler alert: they don't

* Testing rest of arguments in batch_encode_plus

* Test tensor return in batch_encode_plus

* Addressing Sam's comments

* flake8

* Simplified with `num_added_tokens`
2020-02-24 12:09:46 -05:00
17c45c39ed Add slow generate tests for pretrained lm models (#2909)
* add slow generate lm_model tests

* fix conflicts

* merge conflicts

* fix conflicts

* add slow generate lm_model tests

* make style

* delete unused variable

* fix conflicts

* fix conflicts

* fix conflicts

* delete unused variable

* fix conflicts

* finished hard coded tests
2020-02-24 11:51:57 -05:00
8194df8e0c Warning on add_special_tokens (#2966)
Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`
2020-02-24 08:42:54 -05:00
38f5fe9e02 add_ctags_to_git_ignore (#2984) 2020-02-23 16:55:32 -05:00
105dcb4162 Now passes style guide enforcement 2020-02-23 21:47:59 +01:00
33eb8a165d Added , 2020-02-23 21:43:31 +01:00
869b66f6b3 * Added support for Albert when fine-tuning for NER
* Added support for Albert in NER pipeline

* Added command-line options to examples/ner/run_ner.py to better control tokenization

* Added class AlbertForTokenClassification

* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
2020-02-23 21:13:03 +01:00
129f0604ac Delete untested, broken Model2LSTM (#2968) 2020-02-23 11:28:48 -05:00
0e84559d64 Correct special_tokens_mask when add_special_tokens=False (#2965)
Don't know of a use case where that would be useful, but this is more consistent
2020-02-23 09:50:39 -05:00
92487a1dc0 Bart: fix layerdrop and cached decoder_input_ids for generation (#2969) 2020-02-22 16:25:04 -05:00
c36416e53c Add standardized get_vocab method to tokenizers 2020-02-22 12:09:01 -05:00
cafc4dfc7c fix hardcoded path in examples readme 2020-02-22 11:12:38 -05:00
34b4b5a9ed Update modelcard of bert-base-german-cased
Add image
2020-02-22 11:08:42 -05:00
7df12d7bf8 Update README.md
- I added an example using the model with pipelines to show that we have set `{"use_fast": False}` in the tokenizer.
- I added a Colab to play with the model and pipelines
- I added a Colab to discover Huggingface pipelines at the end of the document
2020-02-22 11:06:41 -05:00
cc6775cdf5 Fix max_length not taken into account when using pad_to_max_length on fast tokenizers (#2961)
* enable_padding should pad up to max_length if set.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added more testing on padding.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-22 09:27:47 -05:00
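After the fix, padding on fast tokenizers respects max_length; a sketch using the API names of this release:

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
enc = tokenizer.encode_plus(
    "a short input",
    max_length=16,
    pad_to_max_length=True,  # now pads up to max_length when it is set
)
assert len(enc["input_ids"]) == 16
```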
94ff2d6ee8 Remove double bias (#2958) 2020-02-21 17:10:18 -05:00
b5b3445c4f Only use F.gelu for torch >=1.4.0 (#2955)
* Only use F.gelu for torch >=1.4.0

* Use F.gelu for newer torch
2020-02-21 16:10:21 -05:00
fc38d4c86f Improve special_token_id logic in run_generation.py and add tests (#2885)
* improving generation

* finalized special token behaviour for no_beam_search generation

* solved modeling_utils merge conflict

* solve merge conflicts in modeling_utils.py

* add run_generation improvements from PR #2749

* adapted language generation to not use hardcoded -1 if no padding token is available

* remove the -1 removal, as hard-coded -1s are not necessary anymore

* add lightweight language generation testing for randomly initialized models - just checking that no errors are thrown

* add slow language generation tests for pretrained models using hardcoded output with pytorch seed

* delete ipdb

* check that all generated tokens are valid

* renaming

* renaming Generation -> Generate

* make style

* updated so that generate_beam_search has the same token behavior as generate_no_beam_search

* consistent return format for run_generation.py

* deleted pretrain lm generate tests -> will be added in another PR

* cleaning of unused if statements and renaming

* run_generate will always return an iterable

* make style

* consistent renaming

* improve naming, make sure generate function always returns the same tensor, add docstring

* add slow tests for all lmhead models

* make style and improve example comments modeling_utils

* better naming and refactoring in modeling_utils

* improving generation

* finalized special token behaviour for no_beam_search generation

* solved modeling_utils merge conflict

* solve merge conflicts in modeling_utils.py

* add run_generation improvements from PR #2749

* adapted language generation to not use hardcoded -1 if no padding token is available

* remove the -1 removal, as hard-coded -1s are not necessary anymore

* add lightweight language generation testing for randomly initialized models - just checking that no errors are thrown

* add slow language generation tests for pretrained models using hardcoded output with pytorch seed

* delete ipdb

* check that all generated tokens are valid

* renaming

* renaming Generation -> Generate

* make style

* updated so that generate_beam_search has the same token behavior as generate_no_beam_search

* consistent return format for run_generation.py

* deleted pretrain lm generate tests -> will be added in another PR

* cleaning of unused if statements and renaming

* run_generate will always return an iterable

* make style

* consistent renaming

* improve naming, make sure generate function always returns the same tensor, add docstring

* add slow tests for all lmhead models

* make style and improve example comments modeling_utils

* better naming and refactoring in modeling_utils

* changed fast random lm generation testing design to a more general one

* delete old testing design in gpt2

* correct old variable name

* temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed

* adapted all fast random generate tests to new design

* better warning description in modeling_utils

* better comment

* better comment and error message

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-02-21 12:09:59 -05:00
c749a543fa Added CamembertForQuestionAnswering (#2746)
* Added CamembertForQuestionAnswering

* fixed camembert tokenizer case
2020-02-21 12:01:02 -05:00
5211d333bb Update modeling_tf_utils.py (#2924)
Tensorflow does not use .eval() vs .train().

closes https://github.com/huggingface/transformers/issues/2906
2020-02-21 11:28:32 -05:00
3e98f27e4a Create README.md for xlnet_large_squad (#2942) 2020-02-21 08:54:41 -05:00
4452b44b90 Labels are now added to model config under id2label and label2id (#2945) 2020-02-21 08:53:05 -05:00
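In practice the mapping can be supplied when loading the config; a sketch (the label names are example values, and `id2label`/`label2id` are the config keys the commit refers to):

```python
from transformers import BertConfig

config = BertConfig.from_pretrained(
    "bert-base-uncased",
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
```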
53ce3854a1 New BartModel (#2745)
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
2020-02-20 18:11:13 -05:00
564fd75d65 Removed unused fields in DistilBert TransformerBlock (#2710)
* Removed unused fields in DistilBert TransformerBlock
2020-02-20 16:08:21 -05:00
889d3bfdbb default arg fix (#2937) 2020-02-20 15:31:17 -05:00
197d74f988 Add get_vocab method to PretrainedTokenizer 2020-02-20 15:26:49 -05:00
ea8eba35e2 Fix InputExample docstring (#2891) 2020-02-20 15:25:15 -05:00
e2a6445ebb Tokenizer fast warnings (#2922)
* Remove warning when pad_to_max_length is not set.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Move RoberTa warning to RoberTa and not GPT2 base tokenizer.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-20 11:55:03 -05:00
9b3093311f Expose all constructor parameter for BertTokenizerFast (#2921)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-20 11:53:32 -05:00
b662f0e625 Support for torch-lightning in NER examples (#2890)
* initial pytorch lightning commit

* tested multigpu

* Fix learning rate schedule

* black formatting

* fix flake8

* isort

* isort

* .

Co-authored-by: Check your git settings! <chris@chris-laptop>
2020-02-20 11:50:05 -05:00
ab1238393c Update to include example of LM
The model files have been updated in order to include the classification layers, based on https://github.com/huggingface/transformers/issues/2901, and can now also be used as a LM.
2020-02-20 10:57:59 -05:00
976e9afece Add syntax highlighting to the BibTeX in README 2020-02-20 10:06:15 -05:00
cbc5705541 Fix spell: EsperBERTo, not EspertBERTo 2020-02-20 10:02:07 -05:00
d490b5d500 Fast Tokenizers save pretrained should return the list of generated file paths. (#2918)
* Correctly return the tuple of generated file(s) when calling save_pretrained

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Quality and format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-20 00:58:04 +01:00
2708b44ee9 Patch ALBERT with heads in TensorFlow 2020-02-19 18:46:25 -05:00
1abd53b1aa Patch ALBERT with heads in TensorFlow 2020-02-19 18:24:40 -05:00
e676764241 Override build_inputs_with_special_tokens for fast tokenizers (#2912)
* Override build_inputs_with_special_tokens for fast impl + unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Quality + format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-19 16:09:51 -05:00
59c23ad9c9 README link + better instructions for release 2020-02-19 11:57:17 -05:00
22b2b5790e Documentation v2.5.0 2020-02-19 11:53:30 -05:00
fb560dcb07 Release: v2.5.0
Welcome Rust Tokenizers
2020-02-19 11:46:19 -05:00
3f3fa7f7da Integrate fast tokenizers library inside transformers (#2674)
* Implemented fast version of tokenizers

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bumped tokenizers version requirements to latest 0.2.1

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added matching tests

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Matching OpenAI GPT tokenization !

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Matching GPT2 on tokenizers

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Expose add_prefix_space as constructor parameter for GPT2

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Matching Roberta tokenization !

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed fast implementation of CTRL.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Binding TransformerXL tokenizers to Rust.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Updating tests accordingly.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added tokenizers as top-level modules.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Black & isort.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Rename LookupTable to WordLevel to match Rust side.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Black.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use "fast" suffix instead of "ru" for rust tokenizers implementations.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Introduce tokenize() method on fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* encode_plus dispatches to batch_encode_plus

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* batch_encode_plus now dispatches to encode if there is only one input element.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bind all the encode_plus parameter to the forwarded batch_encode_plus call.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizers dependency to 0.3.0

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Formatting.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix tokenization_auto with support for new (python, fast) mapping schema.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Give correct fixtures path in test_tokenization_fast.py for the CLI.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Expose max_len_ properties on BertTokenizerFast

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Move max_len_ properties to PreTrainedTokenizerFast and override in specific subclasses.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* _convert_encoding should keep the batch axis tensor if only one sample in the batch.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add warning message for RobertaTokenizerFast if used for MLM.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added use_fast (bool) parameter on AutoTokenizer.from_pretrained().

This allows to easily enable/disable Rust-based tokenizer instantiation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Let's tokenizers handle all the truncation and padding stuff.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Allow to provide tokenizer arguments during pipeline creation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update test_fill_mask pipeline to not use fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix too many parameters for convert_encoding.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* When enabling padding, max_length should be set to None.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Avoid returning nested tensors of length 1 when calling encode_plus

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure output is padded when return_tensor is not None.

Tensor creation requires the initial list input to be of the exact same size.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Disable transfoxl unittest if pytorch is not available (required to load the model)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* encode_plus should not remove the leading batch axis if return_tensor is set

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Temporary disable fast tokenizers on QA pipelines.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix formatting issues.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update tokenizers to 0.4.0

* Update style

* Enable truncation + stride unit test on fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add unittest ensuring special_tokens set match between Python and Rust.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure special_tokens are correctly set during construction.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Give more warning feedback to the user in case of padding without pad_token.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* quality & format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added possibility to add a single token as str

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added unittest for add_tokens and add_special_tokens on fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix rebase mismatch on pipelines qa default model.

QA requires cased input while the tokenizers would be uncased.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Using offset mapping relative to the original string + unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: save_vocabulary requires folder and file name

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Simplify import for Bert.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: truncate_and_pad disables padding according to the same heuristic as the one enabling padding.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Remove private member access in tokenize()

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Bump tokenizers dependency to 0.4.2

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* format & quality.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Use named arguments when applicable.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Add Github link to Roberta/GPT2 space issue on masked input.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Move max_len_single_sentence / max_len_sentences_pair to PreTrainedTokenizerFast + tests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Relax type checking to include tuple and list object.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Document the truncate_and_pad manager behavior.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Raise an exception if return_offsets_mapping is not available with the current tokenizer.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure padding is set on the tokenizers before setting any padding strategy + unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* On pytorch we need to stack tensor to get proper new axis.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Generalize tests to different framework removing hard written return_tensors="..."

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizer dependency for num_special_tokens_to_add

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Overflowing tokens in batch_encode_plus are now stacked over the batch axis.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improved error message for padding strategy without pad token.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bumping tokenizers dependency to 0.5.0 for release.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Optimizing convert_encoding: around 4x improvement. 🚀

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* expose pad_to_max_length in encode_plus to avoid duplicating the parameters in kwargs

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Generate a proper overflow_to_sampling_mapping when return_overflowing_tokens is True.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix unittests for overflow_to_sampling_mapping not being returned as tensor.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Format & quality.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove perfect alignment constraint for Roberta (allowing 1% difference max)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Triggering final CI

Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
2020-02-19 11:35:40 -05:00
ffb93ec0cc Create README.md 2020-02-19 10:51:16 -05:00
20fc18fbda Skip flaky test_tf_question_answering (#2845)
* Skip flaky test

* Style
2020-02-18 16:14:50 -05:00
2ae98336d1 fix vocab size in binarized_data (distil): int16 vs int32 2020-02-18 16:17:35 +00:00
0dbddba6d2 fix typo in hans example call 2020-02-17 20:19:57 +00:00
29ab4b7f40 Create README.md 2020-02-17 10:58:43 -05:00
c88ed74ccf [model_cards] 🇹🇷 Add new (cased) BERTurk model 2020-02-17 09:54:46 -05:00
5b2d4f2657 Merge pull request #2881 from patrickvonplaten/add_vim_swp_to_gitignore
update .gitignore to ignore .swp files created when using vim
2020-02-17 14:36:49 +01:00
fb4d8d0832 update .gitignore to ignore .swp files created when using vim 2020-02-17 14:26:32 +01:00
6083c1566e Update README.md
I trained the model for more epochs, which improved the results. This commit updates the model's results and adds a gif showing it used with **transformers/pipelines**
2020-02-16 10:09:34 -05:00
73028c5df0 [model_cards] EsperBERTo 2020-02-14 15:16:33 -05:00
81fb8d3251 Update model card: new performance chart (#2864)
* Update model performance for correct German conll03 dataset

* Adjust text

* Adjust line spacing
2020-02-14 13:39:23 -05:00
4e69104a1f [model_cards] Also use the thumbnail as meta
Co-Authored-By: Ilias Chalkidis <ihalk@di.uoa.gr>
2020-02-14 10:27:11 -05:00
73d79d42b4 [model_cards] nlptown/bert-base-multilingual-uncased-sentiment
cc @yvespeirsman

Co-Authored-By: Yves Peirsman <yvespeirsman@users.noreply.github.com>
2020-02-14 09:51:11 -05:00
47b735f994 Added model card for bert-base-multilingual-uncased-sentiment (#2859)
* Created model card for nlptown/bert-base-multilingual-sentiment

* Delete model card

* Created model card for bert-base-multilingual-uncased-sentiment as README
2020-02-14 09:31:15 -05:00
7d22fefd37 [pipeline] Alias NerPipeline as TokenClassificationPipeline 2020-02-14 09:18:10 -05:00
61a2b7dc9d Fix typo 2020-02-14 09:13:07 -05:00
6e261d3a22 Fix typos 2020-02-14 09:11:07 -05:00
4e597c8e4d Fix typo 2020-02-14 09:07:42 -05:00
925a13ced1 [model_cards] mv README.md 2020-02-13 23:07:29 -05:00
575a3b7aa1 Create distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es.md 2020-02-13 23:04:52 -05:00
4d36472b96 [run_ner] Don't crash if fine-tuning local model that doesn't end with digit 2020-02-14 03:25:29 +00:00
8514018300 Update with additional information
Added a "Pre-training details" section
2020-02-13 21:54:42 -05:00
1eec69a900 Create README.md 2020-02-13 19:27:22 -05:00
8744402f1e add model_card flaubert-base-uncased-squad (#2833)
* add model_card

* Add tag

cc @fmikaelian

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-13 17:19:13 -05:00
7f98edd7e3 Model card: Literary German BERT (#2843)
* feat: create model card

* chore: add description

* feat: stats plot

* Delete prosa-jahre.svg

* feat: years plot (again)

* chore: add more details

* fix: typos

* feat: kfold plot

* feat: kfold plot

* Rename model_cards/severinsimmler/literary-german-bert.md to model_cards/severinsimmler/literary-german-bert/README.md

* Support for linked images + add tags

cc @severinsimmler

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-13 15:43:44 -05:00
f1e8a51f08 Preserve spaces in GPT-2 tokenizers (#2778)
* Preserve spaces in GPT-2 tokenizers

Preserves spaces after special tokens in GPT-2 and inherited (RoBERTa)
tokenizers, enabling correct BPE encoding. Automatically inserts a space
in front of the first token in the encode function when adding special tokens.

* Add tokenization preprocessing method

* Add framework argument to pipeline factory

Also fixes a pipeline test issue. Each test input is now treated as a
distinct sequence.
2020-02-13 13:29:43 -05:00
0ed630f139 Attempt to increase timeout for circleci slow tests (#2844) 2020-02-13 09:11:03 -05:00
ef74b0f07a get_activation('relu') provides a simple mapping from strings i… (#2807)
* activations.py contains a mapping from string to activation function
* resolves some `gelu` vs `gelu_new` ambiguity
2020-02-13 08:28:33 -05:00
f54a5bd37f Raise error when using an mlm flag for a clm model + correct TextDataset 2020-02-12 13:23:14 -05:00
569897ce2c Fix a few issues regarding the language modeling script 2020-02-12 13:23:14 -05:00
21da895013 [model_cards] Better image for social sharing 2020-02-11 20:30:08 -05:00
9a70910d47 [model_cards] Tweak @mrm8488's model card 2020-02-11 20:20:39 -05:00
9274734a0d [model_cards] mv to correct location + tweak tag 2020-02-11 20:13:57 -05:00
69f948461f Create bert-base-spanish-wwm-cased-finetuned-spa-squad2-es.md 2020-02-11 20:07:15 -05:00
e0b6247cf7 [model_cards] Change formatting slightly as we updated our markdown engine
cc @tholor @loretoparisi @simonefrancia
2020-02-11 18:25:21 -05:00
5f2dd71d1b Smaller diff 2020-02-11 17:20:09 -05:00
31158af57c formatting 2020-02-11 17:20:09 -05:00
5dd61fb9a9 Add more specific testing advice to Contributing.md 2020-02-11 17:20:09 -05:00
ee5de0ba44 BERT decoder: Fix causal mask dtype.
PyTorch < 1.3 requires multiplication operands to be of the same type.
This was violated when using the default attention mask (i.e.,
attention_mask=None in arguments) with BERT in decoder mode.

In particular, this was breaking Model2Model and made the tutorial
from the quickstart fail.
2020-02-11 15:19:22 -05:00
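A sketch of the dtype issue (shapes simplified; the variable names are ours):

```python
import torch

seq_len = 5
# The default attention mask is float, but a naively built causal mask can
# end up uint8; PyTorch < 1.3 refuses to multiply mismatched dtypes.
attention_mask = torch.ones(1, seq_len)
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.uint8))
causal_mask = causal_mask.to(attention_mask.dtype)  # the fix: align dtypes
extended_mask = causal_mask[None, :, :] * attention_mask[:, None, :]
```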
bed38d3afe Fix typo in src/transformers/data/processors/squad.py 2020-02-11 11:22:24 -05:00
498d06e914 [model_cards] Add new German Europeana BERT models (#2805)
* [model_cards] New German Europeana BERT models from dbmdz

* [model_cards] Update German Europeana BERT models from dbmdz
2020-02-11 10:49:39 -05:00
3e3a9e2c01 Merge pull request #2793 from huggingface/tensorflow-210-circleci-fix
Fix circleci cuInit error on Tensorflow >= 2.1.0.
2020-02-11 10:48:42 +00:00
1f5db9a13c [model_cards] Rm extraneous tag 2020-02-10 17:45:13 -05:00
95bac8dabb [model_cards] Add language metadata to existing model cards
This will enable filtering on language (amongst other tags) on the website

cc @loretoparisi, @stefan-it, @HenrykBorzymowski, @marma
2020-02-10 17:42:42 -05:00
ba498eac38 Create README.md (#2785)
* Create README.md

* Update README.md

* Update README.md

* Update README.md

* [model_cards] Use code fences for consistency

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-10 17:27:59 -05:00
68ccc04ee6 Add model readme for deepset/roberta-base-squad2 (#2797)
* Add readme for deepset/roberta-base-squad2

* update model readme
2020-02-10 15:21:48 -05:00
539f601be7 intermediate_size > hidden_dim in distilbert config docstrings 2020-02-10 13:45:57 -05:00
cfb7d108bd FlauBERT lang embeddings only when n_langs > 1 2020-02-10 13:24:04 -05:00
b4691a438d [model_cards] BERT-of-Theseus: use the visual as thumbnail
cc @jetrunner

Co-Authored-By: Kevin Canwen Xu <canwenxu@outlook.com>
2020-02-10 11:27:08 -05:00
fc325e97cd [model_cards] Showcase model tag syntax 2020-02-10 11:27:08 -05:00
fd639e5be3 Correct quickstart example when using the past 2020-02-10 11:25:56 -05:00
63a5399bc4 [model_cards] Specify language meta + thumbnail
cc @tholor

see #2799
2020-02-10 11:20:05 -05:00
125a75a121 Correctly compute tokens when padding on the left 2020-02-10 10:47:42 -05:00
9c64d1da35 Add model readme for bert-base-german-cased (#2799)
* add readme for bert-base-german-cased

* update readme
2020-02-10 10:27:29 -05:00
bf99014c46 Create BERT-of-Theseus model card 2020-02-10 09:58:40 -05:00
92e974196f Merge pull request #2765 from huggingface/extract-cached-archives
Add option to `cached_path` to automatically extract archives
2020-02-10 14:05:16 +01:00
6aa7973aec Fix circleci cuInit error on Tensorflow >= 2.1.0.
TensorFlow 2.1.0 introduced a new dependency model where pip install tensorflow installs TF with GPU support. Before, it would only install with CPU support; thus CircleCI looks for the
NVIDIA driver version at initialization of the TensorFlow-related tests but fails, as there is no NVIDIA driver running.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-10 13:24:37 +01:00
520e7f2119 Correct docstring for xlnet 2020-02-07 16:42:35 -05:00
dd28830327 Update RoBERTa tips 2020-02-07 16:42:35 -05:00
db97930122 Update XLM-R tips 2020-02-07 16:42:35 -05:00
7046de2991 E231 2020-02-07 15:28:13 -05:00
0d3aa3c04c styling 2020-02-07 15:28:13 -05:00
d8b43600fd omission 2020-02-07 15:28:13 -05:00
ee5a6856ca distilbert-base-cased weights + Readmes + omissions 2020-02-07 15:28:13 -05:00
73368963b2 Fix importing unofficial TF models with extra optimizer weights 2020-02-07 10:25:31 -05:00
d7dabfeff5 Fix documentation in ProjectedAdaptiveLogSoftmax 2020-02-07 10:14:58 -05:00
42f08e596f [examples] rename run_lm_finetuning to run_language_modeling 2020-02-07 09:15:28 -05:00
4f7bdb0958 [examples] Fix broken markdown 2020-02-07 09:15:28 -05:00
c6c5c3fd4e style and quality 2020-02-07 08:58:06 +01:00
961c69776f @julien-c proposal for TF/PT compat in hf_buckets 2020-02-07 08:53:17 +01:00
d311f87bca cleanup 2020-02-07 00:05:28 +01:00
7d99e05f76 file_cache has options to extract archives 2020-02-07 00:03:12 +01:00
2c12464a20 Changed vocabulary save function. Variable name was inconsistent, causing an error to be thrown when passing a file name instead of a directory. 2020-02-06 16:40:07 -05:00
6fc3d34abd Fix multi-gpu evaluation in run_glue.py 2020-02-06 16:38:55 -05:00
7748cbbe7d Oopsie 2020-02-06 15:30:02 -05:00
432c12521e [docs] Add menu w/ links to other pages on hf.co 2020-02-06 15:30:02 -05:00
c069932f5d Add contributors snapshot
powered by https://github.com/sourcerer-io/hall-of-fame
2020-02-06 15:25:47 -05:00
33d3072e1c Arxiv README (#2747)
* Arxiv README

* ArXiv-NLP readme
2020-02-05 15:26:28 -05:00
eae8ee0389 [doc] model sharing: mention README.md + tweaks
cc @lysandrejik @thomwolf
2020-02-05 14:20:03 -05:00
6bb6a01765 Fix GPT2 config set to trainable
This prevents the model from being saved, and who knows
what else.
2020-02-05 13:55:41 -05:00
ada24def22 [run_lm_finetuning] Tweak fix for non-long tensor, close #2728
see 1ebfeb79469d544a2bd817aa32c77e0514485ff9 and #2728

Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-02-05 12:49:18 -05:00
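A sketch of the kind of cast involved, loosely following the masking logic of the LM fine-tuning script of this era (token ids and the 15% probability are illustrative):

```python
import torch

labels = torch.tensor([[101, 2023, 2003, 102]])
probability_matrix = torch.full(labels.shape, 0.15)
masked_indices = torch.bernoulli(probability_matrix).bool()
labels = labels.clone()
labels[~masked_indices] = -1  # sentinel ignored by the loss in these scripts
labels = labels.long()        # the tweak: keep labels a long tensor
```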
2184f87003 RoBERTa TensorFlow Tests 2020-02-04 18:05:35 -05:00
e615269cb8 Correct slow test 2020-02-04 18:05:35 -05:00
5f96ebc0be Style 2020-02-04 18:05:35 -05:00
950c6a4f09 Flaubert PyTorch tests 2020-02-04 18:05:35 -05:00
d28b81dc29 RoBERTa Pytorch tests 2020-02-04 18:05:35 -05:00
d1ab1fab1b pass langs parameter to certain XLM models (#2734)
* pass langs parameter to certain XLM models

Adding an argument that specifies the language the SQuAD dataset is in so language-sensitive XLMs (e.g. `xlm-mlm-tlm-xnli15-1024`) don't default to language `0`.
Allows resolution of issue #1799.

* fixing from `make style`

* fixing style (again)
2020-02-04 17:12:42 -05:00
9e5b549b4d fix default getattr 2020-02-04 16:38:52 -05:00
25848a6094 double quotes 2020-02-04 16:38:52 -05:00
cbcb83f21d minor cleanup of test_attention_outputs 2020-02-04 16:38:52 -05:00
3bf5417258 Revert erroneous fix 2020-02-04 16:31:07 -05:00
1ebfeb7946 Cast to long when masking tokens 2020-02-04 15:56:16 -05:00
9c67196b83 Update quickstart 2020-02-04 11:11:37 -05:00
90ab15cb7a Remove redundant hidden states 2020-02-04 10:59:32 -05:00
9a50828b5c Pipelines: fix crash when modelcard is None
cc @mfuntowicz does this seem correct?
2020-02-03 17:53:39 -05:00
6c1b23554f Sample instead of greedy decoding by default in generate 2020-02-03 17:23:53 -05:00
239dd23f64 [Follow up 213]
Masked indices should have -100 and not -1. Updating documentation + scripts that were forgotten
2020-02-03 16:08:05 -05:00
522c5b5533 Added README.md to Swedish BERT models from National Library of Sweden 2020-02-03 09:09:34 -05:00
9329e59700 Add READMEs to Tensorflow versions of CamemBERT and XLM-RoBERTa 2020-02-03 09:04:34 -05:00
2ba147ecff Fix typo in examples/utils_ner.py
"%s-%d".format() -> "{}-{}".format()
2020-02-01 11:10:57 -05:00
9773e5e0d9 CLI script to gather environment info (#2699)
* add "info" command to CLI

As a convenience, add the info directive to CLI. Running `python transformers-cli info` will return a string containing the transformers version, platform, python version, PT/TF version and GPU support

* Swap f-strings for .format

Still supporting 3.5 so can't use f-strings (sad face)

* Add reference in issue to CLI

* Add the expected fields to issue template

This way, people can still add the information manually if they want. (Though I fear they'll just ignore it.)

* Remove heading from output

* black-ify

* order of imports

Should ensure isort test passes

* use is_X_available over import..pass

* style

* fix copy-paste bug

* Rename command info -> env

Also adds the command to CONTRIBUTING.md in "Did you find a bug" section
2020-02-01 10:38:14 -05:00
ddb6f9476b [model_cards] dbmdz models
Co-Authored-By: Stefan Schweter <stefan-it@users.noreply.github.com>
2020-01-31 18:39:09 -05:00
6636826f04 [model_cards] Multilingual + Dutch SQuAD2.0
Co-Authored-By: HenrykBorzymowski <henrykborzymowski@users.noreply.github.com>
2020-01-31 18:39:09 -05:00
98dadc98e1 [model_cards] UmBERTo
Co-Authored-By: Loreto Parisi <loretoparisi@gmail.com>
Co-Authored-By: Simone Francia <francia.simone1@gmail.com>
2020-01-31 18:39:09 -05:00
d6fc34b459 [model_cards] add mine 2020-01-31 18:39:09 -05:00
d426b58b9e Patch: v2.4.1 2020-01-31 14:55:33 -05:00
1e82cd8457 Flaubert auto tokenizer + tests
cc @julien-c
2020-01-31 14:16:52 -05:00
d18d47be67 run_generation style 2020-01-31 12:05:48 -05:00
ff6f1492e8 FlauBERT load in AutoModel
The FlauBERT configuration file inherits from XLMConfig, so it is recognized as an XLM model when loading through AutoModels, because XLMConfig is checked before FlaubertConfig.

Changing the order solves this problem, but a test should be added.
2020-01-31 12:05:15 -05:00
7365f01d43 do_sample should be set to True in run_generation.py 2020-01-31 11:49:32 -05:00
3a21d6da6b Typo on markdown link in README.md 2020-01-31 10:58:49 -05:00
0aa40e9569 v2.4.0 documentation 2020-01-31 09:55:34 -05:00
8036ceb7c5 Update commands for pypi test 2020-01-31 09:48:15 -05:00
6664ea943d Release: v2.4.0 2020-01-31 09:40:32 -05:00
5a6b138b00 [Umberto] model shortcuts (#2661)
* [Umberto] model shortcuts

cc @loretoparisi @simonefrancia

see #2485

* Ensure that tokenizers will be correctly configured
2020-01-30 21:05:53 -05:00
7fe294bf07 Hotfix: same handling of non-existent files as for config 2020-01-30 20:05:04 -05:00
b85c59f997 config.architectures 2020-01-30 19:26:59 -05:00
f9bc3f5771 style tweak 2020-01-30 19:26:59 -05:00
0b13fb822a No need for a model_type here
cc @lysandrejik
2020-01-30 19:26:59 -05:00
71a382319f Correct documentation 2020-01-30 18:41:24 -05:00
01a14ebd8d Add FlauBERT to automodels 2020-01-30 18:40:22 -05:00
9fa836a73f fill_mask helper (#2576)
* fill_mask helper

* [poc] FillMaskPipeline

* Revert "[poc] FillMaskPipeline"

This reverts commit 67eeea55b0f97b46c2b828de0f4ee97d87338335.

* Revert "fill_mask helper"

This reverts commit cacc17b884e14bb6b07989110ffe884ad9e36eaa.

* README: clarify that Pipelines can also do text-classification

cf. question at the AI&ML meetup last week, @mfuntowicz

* Fix test: test feature-extraction pipeline

* Test tweaks

* Slight refactor of existing pipeline (in preparation of new FillMaskPipeline)

* Extraneous doc

* More robust way of doing this

@mfuntowicz as we don't rely on the model name anymore (see AutoConfig)

* Also add RobertaConfig as a quickfix for wrong token_type_ids

* cs

* [BIG] FillMaskPipeline
2020-01-30 18:15:42 -05:00
b43cb09aaa Add layerdrop 2020-01-30 12:05:01 -05:00
df27648bd9 Rename test_examples to test_doc_samples 2020-01-30 10:07:22 -05:00
93dccf527b Pretrained models 2020-01-30 10:04:18 -05:00
90787fed81 Style 2020-01-30 10:04:18 -05:00
73306d028b FlauBERT documentation 2020-01-30 10:04:18 -05:00
ce2f4227ab Fix failing FlauBERT test 2020-01-30 10:04:18 -05:00
f0a4fc6cd6 Add Flaubert 2020-01-30 10:04:18 -05:00
a5381495e6 Added classifier dropout rate in ALBERT 2020-01-30 09:52:34 -05:00
83446a88d9 Use _pad_token instead of pad_token_id
Requesting pad_token_id would cause an error message when it is None. Use private _pad_token instead.
2020-01-29 17:44:58 -05:00
9fde13a3ac Add check to verify existence of pad_token_id
In batch_encode_plus we have to ensure that the tokenizer has a pad_token_id so that, when padding, no None values are added as padding. That would happen with gpt2, openai, transfoxl.

closes https://github.com/huggingface/transformers/issues/2640
2020-01-29 17:44:58 -05:00
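A minimal sketch of the guard described above, assuming a tokenizer object exposing `pad_token_id`; the function name and placement are illustrative, not the actual `batch_encode_plus` code:

```
def pad_sequences(tokenizer, sequences, max_length):
    # Without this check, tokenizers lacking a padding token (gpt2,
    # openai-gpt, transfo-xl) would silently insert None as padding.
    if tokenizer.pad_token_id is None:
        raise ValueError(
            "This tokenizer has no padding token; set tokenizer.pad_token "
            "before padding a batch."
        )
    return [seq + [tokenizer.pad_token_id] * (max_length - len(seq)) for seq in sequences]
```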
e63a81dd25 Style 2020-01-29 16:29:20 -05:00
217349016a Copy object instead of passing the reference 2020-01-29 16:15:39 -05:00
adb8c93134 Remove lines causing a KeyError 2020-01-29 14:01:16 -05:00
c69b082601 Update documentation 2020-01-29 12:06:13 -05:00
ca1d66734d Apply quality and style requirements once again 2020-01-29 12:06:13 -05:00
5e3c72842d bugfix on model name 2020-01-29 12:06:13 -05:00
0731fa1587 Apply quality and style requirements 2020-01-29 12:06:13 -05:00
a3998e76ae Add TF2 CamemBERT model 2020-01-29 12:06:13 -05:00
b5625f131d Style 2020-01-29 11:47:49 -05:00
44a5b4bbe7 Update documentation 2020-01-29 11:47:49 -05:00
7fc628d98e Apply style 2020-01-29 11:47:49 -05:00
64ca855617 Add TF2 XLM-RoBERTa model 2020-01-29 11:47:49 -05:00
9d87eafd11 Streamlining
- mostly stylistic streamlining
- removed 'additional context' sections. They seem to be rarely used and might cause confusion. If more details are needed, users can add them to the 'details' section
2020-01-28 10:41:10 -05:00
a3b3638f6f phrasing 2020-01-28 10:41:10 -05:00
c96ca70f25 Update ---new-benchmark.md 2020-01-28 10:41:10 -05:00
7b5eda32bb Update --new-model-addition.md
Motivate users to @-tag authors of models to increase visibility and expand the community
2020-01-28 10:41:10 -05:00
c63d91dd1c Update bug-report.md
- change references to pytorch-transformers to transformers
- link to code formatting guidelines
2020-01-28 10:41:10 -05:00
b2907cd06e Update feature-request.md
- add 'your contribution' section
- add code formatting link to 'additional context'
2020-01-28 10:41:10 -05:00
2fec88ee02 Update question-help.md
Prefer that general questions are asked on Stack Overflow
2020-01-28 10:41:10 -05:00
7e03d2bd7c update migration guide
Streamlines usages of pytorch-transformers and pytorch-pretrained-bert. Add link to the README for the migration guide.
2020-01-28 10:41:10 -05:00
335dd5e68a Default save steps 50 to 500 in all scripts 2020-01-28 09:42:11 -05:00
ea2600bd5f Absolute definitive HeisenDistilBug solve
cc @julien-c @thomwolf
2020-01-27 21:58:36 -05:00
5c3d441ee1 Fix formatting 2020-01-27 21:00:34 -05:00
f5a236c3ca Add Dutch pre-trained BERT model 2020-01-27 21:00:34 -05:00
6b4c3ee234 [run_lm_finetuning] GPT2 tokenizer doesn't have a pad_token
ping @lysandrejik
2020-01-27 20:14:02 -05:00
79815bf666 [serving] Fix typo 2020-01-27 19:58:25 -05:00
5004d5af42 [serving] Update dependencies 2020-01-27 19:58:00 -05:00
9ca21c838b Style 2020-01-27 14:49:12 -05:00
e0849a66ac adding in the doc 2020-01-27 14:27:07 -05:00
6b081f04e6 style and quality 2020-01-27 14:27:07 -05:00
0e31e06a75 Add AutoModelForPreTraining 2020-01-27 14:27:07 -05:00
ea56d305be make style 2020-01-27 12:13:32 -05:00
d440e21f5b add mapping of roberta for QA 2020-01-27 12:12:46 -05:00
875c4ae48f Definitive HeisenDistilBug fix
cc @julien-c @thomwolf
2020-01-27 12:09:58 -05:00
f09f42d4d3 Input Embeddings should be assigned
cc @julien-c
2020-01-27 11:46:00 -05:00
bac51fba3a Fix token_type_ids for XLM-R 2020-01-27 11:08:31 -05:00
babd41e7fa Code quality 2020-01-24 17:06:55 -05:00
974d083c7b Accurate model for configuration 2020-01-24 16:46:03 -05:00
983fef469c AutoModels doc 2020-01-24 16:37:30 -05:00
009fcb0ec1 Configuration utils 2020-01-24 16:37:30 -05:00
11b13e94a3 Add type to help my IDE out 2020-01-24 14:00:57 -05:00
1ce3fb5cc7 update correct eval metrics (distilbert & co) 2020-01-24 11:45:22 -05:00
62f5804608 Update the doc string for T5WithLMHeadModel
T5WithLMHeadModel's doc string claims that indices of -1 are
ignored while computing the cross-entropy loss in the forward
pass; however, indices of -1 throw an error while indices of -100
are ignored. This commit updates the doc string to be consistent
with the class's behavior.
2020-01-24 10:28:20 -05:00
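For context, PyTorch's cross-entropy loss ignores label positions set to -100 by default, which is exactly the behavior the updated docstring documents. A short, self-contained illustration:

```
import torch
import torch.nn.functional as F

logits = torch.randn(1, 4, 32)               # (batch, seq_len, vocab)
labels = torch.tensor([[5, 7, -100, -100]])  # -100 positions are ignored

# ignore_index defaults to -100; a label of -1 would raise an error instead
loss = F.cross_entropy(logits.view(-1, 32), labels.view(-1))
```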
908230d261 Pickle CamemBERT tokenizer 2020-01-24 10:08:59 -05:00
24d5ad1dcc Run the examples in slow 2020-01-23 09:38:45 -05:00
9ddf60b694 Tips + whitespaces 2020-01-23 09:38:45 -05:00
0e9899f451 Fixes 2020-01-23 09:38:45 -05:00
48ac24020d TF CTRL 2020-01-23 09:38:45 -05:00
7511f3dd89 PyTorch CTRL + Style 2020-01-23 09:38:45 -05:00
980211a63a XLM-RoBERTa 2020-01-23 09:38:45 -05:00
6bc966793a TF DistilBERT 2020-01-23 09:38:45 -05:00
db1a7f27a1 PyTorch DistilBERT 2020-01-23 09:38:45 -05:00
b28020f590 TF RoBERTa 2020-01-23 09:38:45 -05:00
3e1bc27e1b Pytorch RoBERTa 2020-01-23 09:38:45 -05:00
f44ff574d3 Camembert 2020-01-23 09:38:45 -05:00
264eb23912 TF XLM 2020-01-23 09:38:45 -05:00
ccebcae75f PyTorch XLM 2020-01-23 09:38:45 -05:00
92b3cb786d TF XLNet 2020-01-23 09:38:45 -05:00
cd656fb21a PyTorch XLNet 2020-01-23 09:38:45 -05:00
83fa8d9fb5 TF Transformer-XL 2020-01-23 09:38:45 -05:00
98edad418e PyTorch Transformer-XL 2020-01-23 09:38:45 -05:00
96d21ad06b TF OpenAI GPT 2020-01-23 09:38:45 -05:00
850795c487 Pytorch GPT 2020-01-23 09:38:45 -05:00
1487b840d3 TF GPT2 2020-01-23 09:38:45 -05:00
bd0d3fd76e GPT-2 PyTorch models + better tips for BERT 2020-01-23 09:38:45 -05:00
dbeb7fb4e6 BERT TensorFlow 2020-01-23 09:38:45 -05:00
cd77c750c5 BERT PyTorch models 2020-01-23 09:38:45 -05:00
3922a2497e TF ALBERT + TF Utilities + Fix warnings 2020-01-23 09:38:45 -05:00
00df3d4de0 ALBERT Modeling + required changes to utilities 2020-01-23 09:38:45 -05:00
f81b6c95f2 Flake8 violation 2020-01-23 09:38:45 -05:00
632675ea88 Can test examples spread over multiple blocks 2020-01-23 09:38:45 -05:00
eaa6b9afc6 Require Torch when testing examples 2020-01-23 09:38:45 -05:00
9bab9b83d2 Glossary 2020-01-23 09:38:45 -05:00
64abd3e0aa Multi-line examples can be tested + ALBERT patch for CircleCI
All tests should now work fine.
2020-01-23 09:38:45 -05:00
837577256b Automatic testing of examples
The CircleCI test should fail.
2020-01-23 09:38:45 -05:00
90b7df444f Upload CLI: on win32, use slashes, not os.sep 2020-01-22 22:41:21 -05:00
119dc50e2a Doc tweak on model sharing 2020-01-22 22:40:38 -05:00
34a3c25a30 Fix for XLMRobertaConfig inherits from RobertaConfig
hat/tip @stefan-it
2020-01-22 17:50:24 -05:00
1a8e87be4e Line-by-line text dataset (including padding) 2020-01-21 16:57:38 -05:00
b94cf7faac change order 2020-01-21 16:57:38 -05:00
2eaa8b6e56 Easier to not support this, as it could be confusing
cc @lysandrejik
2020-01-21 16:57:38 -05:00
801aaa5508 make style 2020-01-21 16:57:38 -05:00
56d4ba8ddb [run_lm_finetuning] Train from scratch 2020-01-21 16:57:38 -05:00
c7f79815e7 Cleanup unused variables 2020-01-21 11:40:24 -05:00
15579e2d55 [SQuAD v2] Code quality 2020-01-21 11:36:46 -05:00
088fa7b759 Correct segment ID for XLNet single sequence 2020-01-21 11:33:45 -05:00
073219b43f Manage impossible examples SQuAD v2 2020-01-21 11:24:43 -05:00
983c484fa2 add __getstate__ and __setstate__ to XLMRobertaTokenizer 2020-01-21 10:18:24 -05:00
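SentencePiece processors are not picklable, so the usual pattern drops the processor in `__getstate__` and reloads it from the vocab file in `__setstate__`. A sketch under that assumption (the method bodies are illustrative, not the exact commit):

```
import sentencepiece as spm

class SentencePieceTokenizer:
    def __init__(self, vocab_file):
        self.vocab_file = vocab_file
        self.sp_model = spm.SentencePieceProcessor()
        self.sp_model.Load(vocab_file)

    def __getstate__(self):
        # Drop the unpicklable C++ processor; keep the path to rebuild it.
        state = self.__dict__.copy()
        state["sp_model"] = None
        return state

    def __setstate__(self, d):
        self.__dict__ = d
        self.sp_model = spm.SentencePieceProcessor()
        self.sp_model.Load(self.vocab_file)
```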
cefd51c50c Fix glue processor failing on tf datasets 2020-01-20 11:46:43 -05:00
ca6ce3040d Fix style 2020-01-20 10:56:23 -05:00
908cd5ea27 Make forward asynchronous to avoid long computations timing out.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-20 10:56:23 -05:00
6e6c8c52ed Fix bad handling of env variable USE_TF / USE_TORCH leading to invalid framework being used.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-20 10:56:23 -05:00
23c6998bf4 Add lower bound to tqdm for tqdm.auto
- It appears that `tqdm` only introduced `tqdm.auto` in 4.27.
- See https://github.com/tqdm/tqdm/releases/tag/v4.27.0.
- Without the lower bound I received the following stack trace in an environment where I already had tqdm installed:
```
  File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/__init__.py", line 20, in <module>
    from .file_utils import (TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE,
  File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/file_utils.py", line 24, in <module>
    from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
2020-01-17 18:29:11 -05:00
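The corresponding pin lives in setup.py; a sketch of the dependency declaration, with the package name illustrative:

```
from setuptools import setup

setup(
    name="example-package",      # illustrative, not the real setup.py
    install_requires=[
        "tqdm >= 4.27",          # tqdm.auto first appeared in 4.27
    ],
)
```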
65a89a8976 Fix BasicTokenizer to respect never_split parameters (#2557)
* add failing test

* fix call to _run_split_on_punc

* format with black
2020-01-17 14:57:56 -05:00
6d5049a24d Fix typo in examples/run_squad.py
Rul -> Run
2020-01-17 11:22:51 -05:00
23a2cea8cb Tokenizer.from_pretrained: fetch all possible files remotely 2020-01-16 16:47:19 -05:00
99f9243de5 same here, try to not serialize too much if unneeded 2020-01-16 16:47:19 -05:00
9d8fd2d40e tokenizer.save_pretrained: only save file if non-empty 2020-01-16 16:47:19 -05:00
6e2c28a14a Run SQuAD warning when the doc stride may be too high 2020-01-16 13:59:26 -05:00
b8f43cb273 Merge pull request #2239 from ns-moosavi/HANS-evaluation-example
HANS evaluation
2020-01-16 13:28:25 +01:00
258ed2eaa8 adding details in readme 2020-01-16 13:21:30 +01:00
50ee59578d update formatting - make flake8 happy 2020-01-16 13:21:30 +01:00
1c9333584a formatting 2020-01-16 13:21:30 +01:00
e25b6fe354 updating readme 2020-01-16 13:21:30 +01:00
27c7b99015 adding details in readme - moving file 2020-01-16 13:21:30 +01:00
99d4515572 HANS evaluation 2020-01-16 13:21:30 +01:00
dc17f2a111 Merge pull request #2538 from huggingface/py3_super
💄 super
2020-01-16 13:17:15 +01:00
880854846b Merge pull request #2540 from huggingface/torch14_fix
[PyTorch 1.4] Fix failing torchscript test for xlnet
2020-01-16 13:16:59 +01:00
d9fa1bad72 Fix failing torchscript test for xlnet
model.parameters() order is apparently not stable (only for xlnet, for some reason)
2020-01-15 20:22:21 -05:00
a98b2ca8c0 Style + fixup BertJapaneseTokenizer 2020-01-15 19:05:51 -05:00
83a41d39b3 💄 super 2020-01-15 18:33:50 -05:00
cd51893d37 Merge branch 'Rexhaif-patch-1' 2020-01-15 18:25:15 -05:00
248aeaa842 Merge branch 'patch-1' of https://github.com/Rexhaif/transformers into Rexhaif-patch-1 2020-01-15 18:22:01 -05:00
c76c3cebed Add check for token_type_ids before tensorizing
Fix an issue where `prepare_for_model()` gives a `KeyError` when
`return_token_type_ids` is set to `False` and `return_tensors` is
enabled.
2020-01-15 12:31:43 -05:00
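A sketch of the idea behind the fix, assuming the encoded output is a plain dict; only keys that are actually present get tensorized, so a missing token_type_ids entry no longer triggers a KeyError:

```
import torch

def to_tensors(encoded_inputs):
    # token_type_ids is absent when return_token_type_ids=False,
    # so iterate over the keys that exist instead of indexing blindly.
    return {key: torch.tensor([value]) for key, value in encoded_inputs.items()}
```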
eb59e9f705 Graduate sst-2 to a canonical one 2020-01-15 16:28:50 +00:00
e184ad13cf Close #2392 2020-01-15 15:43:44 +00:00
dfe012ad9d Fix misleading RoBERTa token type ids 2020-01-14 17:47:28 -05:00
c024ab98df Improve padding side documentation 2020-01-14 17:44:23 -05:00
9aeb0b9b8a Improve padding side documentation 2020-01-14 17:43:00 -05:00
715fa638a7 Merge branch 'master' into from_scratch_training 2020-01-14 18:58:21 +00:00
100e3b6f21 Bias should be resized with the weights
Created a link between the linear layer bias and the model attribute bias. This does not change anything for the user nor for the conversion scripts, but allows the `resize_token_embeddings` method to resize the bias as well as the weights of the decoder.

Added a test.
2020-01-14 13:43:45 -05:00
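The pattern described here ties the head's bias parameter to the decoder's bias so that resizing the output embeddings carries the bias along. A sketch with an assumed module layout:

```
import torch
from torch import nn

class LMHead(nn.Module):
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        # Link the linear layer's bias to the module attribute so that
        # resizing the decoder also resizes the bias.
        self.decoder.bias = self.bias

    def forward(self, hidden_states):
        return self.decoder(hidden_states)
```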
6c32d8bb95 Size > Dimensionality + Remove final TODOs 2020-01-14 14:09:09 +01:00
760164d63b RoBERTa example 2020-01-14 14:09:09 +01:00
387217bd3e Added example usage 2020-01-14 14:09:09 +01:00
7d1bb7f256 Add missing XLNet and XLM models 2020-01-14 14:09:09 +01:00
a1cb100460 Wrap up configurations 2020-01-14 14:09:09 +01:00
c11b6fd393 Update links in all configurations 2020-01-14 14:09:09 +01:00
632682726f Updated Configurations 2020-01-14 14:09:09 +01:00
2b566c182e Merge pull request #2384 from dimagalat/master
Releasing file lock
2020-01-14 13:19:01 +01:00
764f836d52 Update test_tokenization_auto.py 2020-01-13 22:50:34 -05:00
d5831acb07 Update test_tokenization_auto.py 2020-01-13 22:47:33 -05:00
ed6cd597cc Update test_tokenization_auto.py 2020-01-13 22:46:35 -05:00
5cb463a714 Update test_tokenization_auto.py 2020-01-13 22:38:29 -05:00
afc24ea5d4 In a parallel setup this could fail 2020-01-13 23:44:08 +00:00
894812c652 Fixup mapping 2020-01-13 23:34:19 +00:00
b20f11d4ca 🔫 Python35 2020-01-13 23:20:44 +00:00
0304628590 Map configs to models and tokenizers 2020-01-13 23:11:44 +00:00
1fc855e456 [tests] Safety checks on CONFIG_MAPPING 2020-01-13 21:52:55 +00:00
3c86b6f3c5 Py35 doesn't like inline variable types 2020-01-13 20:44:33 +00:00
b803b067bf Config to Model mapping 2020-01-13 20:05:20 +00:00
896a0eb1fd Merge pull request #2459 from Perseus14/patch-4
Update pipelines.py
2020-01-13 16:02:54 +01:00
0d6c17fc1b black formatting 2020-01-13 11:18:27 +01:00
a3085020ed Added repetition penalty to PPLM example (#2436)
* Added repetition penalty

* Default PPLM repetition_penalty to neutral

* Minor modifications to comply with reviewer's suggestions. (j -> token_idx)

* Formatted code with `make style`
2020-01-10 23:00:07 -05:00
cf8a70bf68 More AutoConfig tests 2020-01-11 03:43:57 +00:00
6bb3edc300 Serialize model_type if exists 2020-01-11 03:18:56 +00:00
c6f682c1eb flake 2020-01-11 03:18:31 +00:00
4d1c98c012 AutoConfig + other Auto classes honor model_type 2020-01-11 02:46:17 +00:00
2f32dfd33b Convention: name mixins mixins 2020-01-11 01:24:29 +00:00
e83d9f1c1d cleaning - change ' to " (black requirements) 2020-01-10 19:34:25 -05:00
ebba9e929d minor spring cleaning - missing configs + processing 2020-01-10 19:14:58 -05:00
055e80cfad rm old ConfigTester 2020-01-10 21:36:18 +00:00
b1e1a9f9b2 Merge pull request #2495 from mschrimpf/patch-1
T5: move rp_bucket to relative_attention_bias' device
2020-01-10 22:18:54 +01:00
fd8423321f keep list sorted 2020-01-10 20:36:46 +00:00
0cd81fb99f [isort] declare more third-parties in case no tf install 2020-01-10 20:35:45 +00:00
90d3b787f6 move rp_bucket to relative_attention_bias' device
Otherwise, `rp_bucket` will always be on CPU, and the lookup will fail if `self.relative_attention_bias` is on CUDA.
2020-01-10 15:09:10 -05:00
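A self-contained sketch of the fix, assuming `rp_bucket` is computed on CPU and `relative_attention_bias` is an `nn.Embedding`:

```
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
relative_attention_bias = nn.Embedding(32, 8).to(device)
rp_bucket = torch.randint(0, 32, (16, 16))  # computed on CPU

# Move the bucket indices to whatever device the embedding lives on,
# so the lookup does not fail when the model runs on CUDA.
rp_bucket = rp_bucket.to(relative_attention_bias.weight.device)
values = relative_attention_bias(rp_bucket)
```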
84c0aa1868 num_parameters helper 2020-01-10 17:40:02 +00:00
331065e62d missing import 2020-01-10 11:42:53 +01:00
414e9e7122 indents test 2020-01-10 11:42:53 +01:00
3cdb38a7c0 indents 2020-01-10 11:42:53 +01:00
ebd45980a0 Align with run_squad + fix some errors 2020-01-10 11:42:53 +01:00
45634f87f8 fix Sampler in distributed training - evaluation 2020-01-10 11:42:53 +01:00
af1ee9e648 Move torch.nn.utils.clip_grad_norm_ 2020-01-10 11:42:53 +01:00
164c794eb3 New SQuAD API for distillation script 2020-01-10 11:42:53 +01:00
801f2ac8c7 Add PRETRAINED_INIT_CONFIGURATION to DistilBERT tokenizer 2020-01-10 11:42:21 +01:00
bfec203d4e modified: src/transformers/tokenization_utils.py 2020-01-09 12:54:28 +01:00
f599623a99 PreTrainedTokenizerFast: hotfix _convert_encoding
cc @n1t0
2020-01-08 15:46:37 -05:00
f26a353057 Update pipelines.py
Modified the QA pipeline to consider all features for each example before generating top-k answers.
The current pipeline takes only one SquadExample, one SquadFeature, one start-logit list and one end-logit list to retrieve the answer; this is not correct, as one SquadExample can produce multiple SquadFeatures.
2020-01-08 21:12:34 +05:30
16ce15ed4b DistilBERT token type ids removed from inputs in run_squad 2020-01-08 13:18:30 +01:00
f24232cd1b Fix error with global step in run_squad.py 2020-01-08 11:39:00 +01:00
1b59b57b57 ignore_index equal -100 in T5 model 2020-01-08 09:52:10 +01:00
569da80ced Make doc regarding masked indices more clear.
Signed-off-by: Romain Keramitas <r.keramitas@gmail.com>
2020-01-07 17:37:27 +01:00
43114b89ba spelling correction (#2434) 2020-01-07 17:25:25 +01:00
d6a677b14b Fix typograpical errors (#2438) 2020-01-07 17:21:23 +01:00
27c1b656cc Fix error with global step in run_lm_finetuning.py 2020-01-07 16:16:12 +01:00
24df44d9c7 Black version python 3.5 2020-01-07 15:53:42 +01:00
73be60c47b Quotes 2020-01-07 15:34:23 +01:00
6806f8204e fix #2410 2020-01-07 15:20:45 +01:00
176d3b3079 Add support for Albert and XLMRoberta for the Glue example (#2403)
* Add support for Albert and XLMRoberta for the Glue example
2020-01-07 14:55:55 +01:00
9261c7f771 Remove f-string device creation on PyTorch GPU pipelines.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-07 11:46:44 +01:00
91d33c798b Fix issue on pipelines where pytorch's tensors are not copied on the user-specified GPU device.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-07 11:12:31 +01:00
2926852f14 fixed formatting 2020-01-07 11:56:03 +11:00
e2810edc8f removing redundant .flush 2020-01-07 11:47:25 +11:00
c301faa92b Distributed or parallel setup 2020-01-06 18:41:08 -05:00
81d6841b4b GPU text generation: moved the encoded_prompt to the correct device 2020-01-06 15:11:12 +01:00
dd4df80f0b Moved the encoded_prompts to the correct device 2020-01-06 15:11:12 +01:00
1efc208ff3 Complete DataProcessor class 2020-01-06 15:02:25 +01:00
c45d0cf60f Improve logging message in the single sentence classification processor 2020-01-06 14:54:36 +01:00
bf89be77b9 Improve logging message in the single sentence classification processor 2020-01-06 14:54:36 +01:00
bf8d4bc674 Improve logging message in glue feature conversion 2020-01-06 14:54:36 +01:00
74755c89b9 Example snippet for BertForQuestionAnswering 2020-01-06 14:41:53 +01:00
0ffc8eaf53 Enforce target version for black.
This should stabilize formatting.
2020-01-05 12:52:14 -05:00
f01b3e6680 fix #2399 an ImportError in official example (#2400)
* fix #2399 an ImportError in official example

* style

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-01-05 12:50:20 -05:00
78528742f1 Fix syntax + link to community page 2020-01-05 12:43:39 -05:00
12e0aa4368 Proposition to include community models in readme 2020-01-05 12:37:11 -05:00
80faf22b4a Updating documentation for converting tensorflow model to reflect the new cli convert format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-04 13:41:18 +01:00
d0e594f9db Releasing file lock 2020-01-02 09:45:48 +11:00
629b22adcf [run_lm_finetuning] mask_tokens: document types 2020-01-01 12:55:10 -05:00
594ca6dead [debug] Debug Heisenbug, the old school way. 2019-12-29 10:07:21 -05:00
0df4e62da0 [http] Tweak http user-agent (#2353) 2019-12-29 10:06:50 -05:00
f75bf05ce6 Merge pull request #2352 from huggingface/cli_tweaks
Cli tweaks
2019-12-28 15:40:00 +01:00
0d467fd6de Typo 2019-12-27 23:06:48 -05:00
d8293e84f3 [cli] upload: max number of files at the same time 2019-12-27 23:02:53 -05:00
4d6c93e923 Kill __main__ 2019-12-27 22:55:22 -05:00
9b2badf3c9 [cli] Update doc 2019-12-27 22:54:29 -05:00
f78ebc22ad [cli] Add ability to delete remote object 2019-12-27 22:53:49 -05:00
bfe870be65 Hotfix tokenizers version for sdist installs 2019-12-27 11:05:52 -05:00
74ea432847 Merge pull request #2286 from adelevie/patch-2
Typo in tokenization_utils.py
2019-12-27 10:50:47 +01:00
492bea9aa0 Merge pull request #2292 from patrickvonplaten/add_cached_past_for_language_generation
Add cached past for language generation
2019-12-27 10:33:27 +01:00
e213900fa2 Merge pull request #2290 from patrickvonplaten/fix_typo_in_doc_for_language_generation
duplicated line for repeating_words_penalty_for_language_generation
2019-12-27 10:29:06 +01:00
9f5f646442 Merge pull request #2211 from huggingface/fast-tokenizers
Fast tokenizers
2019-12-27 10:24:29 +01:00
9024b19994 Auto-format (fixes previous commit). 2019-12-27 10:13:52 +01:00
3233b58ad4 Quote square brackets in shell commands.
This ensures compatibility with zsh.

Fix #2316.
2019-12-27 08:50:25 +01:00
e6ec24fa88 Better added_tokens handling 2019-12-26 16:49:48 -05:00
599db139f9 Code style update 2019-12-26 15:13:30 -05:00
835b76a46f Handle unk_token
As we discussed, this is handled here directly 
cc @thomwolf
2019-12-26 14:42:55 -05:00
7ead04ce14 FastPreTrainedTokenizer => PreTrainedTokenizerFast 2019-12-26 14:39:39 -05:00
1f82a5d910 Update for changes in tokenizers API 2019-12-26 14:37:55 -05:00
8c67b529f6 Merge pull request #2324 from kashif/patch-1
Typo in serving.py
2019-12-26 12:38:06 +01:00
7211541ade Typo in serving.py 2019-12-26 12:21:40 +01:00
0f6017bee3 improve comments for examples 2019-12-26 00:35:11 +01:00
87c8fca9bc add example for ctrl text generation in docs 2019-12-26 00:29:19 +01:00
88def24c45 merge conflicts - renamed to previous_token singular 2019-12-26 00:27:16 +01:00
822f725a07 duplicated line for repeating_words_penalty_for_language_generation 2019-12-26 00:25:29 +01:00
fc84bd5254 adapt style to predefined style layout 2019-12-25 23:32:44 +01:00
deff792bb6 add prepare inputs for transfo_xl and xlnet 2019-12-25 23:17:24 +01:00
9398058e19 add easy tensor shape match test 2019-12-25 23:17:24 +01:00
90cda45e9e add past re-ordering for beam search 2019-12-25 23:17:24 +01:00
6bca56fdb0 check for self.config.mem_len instead of self.mem_len in _do_output_past 2019-12-25 23:17:24 +01:00
365ccd0af2 make if statements cleaner for prepare_inputs_for_generation 2019-12-25 23:17:24 +01:00
d039c679d2 better naming for if statement 2019-12-25 23:17:24 +01:00
7e0c5c731a changed do_output_past function to check for self.config.output_past instead of self.output_past 2019-12-25 23:17:24 +01:00
eeaa402cd4 rename comments 2019-12-25 23:17:24 +01:00
7bb4271291 remove ipdb debugging statements 2019-12-25 23:17:24 +01:00
267587c258 add and improve comments 2019-12-25 23:17:24 +01:00
d891fd0ae0 add past hidden key states for more efficient language generation & add prepare_inputs for gpt2 and ctrl model 2019-12-25 23:17:24 +01:00
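The mechanism behind these commits is the cached-past trick: once earlier positions are encoded in `past`, only the newest token needs a forward pass. A sketch of the prepare-inputs hook (names follow the convention of the time but are an assumption here):

```
def prepare_inputs_for_generation(input_ids, past=None):
    # With cached key/value states available, the model only needs the
    # most recent token; earlier positions are already encoded in `past`.
    if past is not None:
        input_ids = input_ids[:, -1:]
    return {"input_ids": input_ids, "past": past}
```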
aeef4823ab Merge pull request #2303 from patrickvonplaten/fix_error_with_repetition_penalty
fix repetition penalty error in modeling_utils.py
2019-12-25 22:39:20 +01:00
0412f3d929 Merge pull request #2291 from aaugustin/fix-flake8-F841
Fix F841 flake8 warning
2019-12-25 22:37:42 +01:00
8742c95461 Merge pull request #2289 from patrickvonplaten/fix_effective_batch_size_lang_gen_xlm
fix bug in prepare inputs for language generation for xlm for effective batch_size > 1
2019-12-25 22:30:46 +01:00
1240be3ed9 Merge pull request #2312 from vitaliyradchenko/fix_special_and_add_tokens_loading
Correct tokenization for special and added tokens
2019-12-25 20:52:30 +01:00
b262577d17 add special tokens to unique_added_tokens_encoder 2019-12-25 18:31:35 +02:00
83a2347952 fixed lack of added and special tokens 2019-12-25 18:03:19 +02:00
cea04a2443 Merge pull request #2310 from ShnitzelKiller/scatter-unfix
revert erroneous fix #2276
2019-12-25 12:43:22 +01:00
e1844d9a45 use positional arguments due to inconsistent API 2019-12-25 01:34:02 -08:00
9fb7addd4d revert erroneous fix 2019-12-24 22:26:09 -08:00
734d29b03d tokenizers is now a real dependency 2019-12-24 13:32:41 -05:00
2818e50569 Add tests for fast tokenizers 2019-12-24 13:29:01 -05:00
31c56f2e0b Fix style 2019-12-24 12:43:27 -05:00
951ae99bea BertTokenizerFast 2019-12-24 12:24:24 -05:00
041eac2d6d GPT2TokenizerFast 2019-12-24 12:24:14 -05:00
3471ff0d35 FastPreTrainedTokenizer 2019-12-24 12:23:30 -05:00
18e5bdbec5 fix repetition penalty error in modeling_utils.py 2019-12-24 17:18:05 +01:00
f18ac4c28e fix sequence length for prepare_inputs for xlnet 2019-12-24 16:43:24 +01:00
359dc43837 fix effective batch_size error in prepare_inputs also for xlnet 2019-12-24 16:33:20 +01:00
d98a384cb0 fix bug in prepare inputs for language generation for xlm for effective batch_size > 1 2019-12-24 16:29:54 +01:00
3e0cf49514 adding back last dropout in TF 2.0 T5 2019-12-24 11:30:56 +01:00
35d32308de adding back final dropout in T5 2019-12-24 11:29:49 +01:00
81db12c3ba Merge pull request #2271 from aaugustin/improve-setup-and-requirements
Improve setup and requirements
2019-12-24 11:21:20 +01:00
10724a8123 Run the slow tests every Monday morning. 2019-12-24 09:09:43 +01:00
a8d34e534e Remove [--editable] in install instructions.
Use -e only in docs targeted at contributors.

If a user copy-pastes a command line with [--editable], they will hit
an error. If they don't know the --editable option, we're giving them
a choice to make before they can move forwards, but this isn't a choice
they need to make right now.
2019-12-24 08:46:08 +01:00
e74c73a85d Enable F841 warning in flake8. 2019-12-23 22:38:23 +01:00
e6c0019c80 Remove unused variables in tests. 2019-12-23 22:38:18 +01:00
495580dad1 Remove unused variables in templates. 2019-12-23 22:38:18 +01:00
71f94a8a1c Remove unused variables in src. 2019-12-23 22:38:09 +01:00
81422c4e6d Remove unused variables in examples. 2019-12-23 22:29:02 +01:00
072750f4dc Merge pull request #2288 from aaugustin/better-handle-optional-imports
Improve handling of optional imports
2019-12-23 22:28:47 +01:00
4621ad6f9d Use the same pattern as everywhere else.
This is really just for consistency.
2019-12-23 21:30:04 +01:00
a31d4a2971 Reraise ImportError when sentencepiece isn't installed.
Otherwise, the next line fails with a confusing exception because the spm
variable isn't defined.
2019-12-23 21:27:42 +01:00
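The pattern in question logs a helpful message and then re-raises, so later references to `spm` fail with an ImportError rather than a confusing NameError. A sketch:

```
import logging

logger = logging.getLogger(__name__)

try:
    import sentencepiece as spm
except ImportError:
    logger.error(
        "You need the sentencepiece library to use this tokenizer: "
        "pip install sentencepiece"
    )
    raise  # re-raise so callers see the original ImportError
```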
c8b0c1e551 Improve exception type.
ImportError isn't really appropriate when there's no import involved.
2019-12-23 21:27:38 +01:00
4c09a96096 Simplify re-raising exceptions.
Most module use the simpler `raise` version. Normalize those that don't.
2019-12-23 21:20:54 +01:00
5565dcdd35 Remove warning when scikit-learn isn't available.
Most users don't need it.
2019-12-23 21:16:26 +01:00
8a6881822a Run some tests on Python 3.7.
This will improve version coverage.
2019-12-23 21:06:23 +01:00
7a865821d9 Remove stray egg-info directory automatically.
If a user or contributor ran `pip install -e .` on transformers < 3.0,
pip created a transformers.egg-info directory next to the transformers
directory at the root of the repository.

In transformers 3.0, the source is in a `src` subdirectory.
`pip install -e .` creates a transformers.egg-info directory there.
However, pip will still pick transformers.egg-info from the previous
location. This is a bug: https://github.com/pypa/pip/issues/5466

Users and contributors are likely to hit this problem because the
documentation for transformers 3.0 relies heavily on extras_require,
which didn't exist in earlier versions and so isn't defined in a stale
transformers.egg-info directory.

If such a directory exists, remove it. It's autogenerated, gitignored
and not supposed to contain anything of value.
2019-12-23 21:06:23 +01:00
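A sketch of the cleanup step described above, as it might appear near the top of setup.py (the exact code is an assumption):

```
import shutil
from pathlib import Path

# Remove a stale transformers.egg-info left over from an editable install
# made before the source moved into src/; pip would otherwise keep picking
# up the old metadata (see https://github.com/pypa/pip/issues/5466).
stale_egg_info = Path(__file__).parent / "transformers.egg-info"
if stale_egg_info.exists():
    shutil.rmtree(stale_egg_info)
```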
70373a5f7c Update contribution instructions.
Also provide shortcuts in a Makefile.
2019-12-23 21:05:30 +01:00
c3783399db Remove redundant requirements with transformers. 2019-12-23 19:17:27 +01:00
d79e9c9a9a Remove docs/requirements.txt.
It's superseded by the "docs" extras.
2019-12-23 19:17:07 +01:00
d73eb552e8 Remove requirements.txt.
It's redundant with setup.py and also incomplete (it omits numpy, for example).
2019-12-23 19:15:08 +01:00
9fcc532df6 Remove requirements-dev.txt.
It was generated once, likely in a non-reproducible way (pip freeze
in a contributor's local environment), and never updated.
2019-12-23 19:14:36 +01:00
76a1417f2a Include all optional dependencies in extras.
Take advantage of this to simplify the Circle CI configuration.

Don't bother with tensorboardX: it's a fallback for PyTorch < 1.1.0.
2019-12-23 19:14:31 +01:00
9fc8dcb2a0 Standardize import.
Every other file uses this pattern.
2019-12-23 18:45:42 +01:00
f2522869ea Review and update setup.py. 2019-12-23 18:45:42 +01:00
7cef764ec0 Typo in tokenization_utils.py
avoir -> avoid
2019-12-23 12:14:50 -05:00
23dad8447c Install deps from setup.py for building docs.
requirements.txt isn't up to date.
2019-12-23 17:06:32 +01:00
d8e33dbd67 Fix path to source code in docs config.
This should fix API docs, which went AWOL with yesterday's changes.
2019-12-23 16:49:35 +01:00
59b123bc50 fix tqdm logging level 2019-12-23 16:47:24 +01:00
ba2378ced5 Merge pull request #2264 from upura/fix-doclink
Fix doc link in README
2019-12-23 12:31:00 +01:00
e4e2a666c9 Merge pull request #2276 from ShnitzelKiller/scatterfix
fix error due to wrong argument name to Tensor.scatter()
2019-12-23 12:19:48 +01:00
398bb03f98 fix out-of-place call to scatter, whose keyword argument is named source, not src 2019-12-22 23:30:52 -08:00
ce50305e5b Merge pull request #2270 from aaugustin/remove-python-2
Remove support for Python 2
2019-12-22 23:04:37 +01:00
1a948d7020 Switch from comments to annotations for types. 2019-12-22 18:56:01 +01:00
1c62e87b34 Use built-in open().
On Python 3, `open is io.open`.
2019-12-22 18:38:56 +01:00
d6eaf4e6d2 Update comments mentioning Python 2. 2019-12-22 18:38:56 +01:00
45841eaf7b Remove references to Python 2 in documentation. 2019-12-22 18:38:56 +01:00
0dddc1494d Remove py3 marker. 2019-12-22 18:38:56 +01:00
75a23d24af Remove import fallbacks. 2019-12-22 18:38:56 +01:00
798b3b3899 Remove sys.version_info[0] == 2 or 3. 2019-12-22 18:38:42 +01:00
8af25b1664 Remove six. 2019-12-22 17:56:09 +01:00
6b2200fc88 Remove u-prefixes. 2019-12-22 17:47:54 +01:00
c824d15aa1 Remove __future__ imports. 2019-12-22 17:47:54 +01:00
b6ea0f43ae Remove duplicate -v flag. 2019-12-22 17:47:27 +01:00
5daca95ddd Merge pull request #2268 from aaugustin/improve-repository-structure
Improve repository structure
2019-12-22 16:41:53 +01:00
54abc67aec Merge pull request #2255 from aaugustin/implement-best-practices
Implement some Python best practices
2019-12-22 16:31:11 +01:00
00204f2b4c Replace CommonTestCases for tokenizers with a mixin.
This is the same change as for (TF)CommonTestCases for modeling.
2019-12-22 15:35:25 +01:00
a3c5883f2c Rename file for consistency. 2019-12-22 15:35:25 +01:00
daf8bebcdd Remove unused GPTModelTester.
It isn't imported anywhere.
2019-12-22 15:35:25 +01:00
345c23a60f Replace (TF)CommonTestCases for modeling with a mixin.
I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.

I solved this by replacing the abstract base class with a mixin.

Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
2019-12-22 15:35:18 +01:00
7e98e211f0 Remove unittest.main() in test modules.
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
2019-12-22 14:42:03 +01:00
6be7cdda66 Move source code inside a src subdirectory.
This prevents transformers from being importable simply because the CWD
is the root of the git repository, while not being importable from other
directories. That led to inconsistent behavior, especially in examples.

Once you fetch this commit, in your dev environment, you must run:

    $ pip uninstall transformers
    $ pip install -e .
2019-12-22 14:15:13 +01:00
ced0a94204 Switch test files to the standard test_*.py scheme. 2019-12-22 14:15:13 +01:00
067395d5c5 Move tests outside of library. 2019-12-22 13:47:17 +01:00
698f9e3d7a Remove trailing whitespace in README. 2019-12-22 13:29:58 +01:00
c11b3e2926 Sort imports for optional third-party libraries.
These libraries aren't always installed in the virtual environment where
isort is running. Declaring them properly avoids mixing these
third-party imports with local imports.
2019-12-22 11:19:13 +01:00
2a34d5b71b Stabilize import order for packaging.
I don't want to consider it a dependency of transformers, but it's
usually there in local development and usually not there in CI.
2019-12-22 11:07:31 +01:00
c9270086ea Disable flake8 F841 in CI to get a passing run.
I'll fix it later.
2019-12-22 11:00:06 +01:00
577a03664d Enforce flake8 in CI. 2019-12-22 11:00:04 +01:00
7c6812645a Restore proper import for HTTPError. 2019-12-22 10:59:08 +01:00
939148b050 Fix F401 flake8 warning (x28).
Do manually what autoflake couldn't manage.
2019-12-22 10:59:08 +01:00
783a616999 Fix F401 flake8 warning (x88 / 116).
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
2019-12-22 10:59:08 +01:00
80327a13ea Fix F401 flake8 warning (x152 / 268).
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
2019-12-22 10:59:08 +01:00
654e051e2a Ignore F401 flake8 warning (x326 / 594). 2019-12-22 10:59:08 +01:00
fa2ccbc081 Fix E266 flake8 warning (x90). 2019-12-22 10:59:08 +01:00
2ab78325f0 Fix F821 flake8 warning (x47).
Ignore warnings related to Python 2, because it's going away soon.
2019-12-22 10:59:07 +01:00
631be27078 Fix E722 flake8 warnings (x26). 2019-12-22 10:59:07 +01:00
b0f7db73cd Fix E741 flake8 warning (x14). 2019-12-22 10:59:07 +01:00
ea89bec185 Fix E231 flake8 warning (x9). 2019-12-22 10:59:07 +01:00
fd2f17a7a1 Fix E714 flake8 warning (x8). 2019-12-22 10:59:07 +01:00
5eab3cf6bc Fix W605 flake8 warning (x5). 2019-12-22 10:59:07 +01:00
7dce8dc7ac Fix E731 flake8 warning (x3). 2019-12-22 10:59:07 +01:00
eed46f38b7 Fix E302 flake8 warning (x3). 2019-12-22 10:59:07 +01:00
b1de7ae08a Fix F811 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
357db7098c Fix E712 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
f9c5317db2 Fix E265 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
28e608a2c2 Remove trailing whitespace from all Python files.
Fixes flake8 warning W291 (x224).
2019-12-22 10:59:07 +01:00
1efa0a7552 Add black-compatible flake8 configuration. 2019-12-22 10:59:07 +01:00
d0c9fe277a Fix circular import in transformers.pipelines.
Submodules shouldn't import from their parent in general.
2019-12-22 10:59:07 +01:00
5ca054757f Update "make style" to sort imports with isort. 2019-12-22 10:59:07 +01:00
9e80fc7b2f Enforce isort in CI.
We need https://github.com/timothycrosley/isort/pull/1000 but there's no
release with this fix yet, so we'll install from GitHub.
2019-12-22 10:59:00 +01:00
158e82e061 Sort imports with isort.
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
2019-12-22 10:57:46 +01:00
9d00f78f16 fix doc link 2019-12-22 16:07:05 +09:00
b668a740ca Fixing incorrect link in model docstring
The docstring contains a link to the Salesforce/CTRL repo, while the model itself is Facebookresearch/mmbt. It is probably a copy/paste mistake.
2019-12-22 00:01:14 +03:00
bc1715c1e0 Add black-compatible isort configuration.
lines_after_imports = 2 is a matter of taste; I like it.
2019-12-21 17:53:18 +01:00
36883c1192 Add "make style" to format code with black. 2019-12-21 17:53:18 +01:00
6e5291a915 Enforce black in CI. 2019-12-21 17:53:18 +01:00
fa84ae26d6 Reformat source code with black.
This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.
2019-12-21 17:52:29 +01:00
63e3827c6b Remove empty file.
Likely it was added by accident.
2019-12-21 15:38:08 +01:00
645713e2cb Merge pull request #2254 from huggingface/fix-tfroberta
adding positional embeds masking to TFRoBERTa
2019-12-21 15:33:22 +01:00
73f6e9817c Merge pull request #2115 from suvrat96/add_mmbt_model
[WIP] Add MMBT Model to Transformers Repo
2019-12-21 15:26:08 +01:00
77676c27d2 adding positional embeds masking to TFRoBERTa 2019-12-21 15:24:48 +01:00
344126fe58 move example to mm-imdb folder 2019-12-21 15:06:52 +01:00
5b7fb6a4a1 Merge pull request #2134 from bkkaggle/saving-and-resuming
closes #1960 Add saving and resuming functionality for remaining examples
2019-12-21 15:03:53 +01:00
6f68d559ab Merge pull request #2130 from huggingface/ignored-index-coherence
[BREAKING CHANGE] Setting all ignored index to the PyTorch standard
2019-12-21 14:55:40 +01:00
1ab25c49d3 Merge branch 'master' into pr/2115 2019-12-21 14:54:30 +01:00
b03872aae0 fix merge 2019-12-21 14:49:54 +01:00
518ba748e0 Merge branch 'master' into saving-and-resuming 2019-12-21 14:41:39 +01:00
18601c3b6e Merge pull request #2173 from erenup/master
run_squad with roberta
2019-12-21 14:33:16 +01:00
6e7102cfb3 Merge pull request #2203 from gthb/patch-1
fix: wrong architecture count in README
2019-12-21 14:31:44 +01:00
deceb00161 Merge pull request #2177 from mandubian/issue-2106
:zip: #2106 tokenizer.tokenize speed improvement (3-8x) by caching added_tokens in a Set
2019-12-21 14:31:20 +01:00
eeb70cdd77 Merge branch 'master' into saving-and-resuming 2019-12-21 14:29:59 +01:00
ed9b84816e Merge pull request #1840 from huggingface/generation_sampler
[WIP] Sampling sequence generator for transformers
2019-12-21 14:27:35 +01:00
f86ed23189 update doc 2019-12-21 14:13:06 +01:00
cfa0380515 Merge branch 'master' into generation_sampler 2019-12-21 14:12:52 +01:00
300ec3003c fixing run_generation example - using torch.no_grad 2019-12-21 14:02:19 +01:00
1c37746892 fixing run_generation 2019-12-21 13:52:49 +01:00
7e17f09fb5 Merge pull request #1803 from importpandas/fix-xlnet-squad2.0
fix run_squad.py during fine-tuning xlnet on squad2.0
2019-12-21 13:38:48 +01:00
8a2be93b4e fix merge 2019-12-21 13:31:28 +01:00
562f864038 Merge branch 'master' into fix-xlnet-squad2.0 2019-12-21 12:48:10 +01:00
8618bf15d6 Merge pull request #1736 from huggingface/fix-tf-xlnet
Fix TFXLNet
2019-12-21 12:42:05 +01:00
2fa8737c44 Merge pull request #1586 from enzoampil/include_special_tokens_in_bert_examples
Add special tokens to documentation for bert examples to resolve issue: #1561
2019-12-21 12:36:11 +01:00
f15f087143 Merge pull request #1764 from DomHudson/bug-fix-1761
Bug-fix: Roberta Embeddings Not Masked
2019-12-21 12:13:27 +01:00
fae4d1c266 Merge pull request #2217 from aaugustin/test-parallelization
Support running tests in parallel
2019-12-21 11:54:23 +01:00
b8e924e10d Restore test.
This looks like debug code accidentally committed in b18509c2.

Refs #2250.
2019-12-21 08:50:15 +01:00
767bc3ca68 Fix typo in model name.
This looks like a copy/paste mistake. Probably this test was never run.

Refs #2250.
2019-12-21 08:46:26 +01:00
343c094f21 Run examples separately from tests.
This optimizes the total run time of the Circle CI test suite.
2019-12-21 08:43:19 +01:00
80caf79d07 Prevent excessive parallelism in PyTorch.
We're already using as many processes in parallel as we have CPU cores.
Furthermore, the number of cores may be incorrectly calculated as 36
(we've seen this in pytest-xdist), which compounds the problem.

PyTorch performance craters without this.
2019-12-21 08:43:19 +01:00
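One plausible shape for this guard (the exact mechanism used in the commit is an assumption): cap PyTorch's intra-op thread pool so each test process stays on a single core.

```
import torch

# With one test process per CPU core already running, letting each process
# spawn a full intra-op thread pool oversubscribes the machine; cap it at 1.
torch.set_num_threads(1)
```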
bb3bfa2d29 Distribute tests from the same file to the same worker.
This should prevent two issues:

- hitting API rate limits for tests that hit the HF API
- multiplying the cost of expensive test setups
2019-12-21 08:43:19 +01:00
29cbab98f0 Parallelize tests on Circle CI.
Set the number of CPUs manually based on the Circle CI resource class,
or else we're getting 36 CPUs, which is far too much (perhaps that's
the underlying hardware and not what Circle CI allocates to us).

Don't parallelize the custom tokenizers tests because they take less
than one second to run and parallelization actually makes them slower.
2019-12-21 08:43:19 +01:00
a4c9338b83 Prevent parallel downloads of the same file with a lock.
Since the file is written to the filesystem, a filesystem lock is the
way to go here. Add a dependency on the third-party filelock library to
get cross-platform functionality.
2019-12-21 08:43:19 +01:00
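With the filelock library this is a small wrapper around the download; a sketch assuming a `download_to` helper (hypothetical name):

```
from filelock import FileLock

def cached_download(url, cache_path, download_to):
    # Serialize writers of the same destination file across processes;
    # a filesystem lock fits because the artifact itself lives on disk.
    with FileLock(cache_path + ".lock"):
        download_to(url, cache_path)
    return cache_path
```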
b670c26684 Take advantage of the cache when running tests.
Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.

Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.

Fix #2222.
2019-12-21 08:43:19 +01:00
b67fa1a8d2 Download models directly to cache_dir.
This allows moving the file instead of copying it, which is more
reliable. Also it avoids writing large amounts of data to /tmp,
which may not be large enough to accommodate it.

Refs #2222.
2019-12-21 08:43:19 +01:00
286d5bb6b7 Use a random temp dir for writing pruned models in tests. 2019-12-21 08:43:19 +01:00
478e456e83 Use a random temp dir for writing file in tests. 2019-12-21 08:43:19 +01:00
12726f8556 Remove redundant torch.jit.trace in tests.
This looks like it could be expensive, so don't run it twice.
2019-12-21 08:43:19 +01:00
ac1b449cc9 [doc] move distilroberta to more appropriate place
cc @lysandrejik
2019-12-21 00:09:01 -05:00
3e52915fa7 [RoBERTa] Embeddings: fix dimensionality bug 2019-12-20 19:01:27 -05:00
228f52867c Bug fix: 1764 2019-12-20 18:27:35 -05:00
a80778f40e small refactoring (only aesthetic, not functional) 2019-12-20 17:21:24 -05:00
3df1d2d144 - Create the output directory (whose name is passed by the user in the "save_directory" parameter), where the encoder and decoder will be saved, if it does not exist.
- Empty the output directory if it contains any files or subdirectories.
- Create the "encoder" directory inside "save_directory", if it does not exist.
- Create the "decoder" directory inside "save_directory", if it does not exist.
- Save the encoder and the decoder in these two directories, respectively.
2019-12-20 17:21:24 -05:00
a436574bfd Release: v2.3.0 2019-12-20 16:22:20 -05:00
d0f8b9a978 Merge pull request #2244 from huggingface/fix-tok-pipe
Fix Camembert and XLM-R `decode` method - Fix NER pipeline alignment
2019-12-20 22:10:39 +01:00
a557836a70 Merge pull request #2191 from huggingface/fix_sp_np
Numpy compatibility for sentence piece
2019-12-20 22:08:08 +01:00
655fd06853 clean up 2019-12-20 21:57:49 +01:00
e5812462fc clean up debug and less verbose tqdm 2019-12-20 21:51:48 +01:00
4775ec354b add overwrite - fix ner decoding 2019-12-20 21:47:15 +01:00
cb6d54bfda Numpy compatibility for sentence piece
convert to int earlier
2019-12-20 15:06:28 -05:00
f79a7dc661 fix NER pipeline 2019-12-20 20:57:45 +01:00
a241011057 fix pipeline NER 2019-12-20 20:43:48 +01:00
e37ca8e11a fix camembert and XLM-R tokenizer 2019-12-20 20:43:42 +01:00
ceae85ad60 fix mc loading 2019-12-20 19:52:24 +01:00
71883b6ddc update link in readme 2019-12-20 19:40:23 +01:00
8d5a47c79b Merge pull request #2243 from huggingface/fix-xlm-roberta
fixing xlm-roberta tokenizer max_length and automodels
2019-12-20 19:34:08 +01:00
79e4a6a25c update serving API 2019-12-20 19:33:12 +01:00
bbaaec046c fixing CLI pipeline 2019-12-20 19:19:20 +01:00
1c12ee0e55 fixing xlm-roberta tokenizer max_length and automodels 2019-12-20 18:28:27 +01:00
65c75fc587 Clean special tokens test 2019-12-20 11:34:16 -05:00
fb393ad994 Added test for all special tokens 2019-12-20 11:29:58 -05:00
90debb9ff2 Keep even the first of the special tokens intact while lowercasing. 2019-12-20 11:29:43 -05:00
b98ff88544 Added pipelines quick tour in README 2019-12-20 15:52:50 +01:00
3a2c4e6f63 Merge pull request #1548 from huggingface/cli
[2.2] - Command-line interface - Pipeline class
2019-12-20 15:28:29 +01:00
4e3f745ba4 add example for Model2Model in quickstart 2019-12-20 09:12:31 -05:00
db0795b5d0 default models for tf and pt - update tests 2019-12-20 15:07:00 +01:00
7f74084528 Fix leading axis added when saving through the command run 2019-12-20 14:47:04 +01:00
c37815f130 clean up PT <=> TF 2.0 conversion and config loading 2019-12-20 14:35:40 +01:00
73fcebf7ec update serving command 2019-12-20 13:47:35 +01:00
59941c5d1f Merge pull request #2189 from stefan-it/xlmr
Add support for XLM-RoBERTa
2019-12-20 13:26:38 +01:00
15dda5ea32 remove python 2 tests for circle-ci cc @aaugustin @julien-c @LysandreJik 2019-12-20 13:20:41 +01:00
01ffc65e9b update tests to remove unittest.patch 2019-12-20 13:16:23 +01:00
825697cad4 fix tests 2019-12-20 12:51:10 +01:00
1fa93ca1ea Clean up framework handling 2019-12-20 12:34:19 +01:00
ca6bdb28f6 fix pipelines and rename model_card => modelcard 2019-12-20 12:10:40 +01:00
61d9ee45e3 All tests are green. 2019-12-20 11:47:56 +01:00
ff36e6d8d7 Merge pull request #2231 from huggingface/requests_user_agent
[http] customizable requests user-agent
2019-12-20 10:28:10 +01:00
e516a34a15 Use BasicTokenizer to split over whitespaces. 2019-12-20 09:38:08 +01:00
9d0d1cd339 Filter out entity for NER task. 2019-12-20 09:30:37 +01:00
15d897ff4a [http] customizable requests user-agent 2019-12-19 18:29:22 -05:00
f25e9b6f77 [hf_bucket_url] support for cloudfront urls 2019-12-19 18:28:17 -05:00
a5a06a851e [doc] Param name consistency 2019-12-19 16:24:20 -05:00
1718fb9e74 Minor/basic text fixes (#2229)
* Small clarification

Matches line 431 to line 435 for additional clarity and consistency.

* Fixed minor typo

The letter "s" was previously omitted from the word "docstrings".
2019-12-19 16:23:18 -05:00
9a399ead25 Revert incorrect #1778 2019-12-19 15:45:48 -05:00
3376adc051 configuration/modeling/tokenization: add various fine-tuned XLM-RoBERTa models for English, German, Spanish and Dutch (CoNLL datasets) 2019-12-19 21:30:23 +01:00
e4baa68ddb tick-tock cc @julien-c 2019-12-19 20:37:26 +01:00
149dc376aa fix tests 2019-12-19 20:34:28 +01:00
407093b3fa Merge branch 'cli' of https://github.com/huggingface/transformers into cli 2019-12-19 20:26:51 +01:00
c7be096c39 Merge branch 'master' into cli 2019-12-19 20:26:08 +01:00
a305067f2d Removed __main__ 2019-12-19 19:41:48 +01:00
3492a6ec17 Addressing Thom's comments. 2019-12-19 19:06:44 +01:00
33adab2b91 Fix albert example 2019-12-19 12:40:43 -05:00
a1f1dce0ae Correct max position for SQUAD and TFDS 2019-12-19 12:25:55 -05:00
62c1fc3c1e Removed duplicate XLMConfig, XLMForQuestionAnswering and XLMTokenizer from import statement of run_squad.py script 2019-12-19 09:50:56 -05:00
284572efc0 Fixed a typo in the link
Updated documentation due to the typo
2019-12-19 09:36:43 -05:00
ed6ba93912 corrected typo in example for t5 model input argument 2019-12-19 09:34:55 -05:00
81a911cce5 Doc, doc, ... doc. 2019-12-19 15:12:06 +01:00
faef6f6191 Fix logic order for USE_TF/USE_TORCH 2019-12-19 12:28:17 +01:00
5664327c24 Hide train command for now. 2019-12-19 12:27:54 +01:00
3b29322d4c Expose all the pipeline argument on serve command. 2019-12-19 12:24:17 +01:00
fc624716aa Renaming framework env variables flags from NO_ to USE_ 2019-12-19 11:49:06 +01:00
f516cf3956 Allow pipeline to write output in binary format 2019-12-19 11:42:33 +01:00
d72fa2a0f6 Fix inputs_for_model call in QuestionAnsweringPipeline accessing __dict__ on list. 2019-12-19 10:54:10 +01:00
bcc99fd92e Fix wrong automatic config allocation through AutoConfig 2019-12-19 10:32:21 +01:00
a26ce4dee1 examples: add XLM-RoBERTa to glue script 2019-12-19 02:23:01 +01:00
ec5d6c6a70 Addressing issue with NER task omitting first and last word. 2019-12-19 00:12:10 +01:00
fe9aab1055 tokenization: use S3 location for XLM-RoBERTa model 2019-12-18 23:47:48 +01:00
5c5f67a256 modeling: use S3 location for XLM-RoBERTa model 2019-12-18 23:47:00 +01:00
db90e12114 configuration: use S3 location for XLM-RoBERTa model 2019-12-18 23:46:33 +01:00
d0724d0794 Add PipedPipelineDataFormat 2019-12-18 23:27:26 +01:00
7711403bbd Expose config through the cli arguments 2019-12-18 22:59:51 +01:00
8bb166db5d Expose more information in the output of TextClassificationPipeline 2019-12-18 22:53:19 +01:00
f09d999641 docs: fix numbering 😅 2019-12-18 19:49:33 +01:00
dd7a958fd6 docs: add XLM-RoBERTa to pretrained model list (incl. all parameters) 2019-12-18 19:45:46 +01:00
d35405b7a3 docs: add XLM-RoBERTa to index page 2019-12-18 19:45:10 +01:00
3e89fca543 readme: add XLM-RoBERTa to model architecture list 2019-12-18 19:44:23 +01:00
128cfdee9b tokenization add XLM-RoBERTa base model 2019-12-18 19:28:16 +01:00
e778dd854d modeling: add XLM-RoBERTa base model 2019-12-18 19:27:34 +01:00
04b602f96f Put module import on top of the module. 2019-12-18 18:28:39 +01:00
64a971a915 auto: add XLM-RoBERTa to auto tokenization 2019-12-18 18:24:32 +01:00
036831e279 auto: add XLM-RoBERTa to auto modeling 2019-12-18 18:23:42 +01:00
41a13a6375 auto: add XLMRoBERTa to auto configuration 2019-12-18 18:20:27 +01:00
0c88c856d5 Unnest QuestionAnsweringArgumentHandler 2019-12-18 18:18:16 +01:00
8efc6dd544 fix #2214 2019-12-18 10:47:59 -05:00
a2978465a2 Merge branch 'master' into patch-1 2019-12-18 14:54:46 +00:00
01b68be34f converter: remove XLM-RoBERTa specific script (can be done with the script for RoBERTa now) 2019-12-18 12:24:46 +01:00
3d2096f516 further cleanup 2019-12-18 11:50:54 +01:00
ca31abc6d6 tokenization: *align* fairseq and spm vocab to fix some tokenization errors 2019-12-18 11:36:54 +01:00
8e5587fb79 few fixes on sampling 2019-12-18 11:32:37 +01:00
cce3089b65 Merge remote-tracking branch 'upstream/master' into xlmr 2019-12-18 11:05:16 +01:00
641a8decdc clean up code and add arbitrary number of return sequences 2019-12-18 10:43:48 +01:00
e347725d8c More fine-grained control over pipeline creation with config argument. 2019-12-18 10:41:24 +01:00
94c99db34c [FinBERT] fix incorrect url 2019-12-17 20:35:25 -05:00
7ffa817390 [s3] mv files and update links 2019-12-17 20:35:25 -05:00
c5f35e61db Uploaded files to AWS. 2019-12-17 20:35:25 -05:00
abc43ffbff Add pretrained model documentation for FinBERT. 2019-12-17 20:35:25 -05:00
8ac840ff87 Adding Finnish BERT. 2019-12-17 20:35:25 -05:00
a0d386455b Fix outdated tokenizer doc 2019-12-17 20:07:39 -05:00
ea636440d1 [roberta.conversion] Do not hardcode vocab size
and support for fairseq 0.9+
2019-12-17 18:12:22 -05:00
a4df2e0113 update roberta conversion
- update to fix conversion for the updated fairseq model
- create the save directory if it does not exist
2019-12-17 18:12:22 -05:00
77d397202b clean up dead code 2019-12-17 23:28:46 +01:00
bbc0c86f9b beam search + single beam decoding 2019-12-17 23:27:02 +01:00
5e289f69bc regex 2019.12.17 install fails with Python 2 2019-12-17 15:54:05 -05:00
2cff4bd8f3 Fix segmentation fault 2019-12-17 15:54:05 -05:00
55397dfb9b CsvPipelineDataFormat: Fix for single-column 2019-12-17 13:10:51 -05:00
b6938916ac adding beam search 2019-12-17 17:23:36 +01:00
d303f84e7b fix: wrong architecture count in README
Just say “the following” so that this intro doesn't so easily fall out of date :)
2019-12-17 16:18:00 +00:00
2fde5a2489 Initial bunch of documentation. 2019-12-17 12:16:07 +01:00
2f1c745cde update conversion script 2019-12-17 11:47:54 +01:00
83bc5235cf Merge branch 'master' into pr/2189 2019-12-17 11:47:32 +01:00
d7c62661a3 Provide serving dependencies for tensorflow and pytorch (serving-tf, serving-torch) 2019-12-17 11:23:39 +01:00
f349826a57 model: fix cls and sep token for XLM-RoBERTa documentation 2019-12-17 10:36:04 +01:00
f061606277 Merge pull request #2164 from huggingface/cleanup-configs
[SMALL BREAKING CHANGE] Cleaning up configuration classes - Adding Model Cards
2019-12-17 09:10:16 +01:00
805c21aeba tried to fix the failing checks 2019-12-17 11:36:00 +08:00
d000195ee6 add comment for example_index and unique_id in single process 2019-12-17 11:28:34 +08:00
3c6efd0ca3 updated usage example in modeling_roberta for question answering 2019-12-17 11:18:12 +08:00
3f5ccb183e [doc] Clarify uploads
cf 855ff0e91d (commitcomment-36452545)
2019-12-16 18:20:29 -05:00
3cb51299c3 Fix #2109 2019-12-16 16:58:44 -05:00
18a879f475 fix #2180 2019-12-16 16:44:29 -05:00
d803409215 Fix run squad evaluate during training 2019-12-16 16:31:38 -05:00
a468870fd2 refactoring generation 2019-12-16 22:22:30 +01:00
855ff0e91d [doc] Model upload and sharing
ping @lysandrejik @thomwolf

Is this clear enough? Anything we should add?
2019-12-16 12:42:22 -05:00
d064009b72 converter: fix vocab size 2019-12-16 17:23:25 +01:00
a701a0cee1 configuration: fix model name for large XLM-RoBERTa model 2019-12-16 17:17:56 +01:00
59a1aefb1c tokenization: add support for new XLM-RoBERTa model. Add wrapper around fairseq tokenization logic 2019-12-16 17:00:55 +01:00
69f4f058fa model: add support for new XLM-RoBERTa model 2019-12-16 17:00:12 +01:00
a648ff738c configuration: add support for XLM-RoBERTa model 2019-12-16 16:47:39 +01:00
9ed09cb4a3 converter: add conversion script for original XLM-RoBERTa weights to Transformers-compatible weights 2019-12-16 16:46:58 +01:00
d3549b66af module: add support for XLM-RoBERTa (__init__) 2019-12-16 16:38:39 +01:00
a096e2a88b WIP serving through HTTP internally using pipelines. 2019-12-16 16:38:02 +01:00
71b4750517 examples: add support for XLM-RoBERTa to run_ner script 2019-12-16 16:37:27 +01:00
43a4e1bbe4 Addressing issue in varargs handling for question answering. 2019-12-16 16:00:41 +01:00
46ccbb42fc Make CLI run command use integer mapping for device argument. 2019-12-16 15:49:41 +01:00
bbc707cf39 Fix non-keyworded varargs handling in DefaultArgumentHandler for pipeline. 2019-12-16 15:49:09 +01:00
9c391277cc Allow tensors placement on specific device through CLI and pipeline. 2019-12-16 15:19:13 +01:00
1bbdbacd5b update __init__ and saving 2019-12-16 14:38:20 +01:00
955d7ecb57 Refactored Pipeline with dedicated argument handler. 2019-12-16 14:34:54 +01:00
031ad4eb37 improving JSON error messages (for model card and configurations) 2019-12-16 14:20:57 +01:00
db0a9ee6e0 adding albert to TF auto models cc @LysandreJik 2019-12-16 14:08:08 +01:00
a4d07b983a dict of all config and model files cc @LysandreJik 2019-12-16 14:00:32 +01:00
d3418a94ff update tests 2019-12-16 13:52:41 +01:00
56e98ba81a add model cards cc @mfuntowicz 2019-12-16 11:07:27 +01:00
8669598abd update t5 tf 2019-12-16 09:59:36 +01:00
1b8613acb3 updating t5 config class 2019-12-16 09:51:42 +01:00
8e3b1c860f Added FeatureExtraction pipeline. 2019-12-15 01:37:52 +01:00
f1971bf303 Binding pipelines to the cli. 2019-12-15 01:37:16 +01:00
cc0135134b :zip: #2106 basic tokenizer.tokenize global speed improvement (3-8x) by simply caching added_tokens in a Set 2019-12-14 15:25:13 +01:00
dc667ce1a7 double check cc @LysandreJik 2019-12-14 09:56:27 +01:00
7140363e09 update bertabs 2019-12-14 09:44:53 +01:00
a52d56c8d9 Merge branch 'master' into cleanup-configs 2019-12-14 09:43:07 +01:00
e92bcb7eb6 Merge pull request #1739 from huggingface/t5
[WIP] Adding Google T5 model
2019-12-14 09:40:43 +01:00
cbb368ca06 distilbert tests 2019-12-14 09:31:18 +01:00
b6d4284b26 [cli] Uploads: fix + test edge case 2019-12-13 22:44:57 -05:00
a1faaf9962 deleted useless file 2019-12-14 08:57:13 +08:00
c7780700f5 Merge branch 'refs/heads/squad_roberta'
# Conflicts:
#	transformers/data/processors/squad.py
2019-12-14 08:53:59 +08:00
76f0d99f02 Merge remote-tracking branch 'refs/remotes/huggingface/master' 2019-12-14 08:45:17 +08:00
8e9526b4b5 add multiprocessing 2019-12-14 08:43:58 +08:00
7bd11dda6f Release: v2.2.2 2019-12-13 16:45:30 -05:00
c3248cf122 Tests for all tokenizers 2019-12-13 16:41:44 -05:00
f2ac50cb55 better for python2.x 2019-12-13 16:41:44 -05:00
4cbdc7d910 missed space 2019-12-13 16:41:44 -05:00
dd2add9f6e more tests 2019-12-13 16:41:44 -05:00
df160af736 🐛 #2096 in tokenizer.decode, a space is no longer joined between all subtexts, only before added tokens 2019-12-13 16:41:44 -05:00
5b7b78e088 🐛 #2096 in tokenizer.decode, adds a space after special tokens to return a correctly formatted string 2019-12-13 16:41:44 -05:00
866d73ca26 [cli] Upload is now compatible with folders 2019-12-13 16:39:08 -05:00
d461472948 return for SQuAD [BLACKED] 2019-12-13 15:31:52 -05:00
f24a228a93 Speed up tokenization process 2019-12-13 14:50:35 -05:00
c8ed1c82c8 [SQUAD] Load checkpoint when evaluating without training 2019-12-13 12:13:48 -05:00
5c00e344c1 update model doc - switch 3B/11B to 3b/11b 2019-12-13 16:33:29 +01:00
0b51532ce9 Reintroducing the batch_encode_plus method 2019-12-13 16:22:50 +01:00
110394b2ba Merge branch 'master' into t5 2019-12-13 16:03:32 +01:00
5a5c4349e8 Fix summarization to_cpu doc 2019-12-13 10:02:33 -05:00
8ade204098 fix tf 2019-12-13 14:48:47 +01:00
47f0e3cfb7 cleaning up configuration classes 2019-12-13 14:33:24 +01:00
8938b546bf Removed from_config 2019-12-13 14:27:04 +01:00
1ca52567a4 Allow model conversion in the pipeline allocator. 2019-12-13 14:13:14 +01:00
28e64ad5a4 Raise an exception if the pipeline allocator can't determine the tokenizer from the model. 2019-12-13 14:12:54 +01:00
be5bf7b81b Added NER pipeline. 2019-12-13 14:12:17 +01:00
80eacb8f16 Adding labels mapping for classification models in their respective config. 2019-12-13 14:10:22 +01:00
33e72b08d5 fix inner dimensions for 3B/11B models 2019-12-13 11:33:05 +01:00
9b312f9d41 initial version for roberta squad 2019-12-13 14:51:40 +08:00
40ed717232 Merge remote-tracking branch 'refs/remotes/huggingface/master' 2019-12-13 09:10:17 +08:00
7296f1010b Cleanup squad and add allow train_file and predict_file usage 2019-12-12 13:01:04 -05:00
5d67aa21ae [doc] Replicate doc from #2144 2019-12-12 12:39:41 -05:00
3fd71c4431 Update example scripts 2019-12-12 12:08:54 -05:00
fe92755b99 Fix special tokens mask in encode 2019-12-12 11:37:19 -05:00
fbf5455a86 Fix typo in examples/run_glue.py args declaration.
deay -> decay
2019-12-12 11:16:19 -05:00
f19dad61c7 fixing XLM conversion tests with dummy input 2019-12-12 14:46:30 +01:00
f69dbecc38 Expose classification labels mapping (and reverse) in model config. 2019-12-12 10:25:36 +01:00
90df44f0aa Merge pull request #2063 from guillaume-be/special_tokens_mask_value_not_used
special_tokens_mask value was unused and calculated twice
2019-12-12 08:21:46 +01:00
707f9e9241 Merge pull request #2081 from pglock/patch-1
handle string with only whitespaces as empty
2019-12-12 08:20:43 +01:00
137e20a846 Merge pull request #2075 from huggingface/check-link-validity
Check link validity
2019-12-12 08:09:12 +01:00
d5712f7cac Merge branch 'master' into check-link-validity 2019-12-12 08:00:51 +01:00
9c58b236ef Merge pull request #2144 from huggingface/from-pretrained-from-url
Allowing from_pretrained to load from url directly
2019-12-12 07:43:40 +01:00
413f41921b fix merge 2019-12-12 07:34:42 +01:00
386a93f0f8 Merge branch 'master' into from-pretrained-from-url 2019-12-12 07:31:05 +01:00
2d103546ef Merge pull request #2148 from huggingface/fix_encode_plus
Fix encode plus
2019-12-12 07:24:47 +01:00
1748fdf657 [doc] Fix rst table 2019-12-11 18:32:27 -05:00
36fc52a3b4 Update links to weights 2019-12-11 18:32:27 -05:00
371c5ddfad Py2 tests for Lysandre 2019-12-11 18:32:27 -05:00
5505cf7014 Run tests on Py2 too, for Lysandre 2019-12-11 18:32:27 -05:00
9cb97c0c0f Actually run the tests 2019-12-11 18:32:27 -05:00
95854c4a2f Actually run the tests 2019-12-11 18:32:27 -05:00
d2100428d3 Update to new test infra and only run conditionally 2019-12-11 18:32:27 -05:00
597ba7feb3 Support testing Japanese BERT tokenizers 2019-12-11 18:32:27 -05:00
6a43dc9d7d Support Python 2 2019-12-11 18:32:27 -05:00
a09da4eeb0 Add a test for Japanese BERT tokenizers 2019-12-11 18:32:27 -05:00
57b5cb3eaa Fix loading BertJapaneseTokenizer 2019-12-11 18:32:27 -05:00
c03c0dfd23 Add support for Japanese BERT models by cl-tohoku 2019-12-11 18:32:27 -05:00
4f15e5a267 Add tests.
Maybe not the best possible place for the tests, lmk.
2019-12-11 17:41:51 -05:00
18e1f751f1 TF support 2019-12-11 17:07:46 -05:00
31e5b5ff22 Fix tests + first example of doc 2019-12-11 15:22:02 -05:00
3d57c51111 Fix encode plus 2019-12-11 15:10:17 -05:00
c999a3e505 Allow from_pretrained to take a remote identifier 2019-12-11 12:29:58 -05:00
030faccb8d doc: fix pretrained models table 2019-12-11 12:19:21 -05:00
6709739a05 allowing from_pretrained to load from url directly 2019-12-11 18:15:45 +01:00
29570db25b allowing from_pretrained to load from url directly 2019-12-11 17:19:18 +01:00
2e2f9fed55 rm duplicate imports 2019-12-11 11:11:56 -05:00
c28273793e Add missing DistilBert and Roberta to AutoModelForTokenClassification 2019-12-11 15:31:45 +01:00
4c12860f7a Remove misleading documentation 2019-12-11 09:22:37 -05:00
b040bff6df Added supported model to AutoModelTokenClassification 2019-12-11 14:13:58 +01:00
fafd4c86ec fix TF 2.0 version of T5 - update conversion script 2019-12-11 13:47:27 +01:00
6aa919469d Update run_xnli to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
89896fe04f Update run_ner to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
fdc05cd68f Update run_squad to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
854ec5784e Update run_glue to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:30:36 -06:00
9a24e0cf76 Refactored qa pipeline argument handling + unittests 2019-12-11 00:33:25 +01:00
b72f9d340e Correct index in script 2019-12-10 18:33:17 -05:00
51ae203290 Merge pull request #2129 from leopd/master
Progress indicator improvements when downloading pre-trained models.
2019-12-10 22:18:55 +01:00
ec6fb25c21 Patch documentation 2019-12-10 15:49:20 -05:00
418589244d Uniforming the ignored indices 2019-12-10 15:26:19 -05:00
58d75aa310 Progress indicator improvements when downloading pre-trained models. 2019-12-10 11:36:56 -08:00
6a73382706 Complete warning + cleanup 2019-12-10 14:33:24 -05:00
dc4e9e5cb3 DataParallel for SQuAD + fix XLM 2019-12-10 19:21:20 +00:00
67a8be8e90 fix backward in tests 2019-12-10 17:50:32 +01:00
07bc8efbc3 add greedy decoding and sampling 2019-12-10 17:27:50 +01:00
63e36007ee Make sure padding, cls and other non-context tokens cannot appear in the answer. 2019-12-10 16:47:35 +01:00
f2538c1274 all tests in torch no grad 2019-12-10 16:33:11 +01:00
a5df980c5b updating distilbert test 2019-12-10 16:01:15 +01:00
40a39ab650 Reuse recent SQuAD refactored data structure inside QA pipelines. 2019-12-10 15:59:38 +01:00
7c3a15ace9 Merge branch 'master' into t5 2019-12-10 15:36:54 +01:00
981a5c8c17 updating models urls 2019-12-10 15:36:19 +01:00
e6cff60b4c Merge pull request #2069 from huggingface/cleaner-pt-tf-conversion
clean up PT <=> TF conversion
2019-12-10 15:34:08 +01:00
4b82c485de remove misplaced summarization documentation 2019-12-10 09:13:33 -05:00
8ae1044f80 updating tests and TF 2.0 model 2019-12-10 15:11:07 +01:00
aae74065df Added QuestionAnsweringPipeline unit tests. 2019-12-10 13:37:20 +01:00
a7d3794a29 Remove token_type_ids for compatibility with DistilBert 2019-12-10 13:37:20 +01:00
fe0f552e00 Use attention_mask everywhere. 2019-12-10 13:37:20 +01:00
348e19aa21 Expose attention_masks and input_lengths arguments to batch_encode_plus 2019-12-10 13:37:18 +01:00
c2407fdd88 Enable the Tensorflow backend. 2019-12-10 13:37:14 +01:00
f116cf599c Allow hiding frameworks through environment variables (NO_TF, NO_TORCH). 2019-12-10 13:37:07 +01:00
6e61e06051 batch_encode_plus generates the encoder_attention_mask to avoid attending over padded values. 2019-12-10 13:37:07 +01:00
02110485b0 Added batching, topk, chars index and scores. 2019-12-10 13:36:55 +01:00
e1d89cb24d Added QuestionAnsweringPipeline with batch support. 2019-12-10 13:36:55 +01:00
0558c9cb9b Merge branch 'master' into t5 2019-12-10 12:58:48 +01:00
81babb227e Added download command through the cli.
It allows to predownload models and tokenizers.
2019-12-10 12:18:59 +01:00
31a3a73ee3 updating CLI 2019-12-10 12:18:59 +01:00
7c1697562a compatibility with sklearn and keras 2019-12-10 12:12:22 +01:00
b81ab431f2 updating AutoModels and AutoConfiguration - adding pipelines 2019-12-10 12:11:33 +01:00
2d8559731a add pipeline - train 2019-12-10 11:34:16 +01:00
72c36b9ea2 [WIP] - CLI 2019-12-10 11:33:14 +01:00
e57d00ee10 Merge pull request #1984 from huggingface/squad-refactor
[WIP] Squad refactor
2019-12-10 11:07:26 +01:00
ecabbf6d28 Merge pull request #2107 from huggingface/encoder-mask-shape
create encoder attention mask from shape of hidden states
2019-12-10 10:07:56 +01:00
608a8f5b56 updating tf 2.0 layer_norm to T5 layer norm 2019-12-10 10:01:01 +01:00
df3961121f Add MMBT Model to Transformers Repo 2019-12-09 18:36:48 -08:00
1d18930462 Harmonize no_cuda flag with other scripts 2019-12-09 20:37:55 -05:00
f7eba09007 clean for release 2019-12-09 20:37:55 -05:00
2a64107e44 improve device usage 2019-12-09 20:37:55 -05:00
c0707a85d2 add README 2019-12-09 20:37:55 -05:00
ade3cdf5ad integrate ROUGE 2019-12-09 20:37:55 -05:00
076602bdc4 prevent BERT weights from being downloaded twice 2019-12-09 20:37:55 -05:00
5909f71028 add py-rouge dependency 2019-12-09 20:37:55 -05:00
a1994a71ee simplified model and configuration 2019-12-09 20:37:55 -05:00
3a9a9f7861 default output dir to documents dir 2019-12-09 20:37:55 -05:00
693606a75c update the docs 2019-12-09 20:37:55 -05:00
c0443df593 remove beam search 2019-12-09 20:37:55 -05:00
2403a66598 give transformers API to BertAbs 2019-12-09 20:37:55 -05:00
4d18199902 cast bool tensor to long for pytorch < 1.3 2019-12-09 20:37:55 -05:00
9f75565ea8 setup training 2019-12-09 20:37:55 -05:00
4735c2af07 tweaks to the BeamSearch API 2019-12-09 20:37:55 -05:00
ba089c780b share pretrained embeddings 2019-12-09 20:37:55 -05:00
9660ba1cbd Add beam search 2019-12-09 20:37:55 -05:00
1c71ecc880 load the pretrained weights for encoder-decoder
We currently save the pretrained_weights of the encoder and decoder in
two separate directories `encoder` and `decoder`. However, for the
`from_pretrained` function to operate with automodels we need to
specify the type of model in the path to the weights.

The path to the encoder/decoder weights is handled by the
`PreTrainedEncoderDecoder` class in the `save_pretrained` function. Since
there is no easy way to infer the type of model that was initialized for
the encoder and decoder, we add a parameter `model_type` to the function.
This is not an ideal solution, as it is error prone, and the model type
should be carried by the Model classes somehow.

This is a temporary fix that should be changed before merging.
2019-12-09 20:37:55 -05:00
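For context, a hypothetical sketch of the temporary fix described in the commit above (function and directory names are illustrative, not the actual implementation):

```python
import os

# Save encoder and decoder weights to two sub-directories whose names carry
# a model-type prefix (e.g. "bert"), since from_pretrained/automodels need
# the model type in the path to know which architecture to instantiate.
def save_encoder_decoder(model, save_directory, model_type="bert"):
    encoder_dir = os.path.join(save_directory, model_type + "_encoder")
    decoder_dir = os.path.join(save_directory, model_type + "_decoder")
    os.makedirs(encoder_dir, exist_ok=True)
    os.makedirs(decoder_dir, exist_ok=True)
    model.encoder.save_pretrained(encoder_dir)
    model.decoder.save_pretrained(decoder_dir)
```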
07f4cd73f6 update function to add special tokens
Since I started my PR, the `add_special_token_single_sequence` function
has been deprecated in favor of another; I replaced it with the new function.
2019-12-09 20:37:55 -05:00
5c877fe94a fix albert links 2019-12-09 18:53:00 -05:00
79526f82f5 Remove unnecessary epoch variable 2019-12-09 16:24:35 -05:00
9626e0458c Add functionality to continue training from last saved global_step 2019-12-09 16:24:35 -05:00
2d73591a18 Stop saving current epoch 2019-12-09 16:24:35 -05:00
0eb973b0d9 Use saved optimizer and scheduler states if available 2019-12-09 16:24:35 -05:00
a03fcf570d Save tokenizer after each epoch to be able to resume training from a checkpoint 2019-12-09 16:24:35 -05:00
f71b1bb05a Save optimizer state, scheduler state and current epoch 2019-12-09 16:24:35 -05:00
8e651f56b7 fix tf tests 2019-12-09 22:13:57 +01:00
808bb8da7e fix transfo xl tests 2019-12-09 21:48:34 +01:00
b016dd16c9 fix tests on python 3.5 2019-12-09 21:38:07 +01:00
2a4ef098d6 Add ALBERT and XLM to SQuAD script 2019-12-09 10:46:47 -05:00
00c4e39581 Merge branch 'master' into squad-refactor 2019-12-09 10:41:15 -05:00
169fea6855 updating T5 2019-12-09 16:25:33 +01:00
3520be7824 create encoder attention mask from shape of hidden states
We currently create encoder attention masks (when they're not provided)
based on the shape of the inputs to the encoder. This is obviously
wrong; sequences can be of different lengths. We now create the encoder
attention mask based on the batch_size and sequence_length of the
encoder hidden states.
2019-12-09 11:19:45 +01:00
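A minimal sketch of the fix described above, assuming the mask convention 1 = attend, 0 = masked (the function name is illustrative):

```python
import torch

# Derive the default encoder attention mask from the encoder hidden states
# rather than from the encoder inputs, whose lengths can differ.
def default_encoder_attention_mask(encoder_hidden_states):
    batch_size, sequence_length, _ = encoder_hidden_states.shape
    return torch.ones(batch_size, sequence_length, device=encoder_hidden_states.device)
```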
0cb163865a Remove pytest dependency. (#2093) 2019-12-07 07:46:14 -05:00
2670b0d682 Fix bug which lowercases special tokens 2019-12-06 16:15:53 -05:00
35401fe50f Remove dependency on pytest for running tests (#2055)
* Switch to plain unittest for skipping slow tests.

Add a RUN_SLOW environment variable for running them.

* Switch to plain unittest for PyTorch dependency.

* Switch to plain unittest for TensorFlow dependency.

* Avoid leaking open files in the test suite.

This prevents spurious warnings when running tests.

* Fix unicode warning on Python 2 when running tests.

The warning was:

    UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

* Support running PyTorch tests on a GPU.

Reverts 27e015bd.

* Tests no longer require pytest.

* Make tests pass on cuda
2019-12-06 13:57:38 -05:00
e4679cddce [cli] Uploads: add progress bar (#2078)
* [cli] Uploads: add progress bar

see https://github.com/huggingface/transformers/pull/2044#discussion_r354057827 for context

* rename + documentation

* Add auto-referential comment
2019-12-06 11:56:23 -05:00
1d87b37d10 updating 2019-12-06 15:30:09 +01:00
4cb9b60558 Merge pull request #2077 from patrickvonplaten/change_documentation_for_past_output_shape
corrected documentation for past tensor shape for ctrl and gpt2 model
2019-12-06 12:14:48 +01:00
5482822a2b Merge pull request #2046 from jplu/tf2-ner-example
Add NER TF2 example.
2019-12-06 12:12:22 +01:00
fc1bb1f867 Merge pull request #2068 from huggingface/fix-2042
Nicer error message when Bert's input is missing batch size
2019-12-06 12:06:42 +01:00
21451ec6ba handle string with only whitespaces as empty 2019-12-06 10:32:43 +01:00
f230d91b43 check the validity of links
We add a script and a CI workflow to check that all download links
present in the source code are valid.
2019-12-06 09:41:28 +01:00
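A rough sketch of such a link checker (the regex and function are illustrative, not the actual script):

```python
import re
import requests

URL_RE = re.compile(r"https://[^\s\"')]+")

def check_links(text):
    # Collect every https URL found in the given source text and verify
    # that each one responds without a 4xx/5xx status.
    bad = []
    for url in sorted(set(URL_RE.findall(text))):
        response = requests.head(url, allow_redirects=True)
        if response.status_code >= 400:
            bad.append((url, response.status_code))
    return bad
```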
d0383e4daf corrected documentation for past tensor shape for ctrl and gpt2 model 2019-12-06 01:24:22 +01:00
e9217da5ff Cleanup
Improve global visibility of the run_squad script, remove unused files, and apply fixes related to XLNet.
2019-12-05 16:01:51 -05:00
9ecd83dace Patch evaluation for impossible values + cleanup 2019-12-05 14:44:57 -05:00
35ff345fc9 update requirements 2019-12-05 12:07:04 -05:00
552c44a9b1 release distilm-bert 2019-12-05 10:14:58 -05:00
ee53de7aac Pr for pplm (#2060)
* license

* changes

* ok

* Update paper link and commands to run

* pointer to uber repo
2019-12-05 09:20:07 -05:00
f8fb4335c9 clean up a little bit PT <=> TF conversion 2019-12-05 15:19:32 +01:00
bebaa14039 Merge pull request #2045 from aaugustin/remove-dead-code
Remove dead code in tests.
2019-12-05 14:41:56 +01:00
18fb93530b fixing #2042 - Nicer error message 2019-12-05 14:36:34 +01:00
2d5d86e037 fix #2031 2019-12-05 14:06:29 +01:00
af077b15e2 Merge pull request #2065 from huggingface/fixing-camembert
Fixing camembert tokenization
2019-12-05 13:45:44 +01:00
3268ebd229 fix xlnet test 2019-12-05 13:35:29 +01:00
6c5297a423 Fixing camembert tokenization 2019-12-05 13:27:58 +01:00
9200a759d7 Add few tests on the TF optimization file with some info in the documentation. Complete the README. 2019-12-05 12:56:43 +01:00
1f179f095f Merge pull request #2011 from AdityaSoni19031997/patch-1
typo fix on the docs as per Pytorch v1.1+
2019-12-05 12:39:04 +01:00
1eaf44e713 Merge pull request #2007 from roskoN/xlnet_attention_fix
fixed XLNet attention output for both attention streams whenever target_mapping is provided
2019-12-05 12:32:39 +01:00
71e4693f08 fix #1968 2019-12-05 12:14:24 +01:00
f9f395b21c Merge pull request #1735 from ondewo/tf-do-not-use-gpu-on-import
Do not use GPU when importing transformers
2019-12-05 11:56:48 +01:00
75a97af6bc fix #1450 - add doc 2019-12-05 11:26:55 +01:00
8b388827b5 fix #1920 2019-12-05 11:18:43 +01:00
d425a4d60b Merge pull request #1870 from alexzubiaga/xlnet-for-token-classification
XLNet for Token classification
2019-12-05 09:54:09 +01:00
1eb89ddf73 Merge pull request #2044 from huggingface/cli_upload
CLI for authenticated file sharing
2019-12-05 09:44:07 +01:00
7f998b1b83 special_tokens_mask value was unused and calculated twice 2019-12-05 09:01:39 +01:00
fb0d2f1da1 preparing release distil-mBERT 2019-12-05 03:00:16 -05:00
3ba417e1a8 [cli] ls: Tabular formatting 2019-12-04 18:40:52 -05:00
ce158a076f Return dataset (pytorch) 2019-12-04 17:55:52 -05:00
7a03519975 Documentation 2019-12-04 17:24:35 -05:00
96fa9a8a70 Python 2 + Post mime-type to S3 2019-12-04 17:22:50 -05:00
33508ae310 Remove only_first 2019-12-04 16:26:45 -05:00
f7e4a7cdfa Cleanup 2019-12-04 16:24:15 -05:00
a7ca6d738b Padding side is tokenizer-dependant 2019-12-04 15:43:34 -05:00
cca75e7884 Kill the demon spawn 2019-12-04 15:42:29 -05:00
bf119c0568 TFDS dataset can now be evaluated 2019-12-04 11:34:59 -05:00
ff98b041da Fix whitespace issue 2019-12-04 16:53:06 +01:00
9ddc3f1a12 Naming update + XLNet/XLM evaluation 2019-12-04 10:37:00 -05:00
5bfcd0485e fix #1991 2019-12-04 14:53:11 +01:00
cae641ff26 Merge pull request #1846 from tamuhey/patch/iss1845
fix summary_type value of SequenceSummary
2019-12-04 13:28:39 +01:00
254ebb979c Bugfix on init file. Missing comma. 2019-12-04 10:00:25 +01:00
ecb923da9c Create a NER example similar to the Pytorch one. It takes the same options, and can be run the same way. 2019-12-04 09:43:15 +01:00
40255ab002 Remove dead code in tests. 2019-12-04 08:21:02 +01:00
e4fbf3e2cc CLI for authenticated file sharing 2019-12-04 00:52:23 -05:00
de276de1c1 Working evaluation 2019-12-03 17:15:51 -05:00
7edb51f3a5 [pplm] split classif head into its own file 2019-12-03 22:07:25 +00:00
c835bc85c2 Compute predictions 2019-12-03 15:28:16 -05:00
285b1241e3 Added SquadResult 2019-12-03 15:00:49 -05:00
8101924a68 Patch: v2.2.1 2019-12-03 11:20:26 -05:00
48cbf267c9 Use full dataset for eval (SequentialSampler in Distributed setting) 2019-12-03 11:01:37 -05:00
f434bfc623 [pplm] Update S3 links
Co-Authored-By: Piero Molino <w4nderlust@gmail.com>
2019-12-03 10:53:02 -05:00
96e83506d1 Always use SequentialSampler during evaluation
When evaluating, shouldn't we always use the SequentialSampler instead of DistributedSampler? Evaluation only runs on 1 GPU no matter what, so if you use the DistributedSampler with N GPUs, I think you'll only evaluate on 1/N of the evaluation set. That's at least what I'm finding when I run an older/modified version of this repo.
2019-12-03 10:15:39 -05:00
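A minimal sketch of the change argued for above (the dataset is a stand-in):

```python
import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

# With SequentialSampler the single evaluation process walks the whole
# dataset in order; a DistributedSampler would hand it only a 1/N shard.
eval_dataset = TensorDataset(torch.arange(10))
eval_dataloader = DataLoader(eval_dataset, sampler=SequentialSampler(eval_dataset), batch_size=4)
for (batch,) in eval_dataloader:
    print(batch)  # every example appears exactly once, in dataset order
```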
3b48806f75 [pplm] README: add setup + tweaks 2019-12-03 10:14:02 -05:00
0cb2c90890 readme
Co-Authored-By: Rosanne Liu <mimosavvy@gmail.com>
2019-12-03 10:14:02 -05:00
1efb2ae7fc [pplm] move scripts under examples/pplm/ 2019-12-03 10:14:02 -05:00
a59fdd1627 generate_text_pplm now works with batch_size > 1 2019-12-03 10:14:02 -05:00
893d0d64fe Changed order of some parameters to be more consistent. Identical results. 2019-12-03 10:14:02 -05:00
f42816e7fc Added additional check for url and path in discriminator model params 2019-12-03 10:14:02 -05:00
f10b925015 Improvements: model_path renamed to pretrained_model, tokenizer loaded from pretrained_model, pretrained_model set to the discriminator's when discrim is specified, sample=False by default but a CLI parameter introduced. To obtain identical samples, call the CLI with --sample 2019-12-03 10:14:02 -05:00
75904dae66 Removed global variable device 2019-12-03 10:14:02 -05:00
7fd54b55a3 Added support for generic discriminators 2019-12-03 10:14:02 -05:00
b0eaff36e6 Added a +1 to epoch when saving weights 2019-12-03 10:14:02 -05:00
611961ade7 Added tqdm to preprocessing 2019-12-03 10:14:02 -05:00
afc7dcd94d Now run_pplm works on cpu. Identical output as before (when using gpu). 2019-12-03 10:14:02 -05:00
61399e5afe Cleaned perturb_past. Identical output as before. 2019-12-03 10:14:02 -05:00
ffc2935405 Fix for making unconditioned generation work. Identical output as before. 2019-12-03 10:14:02 -05:00
9f693a0c48 Cleaned generate_text_pplm. Identical output as before. 2019-12-03 10:14:02 -05:00
61a12f790d Renamed SmallConst to SMALL_CONST and introduced BIG_CONST. Identical output as before. 2019-12-03 10:14:02 -05:00
ef47b2c03a Removed commented code. Identical output as before. 2019-12-03 10:14:02 -05:00
7ea12db3f5 Removed commented code. Identical output as before. 2019-12-03 10:14:02 -05:00
08c6e456a3 Cleaned full_text_generation. Identical output as before. 2019-12-03 10:14:02 -05:00
6c9c131780 More cleanup for run_model. Identical output as before. 2019-12-03 10:14:02 -05:00
7ffe47c888 Improved device specification 2019-12-03 10:14:02 -05:00
4f2164e40e First cleanup step, changing function names and passing parameters all the way through without using args. Identical output as before. 2019-12-03 10:14:02 -05:00
821de121e8 Minor changes 2019-12-03 10:14:02 -05:00
7469d03b1c Fixed minor bug when running training on cuda 2019-12-03 10:14:02 -05:00
0b51fba20b Added script for training a discriminator for pplm to use 2019-12-03 10:14:02 -05:00
34a83faabe Let's make PPLM great again 2019-12-03 10:14:02 -05:00
d5faa74cd6 tokenizer white space: revert to previous behavior 2019-12-03 10:14:02 -05:00
0b77d66a6d rm extraneous import 2019-12-03 10:14:02 -05:00
83b1e6ac9e fix the loss backward issue
(cherry picked from commit 566468cc984c6ec7e10dfc62b5b4191781a99cd2)
2019-12-03 10:14:02 -05:00
572c24cfa2 PPLM (squashed)
Co-authored-by: piero <piero@uber.com>
Co-authored-by: Rosanne Liu <mimosavvy@gmail.com>
2019-12-03 10:14:02 -05:00
f19a78a634 Merge pull request #1903 from valohai/master
Valohai integration
2019-12-03 16:13:01 +01:00
d100ad99c0 Merge pull request #2014 from aaugustin/mark-tf-auto-model-test-as-slow
Mark tests in TFAutoModelTest as slow.
2019-12-03 16:03:48 +01:00
66fc8d25a5 Change ref to original GLUE downloader script 2019-12-03 10:49:50 +02:00
fbaf05bd92 Remove annoying tokenization message 2019-12-02 18:23:00 -05:00
e85855f2c4 Fix ALBERT exports with pretraining + sp classifier; Fix naming for ALBERT TF models 2019-12-02 18:00:19 -05:00
b3d834ae11 Reorganize ALBERT conversion script 2019-12-02 15:01:52 -05:00
f3776df0f3 WIP debugging 2019-12-02 15:47:00 +01:00
5ab93083e4 Mark tests in TFAutoModelTest as slow.
Each test forces downloading the same 536MB file, which is slow
even with a decent internet connection.
2019-12-01 18:25:15 +01:00
c356290c8d typo fix as per Pytorch v1.1+ 2019-12-01 14:08:14 +05:30
76c0bc06d5 [XLNet] Changed post-processing of attention w.r.t to target_mapping
Whenever target_mapping is provided to the input, XLNet outputs two different attention streams.
Based on that, the attention output would be one of the two:
- a list of tensors (usual case for most transformers)
- a list of 2-tuples of tensors, one tensor for each of the attention streams
Docs and unit tests have been updated
2019-11-30 21:01:04 +01:00
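A small helper sketching how a caller might normalize the two output shapes described above (names are illustrative):

```python
def split_attention_streams(attentions):
    # Each layer entry is either a single tensor (no target_mapping) or a
    # (content_stream, query_stream) tuple when target_mapping is provided.
    content, query = [], []
    for layer_attention in attentions:
        if isinstance(layer_attention, tuple):
            content_stream, query_stream = layer_attention
        else:
            content_stream, query_stream = layer_attention, None
        content.append(content_stream)
        query.append(query_stream)
    return content, query
```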
b90791e950 fixed XLNet attention output for both attention streams 2019-11-30 15:57:51 +01:00
b0ee7c7df3 Added Camembert to available models 2019-11-29 14:17:02 -05:00
ecf15ebf3b Add ALBERT to AutoClasses 2019-11-29 11:25:37 -05:00
4a666885b5 reducing my level of enthusiasm 2019-11-29 09:40:50 -05:00
adb5c79ff2 update all tf.shape and tensor.shape to shape_list 2019-11-29 09:40:50 -05:00
2421e54f8c Add link to original source and license to download_glue_data.py 2019-11-29 15:39:28 +02:00
41aa0e8003 Refactor logs and fix loss bug 2019-11-29 15:33:25 +02:00
1ab8dc44b3 Merge pull request #1876 from huggingface/mean-fix
Mean does not exist in TF2
2019-11-29 09:26:33 +01:00
f0d22b6363 Merge pull request #1873 from stefan-it/distilbert-german
German DistilBERT
2019-11-29 09:25:47 +01:00
1e9ac5a7cf New -> normal 2019-11-28 17:43:47 -05:00
0b84b9fd8a Add processors to __init__ 2019-11-28 17:38:52 -05:00
f671997ef7 Interface with TFDS 2019-11-28 17:17:20 -05:00
bd41e8292a Cleanup & Evaluation now works 2019-11-28 16:03:56 -05:00
d49c43ff78 Merge pull request #1778 from eukaryote31/patch-2
from_pretrained: convert DialoGPT format
2019-11-28 16:08:37 +01:00
91caf2462c Merge pull request #1770 from huggingface/initi-encoder-mask
Only init encoder_attention_mask if stack is decoder
2019-11-28 16:06:55 +01:00
49a69d5b78 Merge pull request #1753 from digantamisra98/patch-1
Added Mish Activation Function
2019-11-28 15:24:08 +01:00
96e7ee7238 Merge pull request #1740 from huggingface/fix-ctrl-past
Fix CTRL past
2019-11-27 23:28:30 +01:00
8da47b078d fix merge tests 2019-11-27 23:11:37 +01:00
8c276b9c92 Merge branch 'master' into distilbert-german 2019-11-27 18:11:49 +01:00
3c28a2daac add add_special_tokens=True for input examples 2019-11-27 12:05:23 -05:00
a36f981d1b Merge branch 'master' into fix-ctrl-past 2019-11-27 17:25:46 +01:00
5afca00b47 Merge pull request #1724 from huggingface/fix_encode_plus
Fix encode_plus
2019-11-27 17:14:49 +01:00
49108288ba Merge pull request #1624 from Huawei-MRC-OSI/resumable_http
Add support for resumable downloads for HTTP protocol.
2019-11-27 17:11:07 +01:00
5340d1f21f Merge branch 'master' into resumable_http 2019-11-27 17:10:36 +01:00
10bd1ddb39 soft launch distilbert multilingual 2019-11-27 11:07:22 -05:00
d5478b939d add distilbert + update run_xnli wrt run_glue 2019-11-27 11:07:22 -05:00
07ab8d7af6 fix bug 2019-11-27 11:07:22 -05:00
d474022639 cleaning simple_accuracy since not used anymore 2019-11-27 11:07:22 -05:00
bcd8dc6b48 move xnli_compute_metrics to data/metrics 2019-11-27 11:07:22 -05:00
73fe2e7385 remove fstrings 2019-11-27 11:07:22 -05:00
3e7656f7ac update readme 2019-11-27 11:07:22 -05:00
abd397e954 uniformize w/ the cache_dir update 2019-11-27 11:07:22 -05:00
d75d49a51d add XnliProcessor to doc 2019-11-27 11:07:22 -05:00
d5910b312f move xnli processor (and utils) to transformers/data/processors 2019-11-27 11:07:22 -05:00
289cf4d2b7 change default for XNLI: dev --> test 2019-11-27 11:07:22 -05:00
cb7b77a8a2 fix some typos 2019-11-27 11:07:22 -05:00
84a0b522cf mbert reproducibility results 2019-11-27 11:07:22 -05:00
c4336ecbbd xnli - output_mode consistency 2019-11-27 11:07:22 -05:00
d52e98ff9a add xnli examples/README.md 2019-11-27 11:07:22 -05:00
71f71ddb3e run_xnli + utils_xnli 2019-11-27 11:07:22 -05:00
b5d884d25c Uniformize #1952 2019-11-27 11:05:55 -05:00
7fd1d42a01 Merge pull request #1592 from watkinsm/do_lower_case
Consider do_lower_case in PreTrainedTokenizer
2019-11-27 17:05:18 +01:00
21637d4924 Merge branch 'master' into do_lower_case 2019-11-27 17:04:39 +01:00
de2696f68e suggest to track repo w/ https rather than ssh 2019-11-27 11:02:28 -05:00
88b317739f Fix issue #1962: the input's shape seems to cause an error in the 2.2.0 version of tf_albert_model 2019-11-27 10:38:10 -05:00
45d767297a Updated v2.2.0 doc 2019-11-27 10:12:20 -05:00
361620954a Remove TFBertForPreTraining from ALBERT doc 2019-11-27 10:11:37 -05:00
cc7968227e Updated v2.2.0 doc 2019-11-26 15:52:25 -05:00
ce02550d50 Fix pretrained models table 2019-11-26 15:47:02 -05:00
cf26a0c85e Fix pretrained models table 2019-11-26 15:40:03 -05:00
44b82c777f Updated v2.2.0 doc 2019-11-26 15:15:11 -05:00
ee4647bd5c CamemBERT & ALBERT doc 2019-11-26 15:10:51 -05:00
7c6000e412 Updated v2.2.0 doc 2019-11-26 14:55:29 -05:00
668aac45d2 Pretrained models 2019-11-26 14:52:42 -05:00
8742baa531 Improve test protocol for inputs_embeds in TF 2019-11-26 14:39:47 -05:00
cf62bdc962 Improve test protocol for inputs_embeds in TF
cc @lysandrejik
2019-11-26 14:37:32 -05:00
b632145273 Update master documentation link in README 2019-11-26 14:27:15 -05:00
ae98d45991 Release: v2.2.0 2019-11-26 14:12:44 -05:00
f2f329408d Fix input embeddings 2019-11-26 13:08:12 -05:00
bdfe21ab24 Change param order for consistency 2019-11-26 13:08:12 -05:00
c536c2a480 ALBERT Input Embeds 2019-11-26 13:08:12 -05:00
f873b55e43 Warning for ALBERT-v2 models 2019-11-26 13:08:12 -05:00
c9cb7f8a0f Torch 1.1.0 compatibility + FP16 O1 + TF checkpoints
Co-authored-by: wassname
2019-11-26 13:08:12 -05:00
b18509c208 Tests for ALBERT in TF2 + fixes 2019-11-26 13:08:12 -05:00
7bddbf5961 TFAlbertForSequenceClassification 2019-11-26 13:08:12 -05:00
f6f382532b ALBERT in TF2 2019-11-26 13:08:12 -05:00
d9daad98c7 Re-ordering of group_idx/layer_idx + Python 2 tests 2019-11-26 13:08:12 -05:00
9d5c49546f Tests for AlbertForQuestionAnswering AlbertForSequenceClassification 2019-11-26 13:08:12 -05:00
16263f9685 Headmasking 2019-11-26 13:08:12 -05:00
abb23a78ba Head pruning for ALBERT 2019-11-26 13:08:12 -05:00
4374eaea78 ALBERT for SQuAD 2019-11-26 13:08:12 -05:00
70d99980de ALBERT-V2 2019-11-26 13:08:12 -05:00
c110c41fdb Run GLUE and remove LAMB 2019-11-26 13:08:12 -05:00
6637a77f80 AlbertForSequenceClassification 2019-11-26 13:08:12 -05:00
0d07a23c04 LAMB implementation 2019-11-26 13:08:12 -05:00
c987545592 Converting script 2019-11-26 13:08:12 -05:00
4f3a54bfc8 ALBERT can load pre-trained models. Doesn't inherit from BERT anymore. 2019-11-26 13:08:12 -05:00
c4403006b8 External MLM head 2019-11-26 13:08:12 -05:00
b21402fc86 Python 2 tests + licence 2019-11-26 13:08:12 -05:00
c14a22272f ALBERT passes all tests 2019-11-26 13:08:12 -05:00
870320a24e Early tests 2019-11-26 13:08:12 -05:00
25a31953e8 Output Attentions + output hidden states 2019-11-26 13:08:12 -05:00
ce9eade29c Initializer range using BertPreTrainedModel 2019-11-26 13:08:12 -05:00
5680a11063 Activation function managed from the config file 2019-11-26 13:08:12 -05:00
1e5b31c388 Several fixes and improvements 2019-11-26 13:08:12 -05:00
ee20201d33 Tokenization tests + fixes + init 2019-11-26 13:08:12 -05:00
e3ea5d1d8d Docstrings 2019-11-26 13:08:12 -05:00
fedac786d4 Tokenization + small fixes 2019-11-26 13:08:12 -05:00
67b422662c Documentation + improved AlbertForMaskedLM 2019-11-26 13:08:12 -05:00
1b92564330 Reorganize and cleanup 2019-11-26 13:08:12 -05:00
12290c0d5c Handles multi layer and multi groups 2019-11-26 13:08:12 -05:00
139affaa8d Albert layer/layer groups 2019-11-26 13:08:12 -05:00
91ccbae788 Accepts multiple sizes 2019-11-26 13:08:12 -05:00
c0c2088333 ALBERT model 2019-11-26 13:08:12 -05:00
8e5d84fcc1 Fixed typo 2019-11-26 09:01:32 -05:00
0669c1fcd1 SQuAD v2 BERT + XLNet 2019-11-25 19:22:21 -05:00
5d3b8daad2 Minor bug fixes on run_ner.py 2019-11-25 16:48:03 -05:00
aa92a184d2 resize model when special tokenizer present 2019-11-25 15:06:32 -05:00
07bf43074f Fix GPT2 docstring 2019-11-25 11:32:00 -05:00
fa963ecc59 if→elif 2019-11-25 10:21:03 -05:00
c8eb8157b8 fix docstrings 2019-11-25 10:21:03 -05:00
99f750d64e add Camembert models to modeling_auto 2019-11-25 10:21:03 -05:00
7485caefb0 fix #1894 2019-11-25 09:33:39 -05:00
afaa335851 [doc] Fix assets urls 2019-11-23 11:34:45 -05:00
176cd1ce1b [doc] homogenize instructions slightly 2019-11-23 11:18:54 -05:00
041a901f32 Fix typo in documentation. toto -> to 2019-11-23 10:55:16 -05:00
e0e55bc550 Manage training example & refactor the refactor 2019-11-22 16:27:45 -05:00
c3ba645237 Works for XLNet 2019-11-22 16:27:37 -05:00
a5a8a6175f Works for BERT 2019-11-22 16:27:31 -05:00
a7dafe2f41 Padding strategy (left and right) rather than boolean flag 2019-11-22 16:27:25 -05:00
9f374c8252 encode and encode_plus handle attention masks and padding 2019-11-22 16:27:15 -05:00
72e506b22e wip 2019-11-22 16:26:00 -05:00
ea52f82455 Moved some SQuAD logic to /data 2019-11-22 16:25:52 -05:00
26db31e0c0 update the documentation 2019-11-21 14:41:19 -05:00
6f70bb8c69 add instructions to run the examples 2019-11-21 14:41:19 -05:00
05d4232f63 Add valohai.yaml 2019-11-21 12:38:17 +02:00
aac3551407 Add download_glue_data.py from kamalkraj/ALBERT-TF2.0
Original source: fa90194e5f/download_glue_data.py
Original license: fa90194e5f/LICENSE (Apache-2.0)
2019-11-21 12:37:41 +02:00
2cf3447e0a Glue: log in Valohai-compatible JSON format too 2019-11-21 12:35:25 +02:00
0cdfcca24b Merge pull request #1860 from stefan-it/camembert-for-token-classification
[WIP] Add support for CamembertForTokenClassification
2019-11-21 10:56:07 +01:00
e70cdf083d Cleanup TPU bits from run_glue.py
TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.
2019-11-20 17:54:34 -05:00
454455c695 fix #1879 2019-11-20 09:42:48 -05:00
3de31f8d28 mean does not exist in TF2 2019-11-19 18:14:14 -05:00
da06afafc8 tree-wide: add trailing comma in configuration maps 2019-11-19 21:57:00 +01:00
2e2c0375c3 distilbert: add German distilbert model to positional embedding sizes map 2019-11-19 20:41:18 +01:00
e7cf2ccd15 distillation: add German distilbert model 2019-11-19 19:55:19 +01:00
e631383d4f docs: add new German distilbert model to pretrained models 2019-11-19 19:52:40 +01:00
f21dfe36ba distilbert: add vocab for new German distilbert model 2019-11-19 19:51:31 +01:00
22333945fb distilbert: add pytorch model for new German distilbert model 2019-11-19 19:51:01 +01:00
337802783f distilbert: add configuration for new German distilbert model 2019-11-19 19:50:32 +01:00
4193aa9f81 add TFXLNetForTokenClassification implementation and unit test
add XLNetForTokenClassification implementation and unit tests
2019-11-19 12:47:54 +01:00
f3386d9383 typo "deay" -> "decay" 2019-11-18 11:50:06 -05:00
56c84863a1 camembert: add support for CamemBERT in run_ner example 2019-11-18 17:06:57 +01:00
0b3d45eb64 camembert: add implementation for save_vocabulary method 2019-11-18 15:49:44 +01:00
3916b334a8 [camembert] Acknowledge the full author list 2019-11-18 09:29:11 -05:00
44455eb5b6 Adds CamemBERT to Model architectures list 2019-11-18 09:23:14 -05:00
33753d9139 module: import CamembertForTokenClassification 2019-11-18 14:14:54 +01:00
d32ce2c8df camembert: add wrapper for CamembertForTokenClassification 2019-11-18 14:14:19 +01:00
d08a338c3b modified: transformers/modeling_utils.py 2019-11-16 18:47:37 +09:00
0477b307c7 [camembert] tokenizer: use additional_special_tokens 2019-11-16 00:11:07 -05:00
f9abf73e31 [camembert] realign w/ recent changes 2019-11-16 00:11:07 -05:00
26858f27cb [camembert] Upload to s3 + rename script 2019-11-16 00:11:07 -05:00
035fea5315 Add CamemBERT to auto files and docs 2019-11-16 00:11:07 -05:00
694d4fcbb6 Add CamemBERT classes to __init__.py 2019-11-16 00:11:07 -05:00
3e20c2e871 Update demo_camembert.py with new classes 2019-11-16 00:11:07 -05:00
f12e4d8da7 Move demo_camembert.py to examples/contrib 2019-11-16 00:11:07 -05:00
fb6c70a91d Update tokenization_camembert.py with urls 2019-11-16 00:11:07 -05:00
e44b939e71 Add configuration_camembert.py and modeling_camembert.py 2019-11-16 00:11:07 -05:00
6e72fd094c Add demo_camembert.py 2019-11-16 00:11:07 -05:00
14b3aa3b3c Add tokenization_camembert.py 2019-11-16 00:11:07 -05:00
ca99a2d500 Update example readme 2019-11-15 14:55:26 +08:00
7da3ef24cd add is_impossible tensor to model inputs when fine-tuning XLNet on SQuAD 2.0 2019-11-15 14:18:53 +08:00
74ce8de7d8 Merge pull request #1792 from stefan-it/distilbert-for-token-classification
DistilBERT for token classification
2019-11-14 22:47:53 +01:00
05db5bc1af added small comparison between BERT, RoBERTa and DistilBERT 2019-11-14 22:40:22 +01:00
9629e2c676 Merge pull request #1804 from ronakice/master
fix multi-gpu eval in torch examples
2019-11-14 22:24:05 +01:00
5b322a36db Merge pull request #1811 from huggingface/special-tokens
Fix special tokens addition in decoder #1807
2019-11-14 22:17:24 +01:00
1a237d7f42 Merge pull request #1831 from iedmrc/gpt2-tokenization-sum-func-replacement
sum() is replaced by itertools.chain.from_iterable()
2019-11-14 22:11:54 +01:00
df99f8c5a1 Merge pull request #1832 from huggingface/memory-leak-schedulers
replace LambdaLR scheduler wrappers by function
2019-11-14 22:10:31 +01:00
0be9ae7b3e Merge pull request #1833 from huggingface/max-length-warning
Token indices sequence length is longer than the specified maximum sequence length for this model
2019-11-14 22:04:49 +01:00
be7f2aacce [CI][DOC] Don't rebuild if folder exists - Correct directory. 2019-11-14 14:54:44 -05:00
8f8d69716a [CI][DOC] Don't rebuild if folder exists. 2019-11-14 14:48:21 -05:00
2276bf69b7 update the examples, docs and template 2019-11-14 20:38:02 +01:00
d7929899da Specify checkpoint in saved file for run_lm_finetuning.py 2019-11-14 10:49:00 -05:00
a67e747889 Reorganized max_len warning 2019-11-14 10:30:22 -05:00
e18f786cd5 Quickstart example showcasing past 2019-11-14 10:06:00 -05:00
022525b003 replace LambdaLR scheduler wrappers by function
Custom schedulers are currently initiated by wrapping Pytorch's LambdaLR
class and passing a method of the wrapping class to the __init__
function of LambdaLR. This approach is not appropriate for several
reasons:

1. one does not need to define a class when it only defines a
__init__() method;
2. instantiating the parent class by passing a method of the child class
creates a cyclical reference which leads to memory leaks. See issues #1742 and #1134.

In this commit we replace the wrapper classes with functions that
instantiate `LambdaLR` with a custom learning rate function. We use a
closure to specify the parameter of the latter. We also do a bit of
renaming within the function to explicit the behaviour and removed
docstrings that were subsequently not necessary.
2019-11-14 15:39:08 +01:00
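A sketch of the function-based replacement described in the commit above (the linear-warmup schedule shown here is one example of the pattern):

```python
from torch.optim.lr_scheduler import LambdaLR

def get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
    # A closure over the warmup parameters replaces the old wrapper class,
    # avoiding the cyclic reference between LambdaLR and its subclass.
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return float(current_step) / float(max(1, num_warmup_steps))
        return max(
            0.0,
            float(num_training_steps - current_step)
            / float(max(1, num_training_steps - num_warmup_steps)),
        )
    return LambdaLR(optimizer, lr_lambda)
```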
7627dde1f8 sum() looks like the leanest way to flatten a list of lists, but it is quadratic, so it's been replaced by itertools.chain.from_iterable() 2019-11-14 17:06:15 +03:00
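The difference in a nutshell (a self-contained example, not code from the PR):

```python
from itertools import chain

nested = [["a", "b"], ["c"], ["d", "e"]]

# sum() re-allocates the accumulator list at every step (quadratic overall),
# while chain.from_iterable streams through the sublists in linear time.
flat_slow = sum(nested, [])
flat_fast = list(chain.from_iterable(nested))
assert flat_slow == flat_fast == ["a", "b", "c", "d", "e"]
```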
74d0bcb6ff Fix special tokens addition in decoder 2019-11-12 15:27:57 -05:00
155c782a2c [inputs_embeds] All TF models + tests 2019-11-12 11:29:21 -05:00
2aef2f0bbc [common attributes] Fix previous commit for transfo-xl 2019-11-12 11:29:21 -05:00
2f17464266 [common attributes] Slightly sharper test coverage 2019-11-12 11:29:21 -05:00
9d2398fd99 Ooopsie 2019-11-12 11:29:21 -05:00
70d97ddd60 [TF models] Common attributes as per #1721 2019-11-12 11:29:21 -05:00
872403be1c This is not a @property after all 2019-11-12 11:29:21 -05:00
dd6b2e05e1 whitespace 2019-11-12 11:29:21 -05:00
d409aca326 Clarify the use of past in GPT2 and CTRL 2019-11-12 10:59:37 -05:00
7246d3c2f9 Consider do_lower_case in PreTrainedTokenizer
As pointed out in #1545, when using an uncased model, and adding
a new uncased token, the tokenizer does not correctly identify this
in the case that the input text contains the token in a cased format.

For instance, if we load bert-base-uncased into BertTokenizer, and
then use .add_tokens() to add "cool-token", we get the expected
result for .tokenize('this is a cool-token'). However, we get a
possibly unexpected result for .tokenize('this is a cOOl-Token'),
which in fact mirrors the result for the former from before the new
token was added.

This commit adds
- functionality to PreTrainedTokenizer to handle this
situation in case a tokenizer (currently Bert, DistilBert,
and XLNet) has the do_lower_case=True kwarg by:
    1) lowercasing tokens added with .add_tokens()
    2) lowercasing text at the beginning of .tokenize()
- new common test case for tokenizers

https://github.com/huggingface/transformers/issues/1545
2019-11-12 13:08:30 +02:00
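An illustration of the behaviour described above (the expected output is an assumption based on the commit description):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # do_lower_case=True
tokenizer.add_tokens(["cool-token"])

# Before the fix, the cased variant missed the added token and fell back to
# word pieces; after it, the input is lowercased first and matches.
print(tokenizer.tokenize("this is a cOOl-Token"))
```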
2e31176557 fix multi-gpu eval 2019-11-12 05:55:11 -05:00
8aba81a0b6 fix #1789 2019-11-12 08:52:43 +01:00
94e55253ae tests: add test case for DistilBertForTokenClassification implementation 2019-11-11 16:20:15 +01:00
2b07b9e5ee examples: add DistilBert support for NER fine-tuning 2019-11-11 16:19:34 +01:00
1806eabf59 module: add DistilBertForTokenClassification import 2019-11-11 16:18:48 +01:00
1c7253cc5f modeling: add DistilBertForTokenClassification implementation 2019-11-11 16:18:16 +01:00
b5d330d118 Fix #1784 2019-11-11 10:15:14 -05:00
90f6e73a35 Add DialoGPT support for Pytorch->TF 2019-11-09 16:46:19 +00:00
ef99852961 from_pretrained: convert DialoGPT format
DialoGPT checkpoints have "lm_head.decoder.weight" instead of "lm_head.weight". 

(see: https://www.reddit.com/r/MachineLearning/comments/dt5woy/p_dialogpt_state_of_the_art_conversational_model/f6vmwuy?utm_source=share&utm_medium=web2x)
2019-11-09 16:32:40 +00:00
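A sketch of the key remapping described above (the checkpoint path is a placeholder):

```python
import torch

state_dict = torch.load("dialogpt_checkpoint.bin", map_location="cpu")

# DialoGPT stores the LM head under "lm_head.decoder.weight"; rename it to
# the "lm_head.weight" key that the GPT-2 model class expects.
if "lm_head.decoder.weight" in state_dict:
    state_dict["lm_head.weight"] = state_dict.pop("lm_head.decoder.weight")
```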
7a9aae1044 Fix run_bertology.py
Make imports and args.overwrite_cache match run_glue.py
2019-11-08 16:28:40 -05:00
268d4f2099 fix position biases + better tests 2019-11-08 16:41:55 +01:00
b4fcd59a5a add sentinels in tokenizer 2019-11-08 14:38:53 +01:00
15e53c4e87 maybe fix tests 2019-11-08 12:43:21 +01:00
f03c0c1423 adding models in readme and auto classes 2019-11-08 11:49:46 +01:00
4321c54125 fix tests 2019-11-08 11:49:32 +01:00
727a79b305 added TF2 model and tests - updated templates 2019-11-08 11:35:03 +01:00
cd286c2145 add condition around mask transformation 2019-11-08 11:31:16 +01:00
28d0ba35d7 only init encoder_attention_mask if stack is decoder
We currently initialize `encoder_attention_mask` when it is `None`,
whether the stack is that of an encoder or a decoder. Since this
may lead to bugs that are difficult to tracks down, I added a condition
that assesses whether the current stack is a decoder.
2019-11-08 11:22:19 +01:00
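A minimal sketch of the guard described above (names are illustrative):

```python
import torch

def init_encoder_attention_mask(encoder_hidden_states, encoder_attention_mask, is_decoder):
    # Only a decoder stack cross-attends to the encoder, so only a decoder
    # should fabricate a default all-ones mask.
    if is_decoder and encoder_attention_mask is None:
        batch_size, seq_len, _ = encoder_hidden_states.shape
        encoder_attention_mask = torch.ones(batch_size, seq_len)
    return encoder_attention_mask
```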
8fda532c3c fix python 2 sentencepiece tokenization 2019-11-07 17:09:50 +01:00
ba10065c4b update model, conversion script, tests and template 2019-11-07 15:55:36 +01:00
070dcf1c02 Added Mish Activation Function
Mish is a new activation function proposed here - https://arxiv.org/abs/1908.08681
It has seen some recent success and has been adopted in spaCy, Thinc, TensorFlow Addons and FastAI-dev.
All benchmarks recorded so far (including against ReLU, Swish and GELU) are present in the repository - https://github.com/digantamisra98/Mish
Might be a good addition to experiment with, especially in the BERT model.
2019-11-07 03:45:43 +05:30
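The function itself is a one-liner; a quick sketch in PyTorch:

```python
import torch

def mish(x):
    # mish(x) = x * tanh(softplus(x)), per the paper linked above
    return x * torch.tanh(torch.nn.functional.softplus(x))

print(mish(torch.tensor([-1.0, 0.0, 1.0])))
```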
1c542df7e5 Add RoBERTa-based GPT-2 Output Detector from OpenAI
converted from https://github.com/openai/gpt-2-output-dataset/tree/master/detector

Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-Authored-By: Jong Wook Kim <jongwook@nyu.edu>
Co-Authored-By: Jeff Wu <wuthefwasthat@gmail.com>
2019-11-06 16:26:31 -05:00
2f3a421018 Fix other PyTorch models 2019-11-06 14:03:47 -05:00
d5319793c4 Fix BERT 2019-11-06 14:03:47 -05:00
27e015bd54 [tests] Flag to test on cuda 2019-11-06 14:03:47 -05:00
13d9135fa5 [tests] get rid of warning
cf. https://docs.pytest.org/en/latest/example/simple.html
2019-11-06 14:03:47 -05:00
076a207935 adding tests and updating model 2019-11-06 11:52:50 +01:00
73f2c342f5 fixing template 2019-11-06 11:52:39 +01:00
3835e1e651 adding tokenizer 2019-11-06 11:52:29 +01:00
f88c104d8f [run_tf_glue] Add comment for context 2019-11-05 19:56:43 -05:00
30968d70af misc doc 2019-11-05 19:06:12 -05:00
de890ae67d Updating docblocks in optimizers.py 2019-11-05 17:31:29 -05:00
d7d36181fd GPT-2 XL 2019-11-05 13:31:58 -05:00
151e4ab4e7 Fix CTRL past 2019-11-05 16:26:51 +00:00
88e5bef58f share position biases 2019-11-05 17:02:52 +01:00
568c0ffb7e adding T5 model 2019-11-05 16:40:29 +01:00
7daacf00df Merge pull request #1695 from huggingface/models_inputs_embeds
model forwards can take an inputs_embeds param
2019-11-05 09:55:28 -05:00
a44f112fb9 add authors for models 2019-11-05 08:48:26 -05:00
60a5babd57 adding files 2019-11-05 12:01:23 +01:00
124409d075 Make dummy inputs a property of TFPreTrainedModel. 2019-11-05 11:48:45 +01:00
e99071f105 Merge pull request #1734 from orena1/patch-1
add progress bar to convert_examples_to_features
2019-11-05 11:34:20 +01:00
dfb61caf77 fix #1692 2019-11-05 11:25:13 +01:00
ba973342e3 Merge pull request #1553 from WilliamTambellini/timeSquadInference
Add speed log to examples/run_squad.py
2019-11-05 11:13:12 +01:00
8df7dfd2a7 Make dummy inputs a local variable in TFPreTrainedModel. 2019-11-05 11:09:16 +01:00
237fad339c Merge pull request #1709 from oneraghavan/master
Fixing mode in evaluate during training
2019-11-05 10:55:33 +01:00
f1e4db2aa8 Fix #1686 2019-11-05 09:38:00 +01:00
d7906165a3 add progress bar for convert_examples_to_features
It takes a considerable amount of time (~10 min) to parse the examples into features, so it is good to have a progress bar to track this
2019-11-05 10:34:27 +02:00
d2e2577dd3 Merge pull request #1723 from huggingface/fix-1623
Fix #1623
2019-11-05 08:36:30 +01:00
00337e9687 [inputs_embeds] All PyTorch models 2019-11-05 00:39:18 +00:00
9eddf44b7a docstring + check 2019-11-04 17:19:15 +00:00
8e11de0e86 model forwards can take an inputs_embeds param 2019-11-04 16:56:26 +00:00
68f7064a3e Add model.train() line to ReadMe training example
Co-Authored-By: Santosh-Gupta <San.Gupta.ML@gmail.com>
2019-11-04 11:52:35 -05:00
8d6b9d717c fix #1532 and encode_plus 2019-11-04 17:07:51 +01:00
c8f2712199 Merge pull request #1721 from huggingface/common_attributes
Add common getter and setter for input_embeddings & output_embeddings
2019-11-04 16:21:52 +01:00
89d6272898 Fix #1623 2019-11-04 16:21:12 +01:00
b340a910ed fix tests - flagged as slow all the tests downloading from AWS 2019-11-04 16:03:36 +01:00
f02805da6f fix tests 2019-11-04 15:42:23 +01:00
1d4d070256 Merge pull request #1549 from hlums/master
Fix token order in xlnet preprocessing for SQuAD
2019-11-04 15:37:15 +01:00
1724cee8c4 switch from properties to methods 2019-11-04 15:34:10 +01:00
9b45d0f878 Add common properties input_embeddings and output_embeddings 2019-11-04 12:28:56 +01:00
9a3b173cd3 Merge branch 'master' into master 2019-11-04 11:41:26 +01:00
ad90868627 Update example readme 2019-11-04 11:27:22 +01:00
e5b1048bae Fixing mode in evaluate during training 2019-11-03 16:14:46 +05:30
8a62835577 Merge pull request #1679 from cregouby/master
Fix https://github.com/huggingface/transformers/issues/1673
2019-11-01 22:02:24 +01:00
93d2fff071 Close #1654 2019-11-01 09:47:38 -04:00
1a2b40cb53 run_tf_glue MRPC evaluation only for MRPC 2019-10-31 18:00:51 -04:00
be36cf92fb Added mixed precision support to benchmarks.py 2019-10-31 17:24:37 -04:00
2a5663c280 Merge branch 'mataney-fix_top_k_top_p_filtering' 2019-10-31 18:28:34 +00:00
f96ce1c241 [run_generation] Fix generation with batch_size>1 2019-10-31 18:27:11 +00:00
3c1b6f594e Merge branch 'master' into fix_top_k_top_p_filtering 2019-10-31 13:53:51 -04:00
0e4cc050d6 Add support for resumable downloads for HTTP protocol. 2019-10-31 18:25:34 +03:00
ac29353abe Fix https://github.com/huggingface/transformers/issues/1673 2019-10-31 10:04:40 +01:00
fa735208c9 update readme - fix example command distil* 2019-10-30 14:27:28 -04:00
c7058d8224 Merge pull request #1608 from focox/master
Error raised by "tmp_eval_loss += tmp_eval_loss.item()" when using multi-gpu
2019-10-30 17:14:07 +01:00
22838f19fd Merge pull request #1668 from tlkh/fix-tf-xlm
Fixed training for TF XLM
2019-10-30 17:08:00 +01:00
7f84fc571a Merge pull request #1670 from huggingface/templates
Templates and explanation for adding a new model and example script
2019-10-30 17:05:58 +01:00
04c69db399 Merge pull request #1628 from huggingface/tfglue
run_tf_glue works with all tasks
2019-10-30 17:04:03 +01:00
5c6a19a94a Merge pull request #1604 from huggingface/deploy_doc
Versioning in documentation
2019-10-30 17:03:14 +01:00
3df4367244 Merge pull request #1601 from huggingface/clean-roberta
Clean roberta model & all tokenizers now add special tokens by default (breaking change)
2019-10-30 17:00:40 +01:00
6d73c92cae Merge pull request #1455 from huggingface/conditional-generation
[WIP] Sequence generation using pretrained BERT
2019-10-30 16:54:18 +01:00
36174696cc Merge branch 'master' into clean-roberta 2019-10-30 16:51:06 +01:00
228cdd6a6e Merge branch 'master' into conditional-generation 2019-10-30 16:40:35 +01:00
3cf2020c6b change kwargs processing 2019-10-30 16:27:51 +01:00
a88a0e4413 add tests to encoder-decoder model 2019-10-30 16:06:29 +01:00
3f07cd419c update test on Bert to include decoder mode 2019-10-30 15:09:53 +01:00
55fbfea369 Update CONTRIBUTING.md
Co-Authored-By: Stefan Schweter <stefan.schweter@bsb-muenchen.de>
2019-10-30 12:25:40 +01:00
cef2a8f900 Update CONTRIBUTING.md
Co-Authored-By: Stefan Schweter <stefan.schweter@bsb-muenchen.de>
2019-10-30 12:25:31 +01:00
328a86d2af adding links to the templates in readme and contributing 2019-10-30 11:37:55 +01:00
7f4226f9e6 adding templates 2019-10-30 11:31:56 +01:00
070507df1f format utils for summarization 2019-10-30 11:24:12 +01:00
da10de8466 fix bug with padding mask + add corresponding test 2019-10-30 11:19:58 +01:00
3b0d2fa30e rename seq2seq to encoder_decoder 2019-10-30 10:54:46 +01:00
9c1bdb5b61 revert renaming of lm_labels to ltr_lm_labels 2019-10-30 10:43:13 +01:00
842f3bf049 Fixed training for TF XLM 2019-10-30 01:32:15 +00:00
098a89f312 update docstrings; rename lm_labels to more explicit ltr_lm_labels 2019-10-29 20:08:03 +01:00
dfce409691 resolve PR comments 2019-10-29 17:10:20 +01:00
079bfb32fb Evaluation fixed. 2019-10-28 10:18:58 -04:00
438f2730a0 Evaluation code fixed. 2019-10-28 10:18:58 -04:00
4c3ac4a7d8 here's one big commit 2019-10-28 10:49:50 +01:00
932543f77e fix test of truncation function 2019-10-28 10:49:49 +01:00
a67413ccc8 extend works in-place 2019-10-28 10:49:49 +01:00
cb26b035c6 remove potential UndefinedError 2019-10-28 10:49:49 +01:00
b915ba9dfe pad sequence with 0, mask with -1 2019-10-28 10:49:49 +01:00
dc580dd4c7 add lm_labels for the LM cross-entropy 2019-10-28 10:49:49 +01:00
f873a3edb2 the decoder attends to the output of the encoder stack (last layer) 2019-10-28 10:49:00 +01:00
d36680df54 Revert changes to TF distilbert due to failed test: TFDistilBertModelTest.test_pt_tf_model_equivalence 2019-10-27 14:51:36 +08:00
ec276d6aba Add special tokens to documentation for the tensorflow model examples #1561 2019-10-27 14:00:40 +08:00
6e011690a9 Add special tokens to documentation for the rest of pytorch model examples #1561 2019-10-27 13:59:14 +08:00
beaf66b1f3 Remove break 2019-10-24 21:43:28 +00:00
bab6ad01aa run_tf_glue works with all tasks 2019-10-24 21:41:45 +00:00
ae1d03fc51 Add roberta to doc 2019-10-24 14:32:48 -04:00
4e5f88b74f Add Roberta to run_ner.py 2019-10-24 14:32:48 -04:00
b92d68421d Use roberta model and update doc strings 2019-10-24 14:32:48 -04:00
66085a1321 RoBERTa token classification
[WIP] copy paste bert token classification for roberta
2019-10-24 14:32:48 -04:00
b82bfbd0c3 Updated README to show all available documentation 2019-10-24 15:55:31 +00:00
5b6cafb11b [release] fix table weirdness 2019-10-23 10:35:16 -04:00
8ad5c591cd [RELEASE] DistilRoBERTa 2019-10-23 10:29:47 -04:00
bd847ce7d7 fixed the bug raised by "tmp_eval_loss += tmp_eval_loss.item()" when using multiple GPUs in parallel. 2019-10-23 20:27:13 +08:00
6e85bccafc Fixed typo 2019-10-22 18:07:01 -04:00
fbcc5ff9fb Change branch to master 2019-10-22 18:01:10 -04:00
69eba0ab19 Edit script path 2019-10-22 17:53:52 -04:00
bc3e57d551 Multi version doc deployment 2019-10-22 17:51:30 -04:00
ef1b8b2ae5 [CTRL] warn if generation prompt does not start with a control code
see also https://github.com/salesforce/ctrl/pull/50
2019-10-22 21:30:32 +00:00
e16d46843a Fix architectures count 2019-10-22 15:13:47 -04:00
7d709e55ed Remove 2019-10-22 14:12:33 -04:00
44286b94d3 RoBERTa doesn't print a warning when no special tokens are passed. 2019-10-22 13:46:48 -04:00
1cfd974868 Option to benchmark only one of the two libraries 2019-10-22 13:32:23 -04:00
777faa8ae7 Fix #1597 2019-10-22 11:26:42 -04:00
b8c9ea0010 Merge pull request #1580 from pminervini/master
Gradient norm clipping should be done right before calling the optimiser
2019-10-22 13:59:20 +02:00
abd7110e21 gradient norm clipping should be done right before calling the optimiser - fixing run_glue and run_ner as well 2019-10-21 19:56:52 +01:00
4d456542e9 Fix citation 2019-10-21 16:34:14 +02:00
0e64fec1ab Merge pull request #1568 from daemon/patch-1
Fix hanging when loading pretrained models
2019-10-21 14:31:57 +02:00
3a52b65795 Add special tokens to documentation for bert examples to resolve issue: #1561 2019-10-21 12:55:51 +08:00
86a630702d Merge branch 'huggingface/master' 2019-10-21 12:06:09 +08:00
3775550c4b gradient norm clipping should be done right before calling the optimiser 2019-10-20 22:33:56 +01:00
bf2c36a920 Merge pull request #1 from huggingface/master
update
2019-10-20 23:30:45 +02:00
a2c8c8ef00 Fix hanging when loading pretrained models
- Fix hanging when loading pretrained models from the cache without having internet access. This is a widespread issue on supercomputers whose internal compute nodes are firewalled.
2019-10-19 16:19:20 -04:00
82f6abd98a Benchmark section added to the documentation 2019-10-18 17:27:10 -04:00
7dd29ed2f1 Benchmarks example script 2019-10-18 10:53:04 -04:00
8efc0ec91a Add Benchmarks to issue templates 2019-10-18 10:45:44 -04:00
0919389d9a Add speed log to examples/run_squad.py
Add a speed estimate log (time per example)
for evaluation to examples/run_squad.py
2019-10-17 14:41:04 -07:00
fd97761c5a soft launch distilroberta 2019-10-17 15:28:58 -04:00
ecd15667f3 fix repetition penalty 2019-10-17 14:47:14 -04:00
56e2ee4ead fix model2model 2019-10-17 16:33:31 +02:00
8cd56e3036 fix data processing in script 2019-10-17 16:33:26 +02:00
578d23e061 add training pipeline (formatting temporary) 2019-10-17 14:02:27 +02:00
47a06d88a0 use two different tokenizers for story and summary 2019-10-17 13:04:26 +02:00
bfb9b540d4 add Model2Model to __init__ 2019-10-17 12:59:51 +02:00
c1bc709c35 correct the truncation and padding of dataset 2019-10-17 10:41:53 +02:00
87d60b6e19 reword explanation of encoder_attention_mask 2019-10-17 10:18:19 +02:00
638fe7f5a4 correct composition of padding and causal masks 2019-10-17 10:13:07 +02:00
4e0f24348f document the MLM modification + raise exception on MLM training with encoder-decoder 2019-10-17 09:41:53 +02:00
624a5644cc revert black formatting to conform with lib style 2019-10-17 09:27:56 +02:00
9b71fc9a18 tying weights is going to be a clusterfuck 2019-10-16 21:31:38 +02:00
95ec1d08be separate inputs into encoder & decoder inputs 2019-10-16 20:55:42 +02:00
e4e0ee14bd add separator between data import and train 2019-10-16 20:05:32 +02:00
a424892fab correct syntax error: dim() and not dims() 2019-10-16 18:24:32 +02:00
33c01368b1 remove Bert2Rnd test 2019-10-16 18:13:05 +02:00
c544194611 Remove special_tokens_mask from inputs in README
Co-authored-by: Thomas Wolf @thomwolf
2019-10-16 11:05:13 -04:00
0752069617 adapt attention masks for the decoder case
The introduction of a decoder introduces 2 changes:
- We need to be able to specify a separate mask in the cross
attention to mask the positions corresponding to padding tokens in the
encoder state.
- The self-attention in the decoder needs to be causal on top of not
attending to padding tokens.
2019-10-16 16:12:22 +02:00
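A minimal sketch of the two masks, assuming boolean masks where True marks an attendable position; the real models also compose the causal mask with the decoder's own padding mask and broadcast over heads:

```python
import torch

def build_decoder_masks(tgt_len, src_padding_mask):
    # Self-attention: position i may only attend to positions <= i.
    causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=torch.bool))
    # Cross-attention: hide encoder positions that are padding, broadcast
    # over every decoder time step.
    cross = src_padding_mask[:, None, :].expand(-1, tgt_len, -1)
    return causal, cross

src_padding_mask = torch.tensor([[1, 1, 1, 0, 0]], dtype=torch.bool)  # 1 = real token
causal, cross = build_decoder_masks(4, src_padding_mask)
print(causal.shape, cross.shape)  # (4, 4) and (1, 4, 5)
```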
c5a94a6100 fix function that defines masks in XLM
the definition of `get_masks` would blow up with certain combinations of
arguments. It was just a matter of moving a definition outside of a
control structure.
2019-10-16 13:00:32 +02:00
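The bug pattern, in illustrative form rather than the actual XLM code:

```python
import torch

def get_masks_buggy(slen, lengths, causal):
    if causal:
        alen = torch.arange(slen)
        return alen[None, None, :] <= alen[None, :, None]
    # `alen` was only defined on the causal branch: UnboundLocalError here.
    return alen[None, :] < lengths[:, None]

def get_masks_fixed(slen, lengths, causal):
    alen = torch.arange(slen)  # hoisted out of the control structure
    if causal:
        return alen[None, None, :] <= alen[None, :, None]
    return alen[None, :] < lengths[:, None]
```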
488a664151 add is_decoder attribute to PretrainedConfig
We currently instantiate encoders and decoders for the seq2seq model by
passing the `is_decoder` keyword argument to the `from_pretrained`
classmethod. On the other hand, the model class looks for the value
of the `is_decoder` attribute in its config.

In order for the value to propagate from the kwarg to the configuration
we simply need to define `is_decoder` as an attribute to the base
`PretrainedConfig`, with a default at `False`.
2019-10-15 21:03:32 +02:00
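A minimal sketch of that propagation mechanism, with a toy class standing in for the real `PretrainedConfig`:

```python
class PretrainedConfigSketch:
    """Toy config: unknown keyword arguments land on the instance, so
    `is_decoder` only needs a default attribute to propagate."""

    def __init__(self, **kwargs):
        self.is_decoder = kwargs.pop("is_decoder", False)  # default: encoder
        for key, value in kwargs.items():
            setattr(self, key, value)

config = PretrainedConfigSketch(is_decoder=True)
assert config.is_decoder  # model code can now branch on config.is_decoder
```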
4c81960b9b comment the seq2seq functions 2019-10-15 20:52:28 +02:00
6d6c326737 take path to pretrained for encoder and decoder for init 2019-10-15 16:08:27 +02:00
0d81fc853e specify in readme that both datasets are required 2019-10-15 15:26:33 +02:00
19e9964780 remove Bert2Bert from module declaration 2019-10-15 15:20:28 +02:00
1aec940587 test the full story processing 2019-10-15 15:18:07 +02:00
22e1af6859 truncation function is fully tested 2019-10-15 14:43:50 +02:00
260ac7d9a8 wip commit, switching computers 2019-10-15 12:24:35 +02:00
be916cb3fb Merge branch 'master' of https://github.com/huggingface/transformers 2019-10-15 10:37:13 +02:00
5875aaf762 install tensorboard 2019-10-15 10:36:46 +02:00
40f14ff545 Merge pull request #1513 from slayton58/amp_fp16_einsum
Force einsum to run in fp16
2019-10-15 10:25:00 +02:00
e703e4dfe1 Merge pull request #1509 from julian-pani/patch-3
remove leftover usage of DUMMY_INPUTS
2019-10-15 10:24:13 +02:00
898ce064f8 add tests on TF2.0 & PT checkpoint => model conversion functions 2019-10-15 10:04:19 +02:00
d147671c6c Merge pull request #1508 from tlkh/master
Added performance enhancements (XLA, AMP) to examples
2019-10-15 09:57:18 +02:00
2c1d5564ad add readme information 2019-10-15 09:56:52 +02:00
08bd8f9f39 Merge pull request #1505 from e-budur/master
Fixed the sample code in the title 'Quick tour'.
2019-10-15 09:50:36 +02:00
8aa3b753bd Merge pull request #1434 from bryant1410/patch-1
Remove unnecessary use of FusedLayerNorm in XLNet
2019-10-15 09:44:19 +02:00
621e7a2529 Merge pull request #1275 from stecklin/ner-fine-tuning
Implement fine-tuning BERT on CoNLL-2003 named entity recognition task
2019-10-15 09:35:24 +02:00
c55badcee0 Add NER finetuning details by @stefan-it in example readme 2019-10-15 09:33:52 +02:00
788e632622 [ner] Honor args.overwrite_cache 2019-10-15 09:17:31 +02:00
0f9ebb0b43 add seqeval as requirement for examples 2019-10-15 09:17:31 +02:00
66adb71734 update to transformers 2019-10-15 09:17:31 +02:00
5ff9cd158a Add option to predict on test set 2019-10-15 09:17:31 +02:00
7f5367e0b1 Add cli argument for configuring labels 2019-10-15 09:17:31 +02:00
e1d4179b64 Make file reading more robust 2019-10-15 09:17:31 +02:00
383ef96747 Implement fine-tuning BERT on CoNLL-2003 named entity recognition task 2019-10-15 09:17:31 +02:00
5adb39e757 Add option to predict on test set 2019-10-15 09:14:53 +02:00
99b189df6d Add cli argument for configuring labels 2019-10-15 09:14:53 +02:00
3e9420add1 Make file reading more robust 2019-10-15 09:14:53 +02:00
cde42c4354 Implement fine-tuning BERT on CoNLL-2003 named entity recognition task 2019-10-15 09:14:53 +02:00
74c5035808 Fix token order in xlnet preprocessing. 2019-10-14 21:27:11 +00:00
fe25eefc15 add instructions to fetch the dataset 2019-10-14 20:45:39 +02:00
412793275d delegate the padding with special tokens to the tokenizer 2019-10-14 20:45:16 +02:00
447fffb21f process the raw CNN/Daily Mail dataset
the data provided by Li Dong et al. were already tokenized, which means
that they are not compatible with all the models in the library. We
thus process the raw data directly and tokenize them using the models'
tokenizers.
2019-10-14 18:12:20 +02:00
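A hedged sketch of that processing step; the file path is a placeholder and the checkpoint is just one example:

```python
from transformers import BertTokenizer  # any model's tokenizer works here

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

with open("cnn/stories/sample.story", encoding="utf-8") as f:  # placeholder path
    raw_text = f.read()

# Tokenizing the raw text ourselves keeps the data compatible with any
# model's vocabulary, unlike the pre-tokenized release.
token_ids = tokenizer.encode(raw_text, max_length=512)
```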
80889a0226 Merge pull request #1512 from louismartin/fix-roberta-convert
Fix import error in script to convert fairseq roberta checkpoints
2019-10-14 17:40:32 +02:00
4e6a55751a Force einsum to fp16 2019-10-14 11:12:41 -04:00
f62f992cf7 Merge pull request #1502 from jeffxtang/master
the working example code to use BertForQuestionAnswering
2019-10-14 16:14:52 +02:00
67d10960ae load and prepare CNN/Daily Mail data
We write a function to load and preprocess the CNN/Daily Mail dataset as
provided by Li Dong et al. The issue is that this dataset has already
been tokenized by the authors, so we actually need to find the original,
plain-text dataset if we want to apply it to all models.
2019-10-14 14:11:20 +02:00
d9d387afce clean up 2019-10-14 12:14:40 +02:00
b7141a1bc6 maxi simplification 2019-10-14 12:14:08 +02:00
bfbe68f035 update forward pass 2019-10-14 12:04:23 +02:00
0ef9bc923a Cleaning up seq2seq [WIP] 2019-10-14 11:58:13 +02:00
49cba6e543 Fix import error in script to convert fairseq roberta checkpoints 2019-10-14 01:38:57 -07:00
0993586758 remove usage of DUMMY_INPUTS
Hey @thomwolf  
This change da26bae61b (diff-8ddce309e88e8eb5b4d02228fd8881daL28-L29) removed the constant, but one usage of that constant remains in the code.
2019-10-14 02:09:53 +03:00
376e65a674 Added automatic mixed precision and XLA options to run_tf_glue.py 2019-10-13 13:19:06 +00:00
86f23a1944 Minor enhancements to run_tf_glue.py 2019-10-13 10:21:35 +00:00
5a8c6e771a Fixed the sample code in the title 'Quick tour'. 2019-10-12 14:17:17 +03:00
e76d71521c the working example code to use BertForQuestionAnswering and get an answer from a text and a question 2019-10-11 17:04:02 -07:00
d844db4005 Add citation bibtex 2019-10-11 16:55:42 -04:00
a701c9b321 CTRL to tf automodels 2019-10-11 16:05:30 -04:00
b3261e7ace read parameters from CLI, load model & tokenizer 2019-10-11 18:40:38 +02:00
d889e0b71b add base for seq2seq finetuning 2019-10-11 17:36:12 +02:00
f8e98d6779 load pretrained embeddings in Bert decoder
In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence
Generation Tasks", Bert2Bert is initialized with pre-trained weights for
the encoder, and only pre-trained embeddings for the decoder. The
current version of the code completely randomizes the weights of the
decoder.

We write a custom function to initialize the weights of the decoder; we
first initialize the decoder with the weights and then randomize
everything but the embeddings.
2019-10-11 16:48:11 +02:00
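A minimal sketch of that initialization scheme, assuming a PyTorch decoder whose non-embedding projections are `nn.Linear` modules:

```python
import torch

def init_decoder_embeddings_only(decoder, pretrained_state_dict):
    # Step 1: start from the full pretrained weights.
    decoder.load_state_dict(pretrained_state_dict, strict=False)
    # Step 2: re-randomize everything except the embeddings.
    for name, module in decoder.named_modules():
        if isinstance(module, torch.nn.Linear) and "embeddings" not in name:
            torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)
```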
3ddce1d74c Release: 2.1.1 2019-10-11 06:37:49 -04:00
4428aefc63 Merge pull request #1488 from huggingface/pytorch-tpu
GLUE on TPU
2019-10-11 16:33:00 +02:00
3b43b01872 Merge pull request #1482 from huggingface/tf2_integration_tests
Integration of TF 2.0 models with other Keras modules
2019-10-11 16:25:43 +02:00
4b8f3e8f32 adding citation 2019-10-11 16:18:16 +02:00
18a3cef7d5 no nans 2019-10-11 16:09:42 +02:00
1f5d9513d8 fix test 2019-10-11 15:55:01 +02:00
0f9fc4fbde adding option to deactivate past/memory outputs 2019-10-11 15:47:08 +02:00
700331b5ec Merge pull request #1492 from stefan-it/bert-german-dbmdz-models
Add new BERT models for German (cased and uncased)
2019-10-11 13:01:52 +02:00
573dde9b44 Merge pull request #1405 from slayton58/xlnet_layer_reorder
Re-order XLNet attention head outputs for better perf
2019-10-11 12:10:58 +02:00
5f25a5f367 model: add support for new German BERT models (cased and uncased) from @dbmdz 2019-10-11 10:20:33 +02:00
f382a8decd convert int to str before adding to a str 2019-10-10 19:20:39 -04:00
639f4b7190 Don't save/load when on TPU 2019-10-10 19:17:25 +00:00
d4e7934ac3 GLUE on TPU 2019-10-10 19:03:06 +00:00
1e68c28670 add test for initialization of Bert2Rnd 2019-10-10 18:07:11 +02:00
2a4fef837a move Circle-CI from TF2-rc0 to official TF2 2019-10-10 15:57:35 +02:00
751e246087 using tf.print in roberta 2019-10-10 15:47:20 +02:00
fa218e648a fix syntax errors 2019-10-10 15:16:07 +02:00
c9e8c51946 fixing SequenceSummary head in TF 2.0 2019-10-10 15:16:05 +02:00
da26bae61b adding more tests on TF and pytorch serialization - updating configuration for better serialization 2019-10-10 14:30:48 +02:00
3e1cd8241e fix stupid (re)naming issue 2019-10-10 14:18:20 +02:00
81ee29ee8d remove the staticmethod used to load the config 2019-10-10 14:13:37 +02:00
bb04edb45b Add tests that TF 2.0 model can be integrated with other Keras modules 2019-10-10 13:08:24 +02:00
d7092d592c rename the attributes in the Bert Layer
Since the preloading of weights relies on the names of the class's
attributes, changing the namespace breaks loading pretrained weights on
Bert and all related models. I reverted `self_attention` to `attention`
and use `crossattention` for the decoder instead.
2019-10-10 12:51:14 +02:00
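A small demonstration of why the names matter: PyTorch derives state_dict keys from attribute names, so a rename silently orphans checkpoint weights:

```python
import torch

class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.attention = torch.nn.Linear(4, 4)       # key: "attention.weight"

class RenamedBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.self_attention = torch.nn.Linear(4, 4)  # key no longer matches

state_dict = Block().state_dict()
missing, unexpected = RenamedBlock().load_state_dict(state_dict, strict=False)
print(missing)     # ['self_attention.weight', 'self_attention.bias']
print(unexpected)  # ['attention.weight', 'attention.bias']
```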
51261167b4 prune both attention and self-attention heads 2019-10-10 12:17:22 +02:00
17177e7379 add is_decoder as an attribute to Config class 2019-10-10 12:03:58 +02:00
6596e3d566 Merge pull request #1454 from bkkaggle/pytorch-built-in-tensorboard
Change tensorboard imports to use built-in tensorboard if available
2019-10-10 11:56:55 +02:00
4bc4601192 Merge pull request #1480 from huggingface/fix_ctrl_tokenizer
Fixing CTRL tokenizer - Update error messages - XLM-MLM in run_generation
2019-10-10 11:56:20 +02:00
177a721205 move back to simple space splitting 2019-10-10 11:45:47 +02:00
df85a0ff0b replace double quotes with single quotes 2019-10-10 11:38:26 +02:00
9ca788b2e8 merge the two Bert layers classes 2019-10-10 11:33:28 +02:00
a5997dd81a better error messages 2019-10-10 11:31:01 +02:00
edfc8f8225 Remove and do the branching in 2019-10-10 10:17:27 +02:00
09cfd12235 remove and do the branching in 2019-10-10 10:15:27 +02:00
43a237f15e switching to moses tokenizer 2019-10-10 10:11:16 +02:00
877ef2c6ca override from_pretrained in Bert2Rnd
In the seq2seq model we need to both load pretrained weights in the
encoder and initialize the decoder randomly. Because the
`from_pretrained` method defined in the base class relies on module
names to assign weights, it would also initialize the decoder with
pretrained weights. To avoid this we override the method to only
initialize the encoder with pretrained weights.
2019-10-10 10:02:18 +02:00
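A hedged sketch of the idea; `TinyDecoder` is a placeholder, and the real commit overrides the library's loading logic rather than composing models like this:

```python
import torch
from transformers import BertModel

class TinyDecoder(torch.nn.Module):
    """Placeholder for a Bert-style decoder stack, randomly initialized."""
    def __init__(self, config):
        super().__init__()
        self.layer = torch.nn.Linear(config.hidden_size, config.hidden_size)

class Bert2RndSketch(torch.nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder, self.decoder = encoder, decoder

    @classmethod
    def from_pretrained(cls, checkpoint):
        # The base-class loader assigns weights by module name, so it would
        # also fill a decoder whose names match; loading only the encoder
        # keeps the decoder random.
        encoder = BertModel.from_pretrained(checkpoint)
        return cls(encoder, TinyDecoder(encoder.config))
```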
851ef592c5 add comment on recursive weights loading 2019-10-10 10:02:03 +02:00
036483fae5 Temporary CTRL tokenizer fix 2019-10-09 16:33:15 -04:00
9c2e0a4acf Release: 2.1.0 2019-10-09 12:14:03 -04:00
7fe98d8c18 Update CTRL documentation 2019-10-09 12:12:36 -04:00
89f86f9661 CTRL added to the documentation 2019-10-09 12:04:06 -04:00
e17ea08e24 Pycharm folder added to gitignore 2019-10-09 11:32:21 -04:00
2431fea98a Merge pull request #1383 from keskarnitish/master
Adding CTRL
2019-10-09 11:31:05 -04:00
d9e60f4f0d Merge branch 'master' into pr/1383 2019-10-09 17:25:08 +02:00
e84470ef81 Merge pull request #1384 from huggingface/encoding-qol
Quality of life enhancements in encoding + patch MLM masking
2019-10-09 11:18:24 -04:00
07d055f849 higher tolerance 2019-10-09 17:10:04 +02:00
48b438ff2a doc and conversion 2019-10-09 17:06:30 +02:00
69629c4f0f Improve naming and only do regex when necessary 2019-10-09 08:48:40 -04:00
bf34a252b8 Golden path 2019-10-09 08:48:40 -04:00
528d3f327b Improve readability and improve make less assumptions about checkpoint format 2019-10-09 08:48:40 -04:00
56301bd9e8 Extract method 2019-10-09 08:48:40 -04:00
d6c5469712 Delete older checkpoint after saving new checkpoint 2019-10-09 08:48:40 -04:00
54a31f50fb Add save_total_limit 2019-10-09 08:48:40 -04:00
c19b8e4ae0 fixing CTRL tests and OpenAI GPT tests 2019-10-09 13:51:05 +02:00
6dce6dda1b fixing TF 2.0 model - adding more severe test on pt/tf equivalence 2019-10-09 11:57:55 +02:00
c56d921dda adding TF 2.0 model 2019-10-09 11:07:43 +02:00
1c5079952f simpler distilbert mask - fix tf tests 2019-10-09 04:26:20 +02:00
58b302caf3 Merge pull request #1398 from dveselov/patch-1
Fixed typo in docs README
2019-10-09 03:52:42 +02:00
439fac723a Merge pull request #1409 from brian41005/master
Evaluation result.txt path changing #1286
2019-10-09 03:14:34 +02:00
23b7138ab4 fix #1378 and #1453 2019-10-09 01:54:44 +02:00
5ce8d29abe Change tensorboard imports to use built-in tensorboard if available 2019-10-08 16:29:43 -05:00
d688af19e5 Update link to swift-coreml-transformers
cc @lysandrejik
2019-10-08 16:37:52 -04:00
45dc04f33d tf model [WIP] 2019-10-08 17:37:17 +02:00
770b15b58c rename class in __init__ 2019-10-08 17:32:28 +02:00
248314772f fix tokenization 2019-10-08 17:19:28 +02:00
03c2c762a6 update tokenizer 2019-10-08 17:12:03 +02:00
3edfa1d6aa update model to use past 2019-10-08 17:11:58 +02:00
f4d41fe33e Merge pull request #1448 from huggingface/contributing
add contribution guidelines
2019-10-08 16:55:34 +02:00
61ed889005 remove old seq2seq file 2019-10-08 16:30:58 +02:00
8abfee9ec3 rename Bert2Bert -> Bert2Rnd 2019-10-08 16:30:58 +02:00
82628b0fc9 add a placeholder test 2019-10-08 16:30:58 +02:00
0700983090 Add BertDecoderModel and Bert2Bert classes
I am not sure what happens when the class is initialized with the
pretrained weights.
2019-10-08 16:30:58 +02:00
75feacf172 add general structure for Bert2Bert class 2019-10-08 16:30:58 +02:00
15a2fc88a6 add General attention classes
The modifications that I introduced in a previous commit did break
Bert's internal API. I reverted these changes and added more general
classes to handle the encoder-decoder attention case.

There may be a more elegant way to deal with backward compatibility (I am
not comfortable with the current state of the code), but I cannot see it
right now.
2019-10-08 16:30:58 +02:00
cd6a59d5c1 add a decoder layer for Bert 2019-10-08 16:30:58 +02:00
45de313a9e add bullet point on modifying an existing PR 2019-10-08 11:54:10 +02:00
ade05b6cef add code contribution 2019-10-07 23:20:25 +02:00
e9c09052a4 add issues and requests guidelines 2019-10-07 22:30:55 +02:00
8fcc6507ce Multilingual 2019-10-07 15:02:42 -04:00
6e3e1c959e Merge pull request #1447 from huggingface/dev-requirements
Provide requirements.txt for development dependencies
2019-10-07 18:49:26 +02:00
7ce83b4931 update weights for distilgpt2 2019-10-07 12:30:27 -04:00
9f81f1cba8 fix convert pt_to_tf2 for custom weights 2019-10-07 12:30:19 -04:00
7afd00a661 freeze dev requirements 2019-10-07 17:58:13 +02:00
a0dcefa382 generalize BertSelfAttention to take separate query, key, value
There is currently no way to specify the query, key and value separately
in the Attention module. However, the decoder's "encoder-decoder
attention" layers take the decoder's last output as the query and the
encoder's states as key and value. We thus modify the existing code so
query, key and value can be specified separately.

This obviously strains the naming conventions; `BertSelfAttention` is not
a self-attention module anymore. The way the residual is forwarded is
now awkward, etc. We will need to do some refactoring once the decoder is
fully implemented.
2019-10-07 17:53:58 +02:00
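An illustrative single-head version of that generalization (not the library's multi-head implementation):

```python
import torch

class GeneralAttentionSketch(torch.nn.Module):
    """Covers self-attention (keys/values from the same states as queries)
    and encoder-decoder attention (keys/values from the encoder)."""

    def __init__(self, hidden_size):
        super().__init__()
        self.q_proj = torch.nn.Linear(hidden_size, hidden_size)
        self.k_proj = torch.nn.Linear(hidden_size, hidden_size)
        self.v_proj = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, query_states, key_value_states=None):
        if key_value_states is None:          # self-attention case
            key_value_states = query_states
        q = self.q_proj(query_states)
        k = self.k_proj(key_value_states)
        v = self.v_proj(key_value_states)
        scores = q @ k.transpose(-1, -2) / (q.size(-1) ** 0.5)
        return torch.softmax(scores, dim=-1) @ v
```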
31adbb247c add class wireframes for Bert decoder 2019-10-07 16:43:21 +02:00
dda1adad6d rename BertLayer to BertEncoderLayer 2019-10-07 16:31:46 +02:00
0053c0e052 do some (light) housekeeping
Several packages were imported but never used, and indentation and line
spacing did not follow PEP 8.
2019-10-07 16:29:15 +02:00
bd5363cc83 update CTRL configuration 2019-10-07 15:37:30 +02:00
dc89441167 update CTRL pytorch model 2019-10-07 15:37:25 +02:00
320b7a7e01 fix #1416 2019-10-07 14:26:59 +02:00
386e86e222 raise exception when class initialized with __init__ 2019-10-07 13:00:06 +02:00
4446c02b8a add wireframe for seq2seq model 2019-10-07 12:04:05 +02:00
1615360c71 Merge pull request #1438 from SeanBE/master
fix pytorch-transformers migration description in README
2019-10-07 05:02:23 -04:00
6dc6c716c5 fix pytorch-transformers migration description in README 2019-10-07 09:59:54 +01:00
904158ac4d Rephrase forward method to reduce ambiguity 2019-10-06 23:40:52 -04:00
0f65d8cbbe Fix some typos in README 2019-10-06 23:40:52 -04:00
1dea291a02 Remove unnecessary use of FusedLayerNorm in XLNet 2019-10-06 13:35:01 -04:00
f3e0218fbb Correct device assignment in run_generation 2019-10-05 21:05:16 -04:00
78ef1a9930 fixes 2019-10-04 17:59:44 -04:00
6c1d0bc066 update encode_plus - add truncation strategies 2019-10-04 17:38:38 -04:00
0820bb0555 unnecessary carriage return 2019-10-04 17:23:15 -04:00
f5891c3821 run_squad --> run_squad_w_distillation 2019-10-04 17:23:15 -04:00
764a7923ec add distillation+finetuning option in run_squad 2019-10-04 17:23:15 -04:00
bb464289ce New model addition issue template 2019-10-04 16:41:26 -04:00
92c0f2fb90 Merge remote-tracking branch 'origin/julien_multiple-choice' into encoding-qol 2019-10-04 15:48:06 -04:00
9e136ff57c Honor args.overwrite_cache (h/t @erenup) 2019-10-04 15:00:56 -04:00
7bddb45a6f Decode documentation 2019-10-04 14:27:38 -04:00
dbed1c5d94 Adding CTRL (squashed commit)
adding conversion script

adding first draft of modeling & tokenization

adding placeholder for test files

bunch of changes

registering the tokenizer/model/etc

tests

change link; something is very VERY wrong here

weird end-of-word thingy going on

i think the tokenization works now ; wrote the unit tests

overall structure works; load w next

the monster is alive!

works after some cleanup as well

adding emacs autosave to gitignore

currently only supporting the 48 layer one; seems to infer fine on my macbook

cleanup

fixing some documentation

fixing some documentation

tests passing?

now works on CUDA also

adding greedy?

adding greedy sampling

works well
2019-10-03 22:29:03 -07:00
b3cfd97946 Merge pull request #1373 from TimYagan/fix-css
Fixed critical css font-family issues
2019-10-03 19:04:02 -04:00
81a1e12469 Merge pull request #1313 from enzoampil/master
Add option to use a 'stop token'
2019-10-03 22:43:57 +00:00
d3f24dfad7 Merge branch 'master' into master 2019-10-03 22:43:09 +00:00
ecc4f1bdfa XLM use_lang_embedding flag in run_generation 2019-10-03 17:42:16 -04:00
c2c2ca0fdb Added XLM to run_generation, with prompt language selection. 2019-10-03 17:18:48 -04:00
1569610f2d Merge pull request #1296 from danai-antoniou/add-duplicate-tokens-error
Added ValueError for duplicates in list of added tokens
2019-10-03 17:06:17 -04:00
e1b2949ae6 DistillBert Documentation Code Example fixes 2019-10-03 15:51:33 -04:00
899883644f Fix test fails and warnings
Attention output was in bnij ordering instead of the ijbn ordering that
everything else expects. This was an oversight on my part; the fix keeps
the attention inputs/outputs identical to the original code.

Also moved back from tensor slicing to index_select in rel_shift_bnij to
make the tracer happy.
2019-10-03 12:05:15 -04:00
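Illustrative tensor manipulations for the two points above; shapes and names are made up:

```python
import torch

attn_bnij = torch.randn(2, 4, 8, 8)        # (batch, heads, query, key)
attn_ijbn = attn_bnij.permute(2, 3, 0, 1)  # back to the expected layout

# index_select keeps the op traceable where fancy slicing confused the tracer.
idx = torch.arange(1, attn_bnij.size(3))
shifted = torch.index_select(attn_bnij, 3, idx)
```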
e2ae9c0b73 fix links in doc index 2019-10-03 11:42:21 -04:00
aebd83230f Update naming + remove f string in run_lm_finetuning example 2019-10-03 11:31:36 -04:00
651bfb7ad5 always_truncate by default 2019-10-03 11:31:36 -04:00
5ed50a93fb LM finetuning won't mask special tokens anymore 2019-10-03 11:31:36 -04:00
cc412edd42 Supports already existing special tokens 2019-10-03 11:31:36 -04:00
2f259b228e Sequence IDS 2019-10-03 11:31:36 -04:00
7c789c337d Always truncate argument in the encode method 2019-10-03 11:31:36 -04:00
7af0777910 Update run_glue.py
add DistilBert model shortcut into ALL_MODELS
2019-10-03 15:31:11 +00:00
c1689ac301 fix name 2019-10-03 10:56:39 -04:00
4a790c40b1 update doc for distil* 2019-10-03 10:54:02 -04:00
6be46a6e64 update links to new weights 2019-10-03 10:27:11 -04:00
5f07d8f11a prepare release 2019-10-03 10:27:11 -04:00
35071007cb incoming release 🔥 update links to arxiv preprint 2019-10-03 10:27:11 -04:00
f1f23ad171 fix bug in convert_pt_chkpt_to_tf2 2019-10-03 10:27:11 -04:00
2a91f6071f update README - TODO update link to paper 2019-10-03 10:27:11 -04:00
c51e533a5f update train.py 2019-10-03 10:27:11 -04:00
a76c3f9cb0 update requirements 2019-10-03 10:27:11 -04:00
bb9c5ead54 update distiller 2019-10-03 10:27:11 -04:00
a12ab0a8db update binarized_data 2019-10-03 10:27:11 -04:00
4d6dfbd376 update extract 2019-10-03 10:27:11 -04:00
23edebc079 update extract_distilbert 2019-10-03 10:27:11 -04:00
cbfcfce205 update token_counts 2019-10-03 10:27:11 -04:00
19e4ebbe3f grouped_batch_sampler 2019-10-03 10:27:11 -04:00
594202a934 lm_seqs_dataset 2019-10-03 10:27:11 -04:00
38084507c4 add distillation_configs 2019-10-03 10:27:11 -04:00
9ffda216ec Fix missed head transpose 2019-10-03 09:23:16 -04:00
b5d73976ad Revert "fixing for roberta tokenizer decoding"
This reverts commit 22e7c4edaf007d92912df54c336d10078ac7d565.
2019-10-03 20:48:17 +08:00
22e7c4edaf fixing for roberta tokenizer decoding 2019-10-03 18:33:53 +08:00
2195c0d5f9 Evaluation result.txt path changing #1286 2019-10-03 12:49:12 +08:00
ebb32261b1 fix #1401 2019-10-02 17:52:56 -04:00
d51b589404 Re-order attention head outputs for better perf
Significant performance boost over the original ordering; on an already
somewhat optimised branch this gave me > 2x end-to-end
throughput on a squad xlnet fine-tuning task (batch 8, seq-length 612,
fp16)
2019-10-02 12:18:21 -04:00
63ed224b7c initialy -> initially 2019-10-02 15:04:18 +00:00
a95158518d Moved duplicate token check 2019-10-02 07:44:15 +01:00
d73957899a Merge branch 'master' of https://github.com/danai-antoniou/pytorch-transformers into add-duplicate-tokens-error 2019-10-02 07:38:50 +01:00
cd69bc9c87 Fixed typo in docs README 2019-10-02 03:21:55 +03:00
391db836ab fix #1260 - remove special logic for decoding pairs of sequence 2019-10-01 19:09:13 -04:00
963529e29b Merge pull request #1288 from echan00/master
Typo with LM Fine tuning script
2019-10-01 18:46:07 -04:00
f7978f70ec use format instead of f-strings 2019-10-01 18:45:38 -04:00
1e4a191366 Merge pull request #1284 from slayton58/pooler_end_logits_fp16_fix
Fix fp16 masking in PoolerEndLogits
2019-10-01 18:40:22 -04:00
c50783e388 Merge branch 'pooler_end_logits_fp16_fix' of https://github.com/slayton58/pytorch-transformers into pr/1284 2019-10-01 18:17:48 -04:00
6971556ab8 Fix syntax typo in README.md 2019-10-01 14:59:31 -04:00
b350662955 overflowing_tokens do not really make sense here, let's just return a number
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2019-09-30 16:37:09 -04:00
f5bcde0b2f [multiple-choice] Simplify and use tokenizer.encode_plus 2019-09-30 16:04:55 -04:00
5c3b32d44d Update README.md
Lines 183 - 200, fixed indentation. Line 198, replaced `tokenizer_class` with `BertTokenizer`, since `tokenizer_class` is not defined in the loop it belongs to.
2019-09-30 18:48:01 +00:00
2dc8cb8734 fix unknown imports (*ForMultipleChoice) in run_multiple_choice 2019-09-29 19:51:01 -04:00
0a4ed7192e Fixed critical css font-family issues
Fixed critical CSS font-family issues to ensure compatibility with multiple web browsers
2019-09-29 13:51:01 +02:00
ae50ad91ea Merge pull request #1362 from FeiWang96/doc
fix link
2019-09-28 10:26:42 +02:00
60f791631b Fix link in readme 2019-09-28 16:20:17 +08:00
a6a6d9e638 fix padding_idx of RoBERTa model 2019-09-27 19:03:55 -04:00
d8b641c839 6 -> 8 models 2019-09-27 17:22:01 -04:00
c6acbdd50a Close #1304 2019-09-27 17:02:53 -04:00
df7cd9e4e4 Merge pull request #1353 from wendingp/patch-1
Fix some typos
2019-09-27 23:00:34 +02:00
6a17b3c51b Merge pull request #1355 from agrinh/master
Fix tensorflow_dataset glue support
2019-09-27 22:59:54 +02:00
04e9a6f512 Merge pull request #1359 from dennymarcels/patch-1
Update run_lm_finetuning.py
2019-09-27 22:58:19 +02:00
9478590630 Update run_lm_finetuning.py
The previous method, just as phrased, did not exist in the class.
2019-09-27 15:18:42 -03:00
795b3e76ff Add docstring for processor method 2019-09-27 17:32:28 +02:00
e31a472801 Fix tensorflow_dataset glue support
`glue_convert_examples_to_features` assumed that tensorflow_dataset
examples contain the features `'sentence1'` and `'sentence2'`. This
commit encapsulates the choice of features in the glue processor and
uses that to parse examples.
2019-09-27 17:16:02 +02:00
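A hedged sketch of the encapsulation, using MRPC-style feature names; the method name is illustrative:

```python
from transformers import InputExample

class MrpcProcessorSketch:
    """The processor, not the generic conversion function, knows which
    tensorflow_datasets feature names hold the two sentences."""

    def get_example_from_tensor_dict(self, tensor_dict):
        return InputExample(
            guid=tensor_dict["idx"].numpy(),
            text_a=tensor_dict["sentence1"].numpy().decode("utf-8"),
            text_b=tensor_dict["sentence2"].numpy().decode("utf-8"),
            label=str(tensor_dict["label"].numpy()),
        )
```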
4f2b6579bf Fix some typos 2019-09-27 22:55:43 +08:00
ca559826c4 Merge pull request #1349 from ogabrielluiz/master
Just some typos
2019-09-27 13:08:00 +02:00
d2de5b9d8c Just some typos 2019-09-27 07:08:36 -03:00
d83d295763 Merge pull request #1337 from mgrankin/fastdataset
faster dataset building
2019-09-27 10:35:12 +02:00
f6de000305 Merge pull request #1346 from BramVanroy/documentation
Add small  note about the output of hidden states (closes #1332)
2019-09-27 10:30:07 +02:00
15749bfc10 Add small note about the output of hidden states 2019-09-27 10:01:36 +02:00
da2e47ad15 clean up a little run_tf_glue 2019-09-27 09:41:15 +02:00
528c288fa9 clean up run_tf_glue 2019-09-27 09:40:29 +02:00
702f589848 fix input in run_glue for distilbert 2019-09-27 00:20:14 -04:00
22d2fded2c [docs] Fix doc auto-deploy
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2019-09-26 18:22:45 -04:00
fc9faa8a47 [docs] Doc tweaks
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2019-09-26 18:19:51 -04:00
ecfddc6034 Update RoBERTa and GPT-2 Tokenizer documentation (fix #1343) 2019-09-26 16:49:03 -04:00
93f0c5fc72 Repository link in the documentation 2019-09-26 11:45:00 -04:00
6c3b131516 typo in readme/doc 2019-09-26 16:23:28 +02:00
f83b35b77d Merge branch 'master' of https://github.com/huggingface/pytorch-transformers 2019-09-26 16:14:23 +02:00
4e63c90720 update installation instructions in readme 2019-09-26 16:14:21 +02:00
7e957237e4 [Doc] XLM + Torch in documentation 2019-09-26 10:08:56 -04:00
302a4813a5 Doc building requirements [TF2] 2019-09-26 09:57:30 -04:00
f71a4577b8 faster dataset building 2019-09-26 16:53:13 +03:00
a3e0dbba95 Doc building requirements [TF] 2019-09-26 09:51:14 -04:00
0f92f76ca3 CircleCI reference in README 2019-09-26 08:59:52 -04:00
4094958df2 Doc building requirements 2019-09-26 08:50:55 -04:00
7d8b395afa Doc building requirements 2019-09-26 08:49:31 -04:00
927904bc91 [doc] pytorch_transformers -> transformers 2019-09-26 08:47:15 -04:00
294edfd83d Release version in documentation 2019-09-26 08:16:12 -04:00
de5e4864cb Documentation 2019-09-26 08:04:54 -04:00
e4e35296fb update setup.py metadata 2019-09-26 13:52:24 +02:00
1d646badbb Merge branch 'master' of https://github.com/huggingface/pytorch-transformers 2019-09-26 13:48:00 +02:00
9676d1a2a8 update readme and setup.py 2019-09-26 13:47:58 +02:00
8349d75773 Various small doc fixes 2019-09-26 07:45:40 -04:00
fb056494e5 Example usage 2019-09-26 07:45:40 -04:00
36f592cc82 Updated doc for InputExample and InputFeatures 2019-09-26 07:45:40 -04:00
ad4a393e2e Changed processor documentation architecture. Added documentation for GLUE 2019-09-26 07:45:40 -04:00
c4ac7a76db GLUE processors 2019-09-26 07:45:40 -04:00
4acd87ff4e TF models added to documentation 2019-09-26 07:45:40 -04:00
cf5c5c9e1c Documentation 2019-09-26 07:43:13 -04:00
4dde31cb76 update readme 2019-09-26 12:18:26 +02:00
17ea43cf98 Merge pull request #1203 from huggingface/tf2
[2.0] TF 2.0 support
2019-09-26 12:11:03 +02:00
80bf868a26 Merge branch 'master' into tf2 2019-09-26 12:04:47 +02:00
481d9c4fb5 Merge branch 'master' into tf2 2019-09-26 12:02:54 +02:00
4ddc31ff40 update readme with migration change 2019-09-26 12:00:38 +02:00
f47f7f4611 add logo 2019-09-26 11:28:44 +02:00
9fabc0b6a9 wip readme 2019-09-26 11:21:34 +02:00
31c23bd5ee [BIG] pytorch-transformers => transformers 2019-09-26 10:15:53 +02:00
2f071fcb02 clean up TFConv1D API 2019-09-26 10:09:45 +02:00
5705333441 add initialization for everybody 2019-09-26 10:06:20 +02:00
f2a337b3ed fix tokenization tests for gpt2 roberta 2019-09-26 09:02:43 +02:00
4a233e5b2c Merge pull request #1315 from bryant1410/patch-1
Remove unnecessary use of FusedLayerNorm
2019-09-26 08:50:02 +02:00
7a99e4b196 fix #1196 and fix #1285 2019-09-26 08:41:02 +02:00
7c9f8f93f9 fix tests 2019-09-26 01:59:53 +02:00
d6dde438ea add batch dimension in encode 2019-09-26 01:45:55 +02:00
4a21c4d88d add warning if neither pt nor tf are found 2019-09-26 01:30:06 +02:00
2967de06f4 adding initialization to bert 2019-09-25 22:08:38 +02:00
a6bcfb8015 fix tests 2019-09-25 21:14:12 +02:00
78863f6b36 fix tokenizer to tensors 2019-09-25 21:09:46 +02:00
8a618e0af5 clean up __init__ 2019-09-25 21:04:52 +02:00
3b7fb48c3b fix loading from tf/pt 2019-09-25 17:46:16 +02:00
a049c8043b push fix to training 2019-09-25 17:33:16 +02:00
a9f24a16bc [FIX] fix run_generation.py to work with batch_size > 1 2019-09-25 15:53:29 +03:00
5def3302f4 update run_glue 2019-09-25 12:38:08 +02:00
f71758f7a4 update internal glue processors 2019-09-25 12:00:50 +02:00
0f091062d4 Merge branch 'glue-example' into tf2 2019-09-25 10:21:52 +02:00
c4acc3a8e9 let encode accept tensor inputs 2019-09-25 10:19:14 +02:00
e8e956dbb2 Merge pull request #1327 from huggingface/tf2-determinism
Pytorch/TF2 determinism
2019-09-24 22:49:57 +02:00
e4022d96f7 Merge pull request #1325 from huggingface/glue-included
[Proposal] GLUE processors included in library
2019-09-24 21:40:10 +02:00
1761d2091a Check to see if the models have the same results when in eval mode (pt) or when training=False (tf) 2019-09-24 14:59:10 -04:00
789ea72037 fix output_token_type in glue 2019-09-24 17:32:01 +02:00
1cbd566c63 Merge branch 'glue-example' into glue-included 2019-09-24 17:24:52 +02:00
743e383d4b py2 fix 2019-09-24 17:21:54 +02:00
99a90e43d4 update data processors __init__ 2019-09-24 17:16:46 +02:00
b5ec526f85 updated data processor and metrics 2019-09-24 17:10:50 +02:00
a6981076ec various updates 2019-09-24 16:46:26 +02:00
0b82e3d0d9 Relative imports 2019-09-24 09:52:25 -04:00
f09e5ecef0 [Proposal] GLUE processors included in library 2019-09-24 09:47:34 -04:00
128bdd4c35 fix tests pt/tf 2019-09-24 15:43:39 +02:00
72402d1acd Fixed DistilBERT tokenizer 2019-09-24 09:41:14 -04:00
28a30af6d1 fix auto models 2019-09-24 15:33:39 +02:00
de203853cc docstring for xlnet 2019-09-24 15:30:55 +02:00
559790f9e4 docstring for xlm 2019-09-24 15:26:57 +02:00
b3087ddde8 docstring t-xl 2019-09-24 15:21:51 +02:00
4761a39781 docstring roberta 2019-09-24 15:19:09 +02:00
45a6f2edd9 docstring for GPT 2019-09-24 15:15:47 +02:00
e7ba5bc85b docstring for GPT2 2019-09-24 15:12:36 +02:00
d340e2329e create_mask_from_sequences -> create_token_type_ids_from_sequences 2019-09-24 09:09:28 -04:00
b94f73bab7 distilbert docstring 2019-09-24 15:06:51 +02:00
9678c49419 docstrings for bert 2019-09-24 14:57:05 +02:00
f3d1511b5b fix imports 2019-09-24 14:42:09 +02:00
dd2d90f344 update automodels 2019-09-24 14:39:41 +02:00
ee261439a9 add save_pretrained 2019-09-24 14:30:28 +02:00
29bb3e4eb0 double loading ok 2019-09-24 14:23:46 +02:00
f5397ffc3b update loading logics 2019-09-24 14:03:58 +02:00
271f213621 updating to load tf model in pt - fixing headmasking test 2019-09-24 13:51:28 +02:00
cf9c1cbb60 fix tests when only using tf 2019-09-24 13:32:47 +02:00
2167e366ba update circleCi 2019-09-24 13:27:45 +02:00
e9a103c17a bidirectional conversion TF <=> PT - extended tests 2019-09-24 13:25:50 +02:00
c832f43a4d output_token_type -> token_type_ids 2019-09-24 07:21:38 -04:00
3927d7756c Updated the GLUE pre-processing method 2019-09-24 07:15:11 -04:00
0ea82b246f Updated tests 2019-09-24 07:10:09 -04:00
9d44236f70 Updated DistilBERT 2019-09-24 07:03:24 -04:00
a7e01a248b converting distilled/fine-tuned models 2019-09-24 10:58:52 +02:00
8ba44ced95 fix roberta conversion script 2019-09-24 09:48:23 +02:00
2b11fa5174 update __init__ and conversion script 2019-09-23 22:35:45 +02:00
6448396d54 fix roberta test 2019-09-23 22:27:13 +02:00
1e47dee24c Merge branch 'tf2' of https://github.com/huggingface/pytorch-transformers into tf2 2019-09-23 22:08:10 +02:00
c9591f6fac updated models input format + tests 2019-09-23 22:08:08 +02:00
798da627eb Fix TFBert tests in Python 3.5 2019-09-23 12:06:10 -04:00
c014d1f0c6 fix the skipping 2019-09-23 16:39:57 +02:00
0b22e47a40 skipping pretrained TF model tests for now 2019-09-23 16:38:03 +02:00
830d212be7 test circleCI h5py version 2019-09-23 16:26:06 +02:00
7c0f2d0a6a Merge pull request #1294 from sshleifer/delete-n-special-doc
Delete n_special reference in docstring
2019-09-23 14:54:55 +01:00
a31e591d27 fix XLM tests 2019-09-23 15:54:10 +02:00
447de34dde tests for distilbert and roberta 2019-09-23 15:38:29 +02:00
98dd19b96b Remove unnecessary use of FusedLayerNorm 2019-09-22 20:31:36 -04:00
4b543c3007 Add option to use a 'stop token' which will be used to truncate the output text to everything till right before the 'stop token' 2019-09-22 21:38:38 +08:00
68a3e0223a roberta and distilbert 2019-09-20 23:14:51 +02:00
a2d4950f5c fix annotation 2019-09-20 10:59:35 -04:00
9f995b99d4 minor fixes 2019-09-19 21:36:06 +00:00
3fe5c8e8a8 update bert-base-uncased results 2019-09-19 19:34:22 +00:00
354944e607 [distillation] big update w/ new weights 2019-09-19 19:25:21 +00:00
2e6797cc7d Added ValueError for duplicate added tokens 2019-09-19 15:40:42 +01:00
ab984a8b72 Python 2 compatibility 2019-09-19 15:01:33 +02:00
3df208c93a Tokenizer accepts token list as well as string 2019-09-19 14:47:52 +02:00
66ea76b8a9 prepare_for_model and prepare_pair_for_model methods. Added an option to select which sequence will be truncated. 2019-09-19 13:50:51 +02:00
60414f31a9 GLUE updated with new methods 2019-09-19 10:55:06 +02:00
baa74326ab Stride + tests + small fixes 2019-09-19 10:55:06 +02:00
c10c7d59e7 Mask computing in standalone method. Tests. 2019-09-19 10:55:06 +02:00
bf503158c5 Sentence -> Sequence. Removed output_mask from the special token addition methods. 2019-09-19 10:55:06 +02:00
8cba057260 Doc + remove artefacts 2019-09-19 10:55:06 +02:00
6393261e41 encode + encode_plus tests modified 2019-09-19 10:55:06 +02:00
dcc9bb3252 Modified encode to return only lists. Added a more complete encode_plus method 2019-09-19 10:55:06 +02:00
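A usage sketch against the API these commits describe; the checkpoint name is an example and exact signatures have since evolved:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode() returns a plain list of ids; encode_plus() returns a dict that
# also carries token type ids, overflow information, etc.
ids = tokenizer.encode("Hello world", add_special_tokens=True)
enc = tokenizer.encode_plus("Hello world", "Second sequence",
                            add_special_tokens=True, max_length=16)
print(enc["input_ids"])
print(enc["token_type_ids"])
```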
af23b626c8 Max encoding length + corresponding tests 2019-09-19 10:55:06 +02:00
c4d4f3ec8c Updated DistilBERT test to reflect the sequence encoding 2019-09-19 10:55:06 +02:00
d572d7027b Number of added tokens calculator 2019-09-19 10:55:06 +02:00
de8e14b6c0 Added DistilBERT to run_squad script 2019-09-19 10:55:06 +02:00
88368c2a16 Added DistilBERT to run_lm_finetuning 2019-09-19 10:55:06 +02:00
2d8ec5a684 Changed warning to be more explicit
Co-authored by: julien_c <chaumond@gmail.com>
2019-09-19 10:55:06 +02:00
75635072e1 Updated GLUE script to add DistilBERT. Cleaned up unused args in the utils file. 2019-09-19 10:55:06 +02:00
92a9976e91 Distilbert sequence builder w/ mask 2019-09-19 10:55:06 +02:00
59057abe52 typo 2019-09-19 10:55:06 +02:00
bac332fec0 Updated the GLUE data processor. Corrections to RoBERTa and XLNet. 2019-09-19 10:55:06 +02:00
c3df2136e1 Added binary masking tests 2019-09-19 10:55:06 +02:00
e391d4735e Tokenizers' encode function can output binary masks 2019-09-19 10:55:06 +02:00
119610b5c5 Merge branch 'master' into delete-n-special-doc 2019-09-19 01:35:01 -07:00
08e4ad5eea Remove documentation for unused kwarg 2019-09-18 16:35:01 -07:00
f0340eccf9 Typo
Typo
2019-09-18 13:42:11 -07:00
0d1dad6d53 Merge pull request #1004 from erenup/master
Refactoring old run_swag.py
2019-09-18 21:42:51 +02:00
8960988f35 fixed to find best dev acc 2019-09-19 01:10:05 +08:00
b57bfb5fa0 Merge pull request #3 from erenup/run_multiple_choice_merge
Run multiple choice merge
2019-09-18 21:45:04 +08:00
46ffc28329 Merge branch 'master' into run_multiple_choice_merge
2019-09-18 21:43:46 +08:00
ec94f4e0f8 Fix fp16 masking in PoolerEndLogits
Necessary to run xlnet (at least in squad) with `--fp16 --fp16_opt_level="O2"`, otherwise loss is immediately `NaN` and fine-tuning cannot proceed.
2019-09-18 09:30:58 -04:00
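Why the unpatched masking NaNs out in half precision, in a few lines (values illustrative):

```python
import torch

x = torch.randn(2, 8).half()
p_mask = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]] * 2).half()  # 1 = masked

# In fp16 the constant 1e30 overflows to inf, and inf * 0 = nan, so the
# *unmasked* logits (where p_mask is 0) come out as NaN:
bad = x * (1 - p_mask) - 1e30 * p_mask

# A constant that still fits in half precision (max ~65504) stays finite:
good = x * (1 - p_mask) - 65500 * p_mask
print(bad[0], good[0])
```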
15143fbad6 move run_multiple_choice.py and utils_multiple_choice.py to examples 2019-09-18 21:18:46 +08:00
3cd6289758 Merge remote-tracking branch 'huggingface/master' into run_multiple_choice_merge
# Conflicts:
#	examples/contrib/run_swag.py
2019-09-18 21:16:59 +08:00
36362cf086 move schedule.step after optimizer.step 2019-09-18 21:13:40 +08:00
3a527fa820 OpenAI GPT tests ok 2019-09-18 14:15:48 +02:00
556442afb3 hot fix 2019-09-18 14:12:41 +02:00
160b5d6080 fix xlm lang_embeddings loading 2019-09-18 14:10:20 +02:00
26497d1199 fix tests 2019-09-18 12:17:21 +02:00
6a083fd447 update pt-tf conversion script 2019-09-18 12:11:32 +02:00
f6969cc12b upgrade max model difference to 2e-2 (for transfo-xl adaptive softmax + inputs) 2019-09-18 11:12:02 +02:00
e768f2322a update run_openai_gpt to fix #1264 2019-09-18 10:07:47 +02:00
8334993915 clean up examples - updated to new keyword inputs - #1246 2019-09-18 10:01:27 +02:00
62760baf46 tiny fixes 2019-09-17 18:29:15 -04:00
45de034bf8 fix #1223 2019-09-17 10:25:06 +02:00
5a81e79e25 Merge pull request #2 from erenup/run_multiple_choice_add_doc
Run multiple choice add doc
2019-09-16 22:39:54 +08:00
5882c442e5 add example usage 2019-09-16 22:38:08 +08:00
a9debaca3d fixed init_weight 2019-09-16 19:55:24 +08:00
c88f05163d fix typo in XLM models 2019-09-16 13:42:20 +02:00
982f181aa7 Merge remote-tracking branch 'origin/master' into run_multiple_choice_add_doc 2019-09-16 19:12:00 +08:00
84b9d1c423 Merge remote-tracking branch 'huggingface/master'
# Conflicts:
#	pytorch_transformers/__init__.py
2019-09-16 19:06:12 +08:00
603b470a3d add warning info 2019-09-16 18:53:37 +08:00
4812a5a767 add doc string 2019-09-16 11:50:18 +08:00
4b956b2a6b add layer_norm_epsilon configuration for transformer xl 2019-09-13 17:09:20 +02:00
b97af8cce9 skip finetuned checkpoints 2019-09-13 16:43:49 +02:00
65c49bb27e adding TF 2.0 adaptive softmax with logits + loss outputs 2019-09-13 15:50:51 +02:00
39c38b2ea0 fix 2019-09-12 16:47:11 +02:00
dcddf498c8 fix bert layernorm 2019-09-12 16:46:32 +02:00
d3a3a0353c clean up cache after conversion 2019-09-12 16:42:52 +02:00
a84adddd1b convert all models 2019-09-12 13:14:07 +02:00
32e1332acf [distil] fix once for all general logger for scripts 2019-09-11 14:19:07 +00:00
b62abe87c9 Merge pull request #1249 from ziliwang/master
fixed: hard-coded max and min numbers go out of range in fp16, which causes NaN.
2019-09-11 15:53:28 +02:00
969d3ae95e XLMWithLMHead fixed - standardize conversion 2019-09-11 15:47:33 +02:00
646711e1e2 standardize scopes names - add conversion methods 2019-09-11 15:34:17 +02:00
4356f791a2 XLM passing tests 2019-09-11 11:49:54 +02:00
11ac4b9555 [CI] Symbolic link for documentation 2019-09-11 10:13:44 +02:00
8bdee1cb73 fixed: hard-coded max and min numbers go out of range in fp16, which causes NaN. 2019-09-11 15:41:53 +08:00
7424b2848f Merge pull request #1 from huggingface/master
merge from original repo
2019-09-11 11:02:23 +08:00
364920e216 fix small bug/typo 2019-09-10 21:45:01 +00:00
23c23f5399 Merge pull request #1229 from SKRohit/master
changes in evaluate function in run_lm_finetuning.py
2019-09-10 22:16:45 +02:00
99a54ac51c Merge pull request #1233 from searchivarius/master
Fix to prevent crashing on assert len(tokens_b)>=1
2019-09-10 22:15:47 +02:00
439b37b474 Merge pull request #1241 from mattolson93/patch-1
Fixing typo in gpt2 for doc site's class link
2019-09-10 22:14:18 +02:00
f2cf6ce4a9 Fixing typo in gpt2 for doc site's class link 2019-09-10 09:12:01 -07:00
465870c33f Xlnet working - also added simple question answering model for XLNet 2019-09-10 16:44:41 +02:00
16b6361792 xlnet passing first test 2019-09-10 12:39:27 +02:00
32aabe8c33 WIP XLNet 2019-09-10 12:17:18 +02:00
2c177a87eb Merge pull request #1228 from huggingface/head-masking-test
Trying to fix the head masking test
2019-09-10 11:55:27 +02:00
f851fb55ca fixing error message 2019-09-10 09:24:08 +02:00
eab980fd68 Fix to prevent crashing on assert len(tokens_b)>=1 2019-09-09 19:58:08 -04:00
a95ced6260 [Distillation] save last chkpt as pytorch_model.bin 2019-09-09 19:53:35 +00:00
50c6bc4195 fix tf bert model 2019-09-09 17:46:01 +02:00
4b082bd4d8 Merge pull request #1 from SKRohit/SKRohit-patch-1
changes in return statement of evaluate function
2019-09-09 19:59:27 +05:30
e5df36397b changes in return statement of evaluate function
changed `results` to `result` and removed `results` dict defined previously
2019-09-09 19:55:57 +05:30
0537139b2b removing tf.function 2019-09-09 14:47:31 +02:00
84d346b687 Merge pull request #1195 from huggingface/reorder_arguments
[2.0] Reodering arguments for torch jit #1010 and future TF2.0 compatibility
2019-09-09 15:42:51 +03:00
3f05de6dde Merge branch 'master' into reorder_arguments 2019-09-09 15:42:25 +03:00
33cb00f41a add GPT2 to init - fix weights loading - remove tf.function 2019-09-09 14:29:24 +02:00
78b2a53f10 debug file download in tests error 2019-09-09 13:38:10 +02:00
6b3438df21 fixing GPT2 double head model and updating the torch version tests 2019-09-09 12:48:36 +02:00
e360037236 Merge branch 'tf2' of https://github.com/huggingface/pytorch-transformers into tf2 2019-09-09 11:08:49 +02:00
b7175a2701 fixed imports in tests and gpt2 config test 2019-09-09 11:04:03 +02:00
995e38b7af Merge pull request #1214 from huggingface/new-examples
Better examples
2019-09-09 10:26:36 +03:00
3401980fc4 fix #1208 2019-09-09 10:22:12 +03:00
728637356c WIP GPT2 2019-09-09 10:18:55 +03:00
34f28b2a13 WIP GPT2 2019-09-08 15:02:06 +03:00
ad88563bda WIP GPT-2 2019-09-08 15:02:06 +03:00
64d83c7ae0 WIP 2019-09-08 15:02:06 +03:00
01597e5b90 add tf auto models + tests 2019-09-08 15:02:06 +03:00
f5c698b21a add weights tying, attention and hidden states output tests 2019-09-08 15:02:06 +03:00
6dc4b6f34c skip transfo-xl tokenizer tests with tf for now 2019-09-08 15:02:06 +03:00
e30579f764 no pytest version checking 2019-09-08 15:02:06 +03:00
518307dfcd test suite independent of framework 2019-09-08 15:02:06 +03:00
9d0a11a68c update dependencies and circle-ci 2019-09-08 15:02:06 +03:00
24a20483f5 update conversion script names 2019-09-08 15:02:06 +03:00
6f152572cd add conversion script, rename conversion scripts 2019-09-08 15:02:06 +03:00
a4704b1263 skipping tf tests if tf is not installed 2019-09-08 15:02:06 +03:00
ad0ab9afe9 fix test when tf is not here 2019-09-08 15:02:06 +03:00
59fe641b8b also gathering file names in file_utils 2019-09-08 15:02:06 +03:00
d68a8fe462 add tf bert files 2019-09-08 15:02:06 +03:00
7ae642b72d update conversion scripts 2019-09-08 15:02:06 +03:00
69bff89935 clean ups 2019-09-08 15:02:06 +03:00
1efb1f1660 split configuration and modeling files 2019-09-08 15:02:06 +03:00
1eb125fb95 be sure we have uint8 2019-09-08 15:02:06 +03:00
3f91338be9 Patched a few outdated parameters 2019-09-06 17:48:06 -04:00
f47f9a5874 Updated outdated examples 2019-09-06 17:10:33 -04:00
ee027c89f2 fix #1165 2019-09-06 23:40:05 +03:00
e52737d5ad Updated docs README to feature the examples symlink 2019-09-06 12:13:31 -04:00
5e151f5e77 Table of contents 2019-09-06 12:08:36 -04:00
593c070435 Better examples 2019-09-06 12:00:12 -04:00
5ac8b62265 Merge pull request #1205 from maru0kun/patch-2
Fix typo
2019-09-05 21:44:16 +02:00
5c6cac102b adding test for common properties and cleaning up a bit base class 2019-09-05 21:31:29 +02:00
ed717635ff Merge pull request #1201 from huggingface/configuration_refactoring
[2.0] - Split configuration and modeling files
2019-09-05 21:16:58 +02:00
04b50cabf6 gitignore 2019-09-05 18:49:28 +00:00
dddd6b9927 Update DistilBERT training code 2019-09-05 18:26:14 +00:00
f9453d15e5 Fix broken link 2019-09-05 12:35:22 -04:00
f7ee2e5d20 [README] link to Write With Transformer 2019-09-05 12:33:46 -04:00
d737947725 Fix typo 2019-09-05 19:24:57 +09:00
705237b4ec add tf auto models + tests 2019-09-05 12:21:08 +02:00
600a42329b add weights tying, attention and hidden states output tests 2019-09-05 12:02:14 +02:00
04d2006f28 skip transfo-xl tokenizer tests with tf for now 2019-09-05 11:22:13 +02:00
7f6a0c0d69 no pytest version checking 2019-09-05 11:20:56 +02:00
7c0baf9521 test suite independent of framework 2019-09-05 11:18:55 +02:00
7775a3d2ed update dependencies and circle-ci 2019-09-05 10:23:04 +02:00
33dd59e971 update conversion script names 2019-09-05 03:13:26 +02:00
5951d86024 add conversion script, rename conversion scripts 2019-09-05 03:10:11 +02:00
aa4c8804f2 skipping tf tests if tf is not installed 2019-09-05 03:06:09 +02:00
134847db81 fix test when tf is not here 2019-09-05 02:53:52 +02:00
981f7f5253 Merge branch 'tf2' of https://github.com/huggingface/pytorch-transformers into tf2 2019-09-05 02:34:52 +02:00
bffd17a43d add tf bert files 2019-09-05 02:34:44 +02:00
85df4f7cca also gathering file names in file_utils 2019-09-05 02:34:09 +02:00
11fae9e636 add tf bert files 2019-09-05 02:27:39 +02:00
121f88cae3 update conversion scripts 2019-09-05 02:17:50 +02:00
d77abd4d08 clean ups 2019-09-05 00:41:24 +02:00
2a667b1eb9 split configuration and modeling files 2019-09-05 00:27:11 +02:00
0be6a2a624 be sure we have uint8 2019-09-04 22:47:38 +02:00
7fba47b7d9 WIP reordering 2019-09-04 22:39:23 +02:00
e25cba78cf WIP reodering arguments for torchscript and TF 2019-09-04 22:39:23 +02:00
38b79b5a63 Fixing this TransformerXL bool issue 2019-09-04 22:36:30 +02:00
0b52642d37 1.2.0 in docs 2019-09-04 11:03:32 -04:00
89fd3450a6 Release: 1.2.0 2019-09-04 13:32:18 +02:00
9fd6e7ab9f Merge pull request #1190 from shijie-wu/xlm-tokenization
Fix reference of import in XLM tokenization
2019-09-04 12:50:49 +02:00
a15562e170 Fix reference of import when called for the second time 2019-09-03 18:27:29 -07:00
0287d264e9 Merge pull request #1162 from huggingface/xlnet-bias
XLNet bias fix on resize embeddings (cf #1124)
2019-09-02 23:14:04 +02:00
7f522437bc Updated documentation for LM finetuning script 2019-09-02 13:40:25 -04:00
3fbf301bba [CI] Updated resource size for python 3 tests 2019-09-02 12:35:14 -04:00
2dcc5a1629 [doc] Add blurb about large-scale model downloads
cc @n1t0 @lysandrejik @thomwolf
2019-09-02 12:27:11 -04:00
7b0c99add9 Merge pull request #1174 from huggingface/fix_byte_level_added_tokens
Fix byte-level BPE decoding error when using added tokens
2019-09-02 09:01:16 +02:00
31d3373bc9 Appends space before special token 2019-09-01 21:07:00 -04:00
fede4ef45d fixing #1133 2019-09-02 02:27:39 +02:00
b6cd856b08 Merge pull request #1164 from stefan-it/master
distillation: fix ModuleNotFoundError error in token counts script
2019-09-02 02:00:07 +02:00
ff7368eb6b Merge pull request #1077 from huggingface/pruning-save-and-load
Pruning changes so that deleted heads are kept on save/load
2019-09-01 09:42:15 +02:00
6ae0bb5291 XLM 100 different URLs 2019-08-31 14:46:31 -04:00
819b468f70 Fixed XLM model url 2019-08-31 14:40:51 -04:00
58b59a0c31 Random seed is accessible anywhere within the common tests 2019-08-31 13:17:08 -04:00
a1c34bd286 distillation: fix ModuleNotFoundError error in token counts script 2019-08-31 12:21:38 +02:00
ea86bef545 Check for None 2019-08-31 00:56:22 -04:00
e0f867a9ba XLNet bias fix on resize embeddings (cf #1124) 2019-08-31 00:50:59 -04:00
11600edc6e Rebase on master + DistilBERT head pruning patch 2019-08-31 00:37:41 -04:00
b6992b7b47 Applied patch to OpenAI GPT, RoBERTa, TransfoL, XLM and XLNet 2019-08-31 00:33:50 -04:00
bdb4409ed8 updated pruning logic with sets - Bert and GPT-2 2019-08-31 00:33:50 -04:00
0c8e823b03 Added patch to remaining models 2019-08-31 00:33:50 -04:00
0cd283522a Attempt to fix head index 2019-08-31 00:33:50 -04:00
c85b5db61a Conditional append/init + fixed warning 2019-08-31 00:33:50 -04:00
5c2b94c82a Changed string so that Circle CI accepts the warning 2019-08-31 00:33:50 -04:00
87747518e9 Blocks deletion from already deleted heads. Necessary integration test.
Now raises a warning when a head to be deleted already has been deleted. An integration test verifying the total pipeline (-> from config -> save model -> load model -> additional head pruning) has been added.
2019-08-31 00:33:50 -04:00
719cb3738d Pruning for GPT and GPT-2 2019-08-31 00:33:50 -04:00
fc1fbae45d XLM can be pruned 2019-08-31 00:33:50 -04:00
42e00cf9e1 Pruning saved to configuration first try 2019-08-31 00:33:50 -04:00
d7a4c3252e Fixed filename 2019-08-31 00:08:56 -04:00
7f006cdd87 Set seed for head_masking test 2019-08-30 23:58:49 -04:00
0fd0b674e6 [ci] legible output [skip ci] 2019-08-30 20:36:26 -04:00
b65a994f59 [ci] decrease parallelism to increase success prob 2019-08-30 20:33:16 -04:00
1d438f15b3 [XLNet] Use pytorch's layernorm like in BERT
See #1089

cc @thomwolf @lysandrejik

Also @dhpollack
2019-08-30 20:20:15 -04:00
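A minimal sketch of the replacement; the size and eps mirror BERT-style defaults:

```python
import torch

# torch.nn.LayerNorm stands in for the hand-rolled BertLayerNorm.
layer_norm = torch.nn.LayerNorm(768, eps=1e-12)
out = layer_norm(torch.randn(2, 5, 768))
```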
574c5b3a72 [RoBERTa] LayerNorm's eps is not a nn.Parameter so there's no point setting it on the model
Instead we correctly store it on the config

(regenerating the hosted config files)

cc @lysandrejik
2019-08-30 20:09:24 -04:00
09363f2a8b Fix documentation index 2019-08-30 19:48:32 -04:00
51e980ce36 Merge pull request #1155 from anhnt170489/apex_fp16
Update apex fp16 implementation
2019-08-30 23:29:11 +02:00
206c35e9a4 Merge pull request #1154 from ziliwang/master
fix: hard coding for max number
2019-08-30 23:23:08 +02:00
f3d18c71ec Merge pull request #1152 from epwalsh/fix-special-tokens
fix adding special tokens
2019-08-30 23:21:59 +02:00
d483cd8e46 Merge pull request #1074 from huggingface/improved_testing
Shortcut to special tokens' ids - fix GPT2 & RoBERTa tokenizers - improved testing for GPT/GPT-2
2019-08-30 23:18:58 +02:00
d2f21f08f5 Merge pull request #1092 from shijie-wu/xlm-tokenization
Added cleaned configuration properties for tokenizer with serialization - improve tokenization of XLM
2019-08-30 23:15:40 +02:00
12b9cc9e26 Merge pull request #1110 from huggingface/automodels
Torch.hub now based on AutoModels - Updating AutoModels with AutoModelWithLMHead, Sequence Classification and Question Answering
2019-08-30 23:08:57 +02:00
bfe93a5a21 fix distilbert in auto tokenizer 2019-08-30 22:43:26 +02:00
256086bc69 clean up and simplify hubconf 2019-08-30 22:34:23 +02:00
80aa87d9a3 fix distilbert tokenizer 2019-08-30 22:24:23 +02:00
455a4c842c add distilbert tokenizer 2019-08-30 22:20:51 +02:00
7a1f174a9d update names of torch.hub to simpler names - update docstring 2019-08-30 22:20:44 +02:00
c665e0fcfe Merge branch 'automodels' of https://github.com/huggingface/pytorch-transformers into automodels 2019-08-30 21:53:36 +02:00
9b6e3b34d9 Docstrings 2019-08-30 14:09:02 -04:00
dec8f4d6fd Added DistilBERT models to all other AutoModels. 2019-08-30 13:52:18 -04:00
bc29aa67a9 HubConf configuration 2019-08-30 12:48:55 -04:00
f35f612280 updating docstring for AutoModel 2019-08-30 12:48:55 -04:00
7ca9653852 Pytorch Hub & AutoModels 2019-08-30 12:48:55 -04:00
25e8389439 Tests for added AutoModels 2019-08-30 12:48:55 -04:00
dc43215c01 Added multiple AutoModel classes: AutoModelWithLMHead, AutoModelForQuestionAnswering and AutoModelForSequenceClassification 2019-08-30 12:48:55 -04:00
282c276e09 typos + file name coherence in distillation README 2019-08-30 12:02:29 -04:00
803c1cc4ea fix relative import bug cf Issue #1140 2019-08-30 12:01:27 -04:00
7044ed6b05 fix tokenizers serialization 2019-08-30 17:36:11 +02:00
cd65c41a83 Merge branch 'master' into xlm-tokenization 2019-08-30 17:15:16 +02:00
69da972ace added test and debug tokenizer configuration serialization 2019-08-30 17:09:36 +02:00
88111de07c saving and reloading tokenizer configurations 2019-08-30 16:55:48 +02:00
b66e9b4433 Merge pull request #1158 from rabeehk/master
regarding #1026 pull request
2019-08-30 16:30:33 +02:00
0a2fecdf90 Merge branch 'master' into master 2019-08-30 16:30:08 +02:00
3871b8a107 adding xlm 17 and 100 models and config on aws 2019-08-30 16:28:42 +02:00
8678ff8df5 adding 17 and 100 xlm models 2019-08-30 16:26:04 +02:00
e0caab0cf0 fix link 2019-08-30 10:09:17 -04:00
a600b30cc3 Fix index number in documentation 2019-08-30 10:08:14 -04:00
20c06fa37d Added DistilBERT to documentation index 2019-08-30 10:06:51 -04:00
39eb31e11e remove reloading of the tokenizer during training, adding it to the evaluation part 2019-08-30 15:44:41 +02:00
350bb6bffa updated tokenizer loading to address reproducibility issues 2019-08-30 15:34:28 +02:00
82462c5cba Added option to setup pretrained tokenizer arguments 2019-08-30 15:30:41 +02:00
41f35d0b3d Merge pull request #1089 from dhpollack/dhp/use_pytorch_layernorm
change layernorm code to pytorch's native layer norm
2019-08-30 14:49:08 +02:00
01ad55f8cf Merge pull request #1026 from rabeehk/master
loads the tokenizer for each checkpoint, to solve the reproducibility…
2019-08-30 14:15:36 +02:00
50e615f43d Merge branch 'master' into improved_testing 2019-08-30 13:40:35 +02:00
f8aace6bcd update tokenizers to use self.XX_token_id instead of converting self.XX_token 2019-08-30 13:39:52 +02:00
8faf2e086b more doc on special tokens 2019-08-30 13:36:22 +02:00
f7978490b2 Merge pull request #1148 from huggingface/circleci
Documentation auto-deploy
2019-08-30 13:28:16 +02:00
ce5ef4b35d python2 doesn't spark joy 2019-08-30 13:22:43 +02:00
5dd7b677ad clean up all byte-level bpe tests 2019-08-30 12:43:08 +02:00
ca1a00a302 fix for python2 2019-08-30 12:29:31 +02:00
4e6a3172ce update roberta docstring as well 2019-08-30 12:23:37 +02:00
fd10d79b55 update GPT2 docstring 2019-08-30 12:23:12 +02:00
abe734ca1f fix GPT-2 and RoBERTa tests to be clean now 2019-08-30 12:20:18 +02:00
0f5a799456 fix GPT2DoubleHeadModel docstring 2019-08-30 11:49:23 +02:00
d51f72d5de adding shortcut to the ids of all the special tokens 2019-08-30 11:41:11 +02:00
306af132d7 update readme to mention add_special_tokens more clearly in example 2019-08-30 11:30:51 +02:00
50e6daf83a fix Roberta tokenizer __init__ 2019-08-30 11:27:43 +02:00
0517e7a1cb Fix GPT2 and RoBERTa tokenizer to begin with a space - update Roberta tokenizer 2019-08-30 11:23:49 +02:00
6e1ac34e2b Merge remote-tracking branch 'huggingface/master' 2019-08-30 15:50:11 +08:00
2fb9a934b4 re-format 2019-08-30 14:05:28 +09:00
c8731b9583 update apex fp16 implementation 2019-08-30 13:54:00 +09:00
6060b2f89b fix: hard coding for max number
fp16's max number is 65504; the original 1e30 overflows and causes NaN in fp16
2019-08-30 12:13:47 +08:00
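The overflow itself is easy to demonstrate:

```python
import torch

print(torch.finfo(torch.float16).max)  # 65504.0
print(torch.tensor(1e30).half())       # tensor(inf, dtype=torch.float16)
```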
07e21307b6 fix adding special tokens 2019-08-29 13:44:50 -07:00
caf1d116a6 Closing bracket in DistilBERT's token count. 2019-08-29 15:30:10 -04:00
e7fba4bef5 Documentation auto-deploy 2019-08-29 12:14:29 -04:00
fe8fb10b44 Small modification of comment in the run_glue.py example
Add RoBERTa to the comment as it was not explicit that RoBERTa doesn't use token_type_ids.
2019-08-29 14:43:30 +02:00
2a2832ce73 Merge pull request #1 from erenup/run_multiple_choice
roberta, xlnet for multiple choice
2019-08-29 16:27:44 +08:00
942d3f4b20 modify code of arc label insurance 2019-08-29 10:21:17 +08:00
bf3dc778b8 Changed learning rate for run_squad test 2019-08-28 18:24:43 -04:00
0a74c88ac6 fix #1131 2019-08-28 22:41:42 +02:00
5f297c7be3 Merge pull request #1087 from huggingface/fix-warnings
Decode now calls private property instead of public method
2019-08-28 22:22:11 +02:00
d9847678b3 Merge pull request #1136 from adai183/update_SQuAD_script
swap order of optimizer.step() and scheduler.step()
2019-08-28 22:00:52 +02:00
0f8ad89206 Merge pull request #1135 from stefan-it/master
distilbert: fix number of hidden_size
2019-08-28 22:00:12 +02:00
9ce42dc540 Pretrained models table fix 2019-08-28 13:56:28 -04:00
1d15a7f278 swap order of optimizer.step() and scheduler.step() 2019-08-28 19:18:27 +02:00
ed2ab1c220 distilbert: fix number of hidden_size 2019-08-28 18:08:16 +02:00
0ecfd17f49 Merge pull request #987 from huggingface/generative-finetuning
Generative finetuning
2019-08-28 16:51:50 +02:00
50792dbdcc Merge pull request #1127 from huggingface/dilbert
DilBERT
2019-08-28 16:43:09 +02:00
e7706f514b update again 2019-08-28 16:37:22 +02:00
b5eb283aaa update credits 2019-08-28 16:36:55 +02:00
f753d4e32b Removed typings for Python 2 2019-08-28 10:15:02 -04:00
75bc2a03cc Updated article link 2019-08-28 10:05:15 -04:00
1dc43e56c9 Documentation additions 2019-08-28 09:37:27 -04:00
912a377e90 dilbert -> distilbert 2019-08-28 13:59:42 +02:00
c9bce1811c fixing model to add torchscript, embedding resizing, head pruning and masking + tests 2019-08-28 13:22:45 +02:00
62df4ba59a add dilbert tokenizer and tests 2019-08-28 12:22:56 +02:00
4ce5f36f78 update readmes 2019-08-28 12:14:31 +02:00
ec4b1c659f logging truth error 2019-08-28 16:50:40 +08:00
df52abe373 add sep_token between question and choice 2019-08-28 16:36:21 +08:00
43c243254a avoid invalid ground-truth labels 2019-08-28 16:03:17 +08:00
3c7e676f8b add test related code: test the best dev acc model when model is training 2019-08-28 15:57:29 +08:00
a5fe16687b fix typo 2019-08-28 07:22:54 +00:00
497f73c964 add DilBERT to master README 2019-08-28 07:16:30 +00:00
93e82ab424 Write README for DilBERT 2019-08-28 06:26:09 +00:00
19b7c9b0b7 add DilBert model for squad 2019-08-28 06:25:44 +00:00
fea921d382 add licensing 2019-08-28 04:45:39 +00:00
da1e4e53fc some fixes in train.py for loading previous checkpoint 2019-08-28 04:01:03 +00:00
0d8f8848d5 add scripts/extract_for_distil.py 2019-08-28 04:00:19 +00:00
7f2c384c80 add scripts/token_counts.py 2019-08-28 04:00:03 +00:00
4d16b279e5 add scripts/binarized_data.py 2019-08-28 03:59:48 +00:00
c513415b19 Dilbert tests from CommonTests 2019-08-27 23:59:00 -04:00
778a263f09 DilBert added to AutoModels 2019-08-27 23:14:00 -04:00
74d78beeb4 fix: add qa_dropout and seq_classif_dropout 2019-08-28 03:13:11 +00:00
7f5d85347e fix small typo 2019-08-28 02:44:51 +00:00
906581ae3c add s3 links for dilbert (+fix small typo) 2019-08-28 02:43:33 +00:00
b247b0d880 add train.py for distillation 2019-08-28 02:12:47 +00:00
780f183e55 add requirements 2019-08-28 01:39:52 +00:00
e424d2e45d add README 2019-08-28 01:10:10 +00:00
1ae81e4aa1 add dataset. distiller, utils 2019-08-28 01:10:05 +00:00
5d29f8e99b fix bugs 2019-08-28 00:57:16 +00:00
a8ad83040d fix bugs 2019-08-28 00:45:33 +00:00
ca4baf8ca1 Match order of casing in OSS XLM; improve documentation; clean up dependencies 2019-08-27 20:03:18 -04:00
60c984da6c fix bugs 2019-08-27 22:25:55 +00:00
42968138c8 wip wouf 2019-08-27 22:00:38 +00:00
1d23240068 wip 2019-08-27 14:27:47 +00:00
d06c5a2a0a Merge pull request #1120 from CrafterKolyan/patch-3
Change attention mask dtype to be bool. Fix #1119
2019-08-27 15:01:01 +02:00
edc5222fc3 Merge pull request #1118 from CrafterKolyan/patch-2
Documentation fix #1117
2019-08-27 14:58:50 +02:00
9cf298dfc1 Merge pull request #1116 from CrafterKolyan/patch-1
Delete nonexistent parameter from documentation fix #1115
2019-08-27 14:56:43 +02:00
0d288727b8 fix #1106 2019-08-27 14:50:22 +02:00
447afe9cdf updating docstring for AutoModel 2019-08-27 14:42:03 +02:00
a175a9dc01 add kwargs to base encode function 2019-08-27 14:05:59 +02:00
53282b5bd0 Change attention mask dtype to be bool. Fix #1119 2019-08-27 14:19:03 +03:00
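An illustration of the dtype change in this fix (a generic sketch, not the exact code touched by #1119): comparing against the pad id yields a `torch.bool` mask directly.

```python
import torch

input_ids = torch.tensor([[5, 8, 0, 0]])   # 0 assumed to be the pad id here
attention_mask = input_ids != 0            # dtype is torch.bool, not uint8/float
print(attention_mask.dtype)                # torch.bool
print(attention_mask)
```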
26bda77225 Fix documentation #1117
Rename parameter in documentation + Delete its second occurrence.
2019-08-27 12:22:42 +03:00
c8933bb2d9 Delete nonexistent parameter from documentation
Changed documentation of GPT2Model, GPT2LMHeadModel and GPT2DoubleHeadsModel
2019-08-27 12:10:36 +03:00
e08c01aa1a fix #1102 2019-08-26 18:13:06 -04:00
84a3a9689d Pytorch Hub & AutoModels 2019-08-26 16:08:43 -04:00
f68339639a Tests for added AutoModels 2019-08-26 16:02:23 -04:00
cb60ce59dd Added multiple AutoModel classes: AutoModelWithLMHead, AutoModelForQuestionAnswering and AutoModelForSequenceClassification 2019-08-26 15:44:30 -04:00
529a16dec6 Generic encoding implementation. 2019-08-26 15:00:43 -04:00
f1b018740c Add use_lang_emb to config 2019-08-23 20:33:01 -04:00
e85123d398 Add custom tokenizer for zh and ja 2019-08-23 20:27:52 -04:00
06510ccb53 typo 2019-08-23 22:08:10 +02:00
3bcbebd440 max_len_single_sentence & max_len_sentences_pair as attributes so they can be modified 2019-08-23 22:07:26 +02:00
436ce07218 Tokenization behaves the same as the original XLM preprocessing for most languages except zh, ja and th; change API to allow specifying language in tokenize 2019-08-23 14:40:17 -04:00
ab7bd5ef98 fixing tokenization and training 2019-08-23 17:31:21 +02:00
47d6853439 adding max_lengths for single sentences and sentences pairs 2019-08-23 17:31:11 +02:00
df9d6effae Merge pull request #1081 from huggingface/fix_distributed_barrier_hang
Fix distributed barrier hang
2019-08-23 16:53:53 +02:00
3f20dd7186 Merge pull request #1075 from abhishekraok/modeling_utils_config_None
reraise EnvironmentError in modeling_utils.py
2019-08-23 12:42:39 +02:00
e13465fb8b change layernorm code to pytorch's native layer norm 2019-08-23 12:12:12 +02:00
c603d099aa reraise EnvironmentError in from_pretrained functions of Model and Tokenizer 2019-08-22 15:25:40 -07:00
2ba1a14fb0 Decode now calls private property instead of public method 2019-08-22 17:25:55 -04:00
90dcd8c05d Merge branch 'master' into generative-finetuning 2019-08-22 10:43:30 +02:00
57272d5ddf fix for glue 2019-08-22 00:25:49 -04:00
b006a7a12f fix for squad 2019-08-22 00:25:42 -04:00
14eef67eb2 Fix at config rather than model 2019-08-21 15:48:43 -07:00
296df2b18c reraise exception 2019-08-21 15:29:30 -07:00
55f69a11b6 OpenAI GPT tests now extend CommonTests 2019-08-21 18:09:25 -04:00
47267ba556 OpenAI GPT-2 now depends on CommonTests. 2019-08-21 17:50:16 -04:00
034aa0c2d7 Fixed GPT2DoubleHeadsModel example and weight tying 2019-08-21 17:27:38 -04:00
e00b4ff1de fix #1017 2019-08-21 22:22:17 +02:00
814a3f4e01 Removed attention_mask from GPT-2 and GPT documentation. Corrected multiple_choice_labels to actual name mc_labels 2019-08-21 14:11:14 -04:00
2f9397139d Added GPT-2 LARGE to Pre-trained Models documentation 2019-08-21 11:29:37 -04:00
d6bbcbc4cf Added finetuning example to documentation 2019-08-21 11:22:05 -04:00
6f877d9daf Update dev results on GLUE (bert-base-uncased) w/ median on 5 runs 2019-08-21 03:43:29 +00:00
07681b6b58 Merge pull request #1064 from huggingface/gpt-2-large
Adding gpt-2 large (774M parameters) model
2019-08-21 03:05:56 +02:00
fdc487d8b3 Add max length 2019-08-21 02:35:01 +02:00
aa05dc8935 adding gpt-2 large 2019-08-21 02:29:34 +02:00
e4515faf54 Merge pull request #1057 from huggingface/fixes
Add a few of typos corrections, bugs fixes and small improvements
2019-08-21 01:54:05 +02:00
41789c6c3d Merge pull request #1059 from GuillemGSubies/master
Better use of spacy tokenizer in open ai and xlm tokenizers
2019-08-21 01:53:48 +02:00
260c86082d Merge pull request #1027 from samvelyan/iterative_split_on_token
Re-implemented tokenize() iteratively in PreTrainedTokenizer.
2019-08-21 01:46:03 +02:00
d30cbaf5dc Merge branch 'master' into iterative_split_on_token 2019-08-21 01:33:02 +02:00
9beaa85b07 Merge pull request #1055 from qipeng/run_squad_fix
Fix #1015 (tokenizer defaults to use_lower_case=True when loading from trained models)
2019-08-21 01:20:46 +02:00
e753f249e1 Merge pull request #806 from wschin/fix-a-path
Fix a path so that a test can run on Windows
2019-08-21 01:14:40 +02:00
2d042274ac Sequence special token handling for BERT and RoBERTa 2019-08-20 14:15:28 -04:00
3bffd2e8e5 more fixes 2019-08-20 10:59:28 -07:00
c3619f5536 Merge pull request #1060 from CrafterKolyan/patch-1
Fix typo. configuratoin -> configuration
2019-08-20 17:39:06 +02:00
3b56427a1e Merge pull request #1040 from FeiWang96/multi_gpu
Fix bug of multi-gpu training in lm finetuning
2019-08-20 17:13:44 +02:00
43489756ad adding proxies options for the from_pretrained methods 2019-08-20 16:59:11 +02:00
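A usage sketch for the `proxies` option added here and the `force_download` option added a few commits below (today's package name used; keyword arguments as named in the commits, proxy address arbitrary):

```python
from transformers import BertModel

model = BertModel.from_pretrained(
    "bert-base-uncased",
    force_download=True,                         # ignore any locally cached copy
    proxies={"https": "http://127.0.0.1:3128"},  # route the download through a proxy
)
```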
a690edab17 various fix and clean up on run_lm_finetuning 2019-08-20 15:52:12 +02:00
ad6e62cd82 Fix typo. configuratoin -> configuration 2019-08-20 15:43:06 +03:00
388e3251fa Update tokenization_xlm.py 2019-08-20 14:19:39 +02:00
f5e2ed0fd8 Update tokenization_openai.py 2019-08-20 14:19:25 +02:00
562b998366 Update tokenization_openai.py 2019-08-20 14:10:19 +02:00
bb04446285 Update tokenization_openai.py 2019-08-20 14:07:40 +02:00
bfd75056b0 Update tokenization_xlm.py 2019-08-20 14:06:17 +02:00
fc74132598 add best steps to train 2019-08-20 19:06:41 +08:00
933841d903 Merge pull request #1056 from Morizeyao/master
Swap of optimizer.step and scheduler.step for lm finetuning examples
2019-08-20 12:42:24 +02:00
6d0aa73981 fix #1034 2019-08-20 12:20:21 +02:00
b0b9b8091b minor typo 2019-08-20 11:33:46 +02:00
53c8f700f4 fix #808 2019-08-20 11:29:26 +02:00
901dde0e45 fix #1014 2019-08-20 11:05:51 +02:00
e239a4a20f close #984 2019-08-20 11:02:00 +02:00
fecaed0ed4 add force_download option to from_pretrained methods 2019-08-20 10:56:12 +02:00
d86b49ac86 swap optimizer.step and scheduler.step 2019-08-20 16:46:34 +08:00
45ab8bf60e Revert "Update finetune_on_pregenerated.py"
This reverts commit a1359b970cb4bfa41008a45b44dd2a25e579bff3.
2019-08-20 16:40:39 +08:00
97c30b73d5 add test related code 2019-08-20 16:31:04 +08:00
d5e60e5b7a add test related code 2019-08-20 16:25:50 +08:00
a1359b970c Update finetune_on_pregenerated.py 2019-08-20 16:00:07 +08:00
28f7ca1f80 swap optimizer.step and scheduler.step 2019-08-20 15:58:42 +08:00
a368b87791 Fix #1015 2019-08-19 13:07:00 -07:00
f94f1c6016 Distributed training + tokenizer agnostic mask token 2019-08-19 14:58:50 -04:00
c589862b78 Doc: loading from config alone does not load the model weights 2019-08-19 10:17:47 -04:00
5a49b793d9 Merge pull request #1023 from tuvuumass/patch-1
fix issue #824
2019-08-19 15:31:46 +02:00
4270d3da1b fix an evaluation bug 2019-08-19 16:38:52 +08:00
b8fde43868 fix a coding bug 2019-08-19 16:36:43 +08:00
40acf6b52a don't save model without training 2019-08-18 05:02:25 -04:00
47e9aea0fe add args info to evaluate_result.txt 2019-08-18 17:00:53 +08:00
5582bc4b23 add multiple choice to roberta and xlnet, test on swag: roberta=0.8228, xlnet=0.80
2019-08-18 16:01:48 +08:00
856a63da4d Fix: save model/model.module 2019-08-18 11:03:47 +08:00
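A minimal sketch of the save pattern this commit settles on: unwrap `.module` when the model is wrapped by (Distributed)DataParallel, so checkpoint keys stay consistent whether or not the wrapper is present.

```python
import torch

model = torch.nn.Linear(4, 2)
wrapped = torch.nn.DataParallel(model)  # adds a .module indirection

model_to_save = wrapped.module if hasattr(wrapped, "module") else wrapped
torch.save(model_to_save.state_dict(), "checkpoint.pt")
```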
1ef41b8337 Revert "Fix: save model/model.module"
This reverts commit 00e9c4cc9616cab1666cab0a331b5d7e68946928.
2019-08-18 11:03:12 +08:00
00e9c4cc96 Fix: save model/model.module 2019-08-18 11:02:02 +08:00
189ff9b664 Update README after RoBERTa addition 2019-08-17 13:18:37 -04:00
e384ae2b9d Merge remote-tracking branch 'huggingface/master'
merge huggingface/master to update
2019-08-17 12:05:57 +08:00
d8923270e6 Correct truncation for RoBERTa in 2-input GLUE 2019-08-16 16:30:38 -04:00
5652f54ac2 Simplified data generator + better perplexity calculator
GPT-2 now obtains ~20 perplexity on WikiText-2
2019-08-16 13:49:56 -04:00
7e7fc53da5 Fixing run_glue example with RoBERTa 2019-08-16 11:53:10 -04:00
715534800a BERT + RoBERTa masking tokens handling + GPU device update. 2019-08-16 10:10:21 -04:00
339e556feb CLM for BERT, beginning of CLM fot RoBERTa; still needs a better masking token mechanism. 2019-08-16 10:10:20 -04:00
5c18825a18 Removed dataset limit 2019-08-16 10:10:20 -04:00
3e3e145497 Added GPT to the generative fine-tuning. 2019-08-16 10:10:20 -04:00
47975ed53e Language Modeling fine-tuning using GPT-2. 2019-08-16 10:10:20 -04:00
ab05280666 Order of strings in AutoModel/AutoTokenizer updated. 2019-08-16 09:53:26 -04:00
b8ff56896c Fix bug of multi-gpu training in lm finetuning 2019-08-16 12:11:05 +08:00
9d0029e215 Added RoBERTa example to README 2019-08-15 17:17:35 -04:00
83dba0b67b Added RoBERTa tokenizer to AutoTokenizer 2019-08-15 17:07:07 -04:00
e24e19ce3b Added RoBERTa to AutoModel/AutoConfig 2019-08-15 14:02:11 -04:00
fe02e45e48 Release: 1.1.0 2019-08-15 11:15:08 -04:00
88efc65bac Merge pull request #964 from huggingface/RoBERTa
RoBERTa: model conversion, inference, tests 🔥
2019-08-15 11:11:10 -04:00
8308170156 Warning for RoBERTa sequences encoded without special tokens. 2019-08-15 10:29:04 -04:00
572dcfd1db Doc 2019-08-14 14:56:14 -04:00
c4ef103447 [RoBERTa] First 4 authors
cf. https://github.com/huggingface/pytorch-transformers/pull/964#discussion_r313574354

Co-Authored-By: Myle Ott <myleott@fb.com>
2019-08-14 12:31:09 -04:00
3d47a7f8ab loads the tokenizer for each checkpoint, to solve the reproducibility issue 2019-08-14 10:58:26 +02:00
9ce36e3e4b Re-implemented tokenize() iteratively in PreTrainedTokenizer. 2019-08-14 08:57:09 +00:00
39f426be65 Added special tokens <pad> and <mask> to RoBERTa. 2019-08-13 15:19:50 -04:00
baf08ca1d4 [RoBERTa] run_glue: correct pad_token + reorder labels 2019-08-13 12:51:15 -04:00
3d87991f60 Fixed error with encoding 2019-08-13 12:00:24 -04:00
ba4bce2581 fix issue #824 2019-08-13 11:26:27 -04:00
634a3172d8 Added integration tests for sequence builders. 2019-08-12 15:14:15 -04:00
22ac004a7c Added documentation and changed parameters for special_tokens_sentences_pair. 2019-08-12 15:13:53 -04:00
912fdff899 [RoBERTa] Update run_glue for RoBERTa 2019-08-12 13:49:50 -04:00
b3d83d68db Fixup 9d0603148bc34255fad0cad73ce438ecd7306322 2019-08-12 12:28:55 -04:00
a7b4cfe919 Update README.md
I assume it should test the `re-load` functionality after testing the `save` functionality; however, I'm surprised that nobody has pointed this out after such a long time, so maybe I've misunderstood the purpose. This PR is just in case :)
2019-08-12 09:53:05 -04:00
b219029c45 refactoring old run_swag; this script is mainly refactored from run_squad in pytorch_transformers 2019-08-11 15:20:37 +08:00
aaedfc35a8 Merge branch 'master' of https://github.com/huggingface/pytorch-transformers 2019-08-10 20:04:37 +02:00
c683c3d5a5 fix #993 2019-08-10 20:04:35 +02:00
7060766490 Corrected logger.error info
Signed-off-by: Kevin Trebing <Kevin.Trebing@gmx.net>
2019-08-09 19:36:44 -04:00
75d5f98fd2 Roberta tokenization + fixed tests (py3 + py2). 2019-08-09 15:02:13 -04:00
14e970c271 Tokenization encode/decode class-based sequence handling 2019-08-09 15:01:38 -04:00
3566d27919 Clarified PreTrainedModel.from_pretrained warning messages in documentation. 2019-08-08 19:04:34 -04:00
fbd746bd06 Updated test architecture 2019-08-08 18:21:34 -04:00
6c41a8f5dc Encode and Decode are back in the superclass. They now handle sentence pairs special tokens. 2019-08-08 18:20:32 -04:00
e367ac469c [RoBERTa] Re-apply 39d72bcc7b2c99c04b6f483f0d8e7bdff547d37c
cc @lysandrejik
2019-08-08 11:26:11 -04:00
9d0603148b [RoBERTa] RobertaForSequenceClassification + conversion 2019-08-08 11:24:54 -04:00
f2b300df6b fix #976 2019-08-08 10:38:57 -04:00
7df303f5ad fix #971 2019-08-08 10:36:26 -04:00
d2cc6b101e Merge branch 'master' into RoBERTa 2019-08-08 09:42:05 -04:00
39d72bcc7b Fixed the RoBERTa checkpoint conversion script according to the LM head refactoring. 2019-08-07 14:21:57 -04:00
770043eea2 Sentence-pair tasks handling. Using common tests on RoBERTa. Forced push to fix indentation. 2019-08-07 12:53:19 -04:00
7729ef7381 Merge pull request #955 from FeiWang96/master
Fix comment typo
2019-08-07 10:11:25 +02:00
5c6ecf37e7 Merge pull request #958 from saket404/typo-fix
Fixed small typo
2019-08-07 10:10:20 +02:00
b4f9464f90 Merge pull request #960 from ethanjperez/patch-1
Fixing unused weight_decay argument
2019-08-07 10:09:55 +02:00
822d6768eb Merge pull request #962 from guotong1988/patch-1
Update modeling_xlnet.py
2019-08-07 10:09:20 +02:00
7e6102ce74 Merge pull request #963 from guotong1988/patch-2
Update modeling_bert.py
2019-08-07 10:09:04 +02:00
3773ba44f0 Merge pull request #977 from chrisgzf/master
Fixed typo in migration guide
2019-08-07 10:08:45 +02:00
a80aa03bda Merge pull request #973 from FeiWang96/bert_config
Fix examples of loading pretrained models in docstring
2019-08-07 10:08:22 +02:00
a6f412da01 Fixed typo in migration guide 2019-08-07 02:19:14 +08:00
6ec1ee9ec2 Fix examples in docstring 2019-08-06 11:32:54 +08:00
72622926e5 Fix examples in docstring 2019-08-06 11:32:41 +08:00
f889e77b9c Fix examples of loading pretrained models in docstring 2019-08-06 11:30:35 +08:00
beb03ec6c5 Fix examples of loading pretrained models in docstring 2019-08-06 11:24:46 +08:00
4fc9f9ef54 Merge pull request #910 from huggingface/auto_models
Adding AutoTokenizer and AutoModel classes that automatically detect architecture - Clean up tokenizers
2019-08-05 19:17:47 +02:00
d43dc48b34 Merge branch 'master' into auto_models 2019-08-05 19:17:35 +02:00
0b524b0848 remove derived classes for now 2019-08-05 19:08:19 +02:00
13936a9621 update doc and tests 2019-08-05 18:48:16 +02:00
ed4e542260 adding tests 2019-08-05 18:14:07 +02:00
3a126e73dd fix #950 2019-08-05 17:26:29 +02:00
7223886dc9 fix #944 2019-08-05 17:16:56 +02:00
70c10caa06 add option mentioned in #940 2019-08-05 17:09:37 +02:00
077ad693e9 tweak issue templates wordings 2019-08-05 16:46:29 +02:00
02d4087cb8 Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT 2019-08-05 16:26:01 +02:00
7c524d631e add issue templates 2019-08-05 16:25:54 +02:00
6f05ad72b4 Merge pull request #791 from huggingface/doc
RestructuredText table for pretrained models.
2019-08-05 10:18:00 -04:00
b90e29d52c working on automodels 2019-08-05 16:06:34 +02:00
58830807d1 indicate we only support pytorch 1.0.0+ now 2019-08-05 14:38:59 +02:00
328afb7097 cleaning up tokenizer tests structure (at last) - last remaining ppb refs 2019-08-05 14:08:56 +02:00
0e918707dc Merge pull request #907 from dhpollack/fix_convert_to_tf
Fix convert to tf
2019-08-05 12:55:04 +02:00
cb9db101c7 Python 2 must DIE 2019-08-04 22:04:15 -04:00
05c083520a [RoBERTa] model conversion, inference, tests 🔥 2019-08-04 21:39:21 -04:00
d7fd10568c Update modeling_bert.py 2019-08-05 08:58:19 +08:00
84eb699082 Update modeling_xlnet.py 2019-08-05 08:57:09 +08:00
00132b7a7a updating docs - adding few tests to tokenizers 2019-08-04 22:42:55 +02:00
28ba345ecc Fixing unused weight_decay argument
Currently the L2 regularization is hard-coded to "0.01", even though there is a --weight_decay flag implemented (that is unused). I'm making this flag control the weight decay used for fine-tuning in this script.
2019-08-04 12:31:46 -04:00
009273dbdd big doc update [WIP] 2019-08-04 12:14:57 +02:00
836e513698 Fixed small typo 2019-08-04 16:05:10 +10:00
a24f830604 Fix comment typo 2019-08-03 12:17:06 +08:00
44dd941efb link to swift-coreml-transformers 2019-08-01 09:50:30 -04:00
f2a3eb987e Fix small typos 2019-07-31 11:05:06 -04:00
97091acb8c Small spelling fix 2019-07-31 10:37:56 -04:00
769bb643ce Fixing a broken link. 2019-07-31 10:22:41 -04:00
c90119e543 spelling mistake 2019-07-29 16:56:02 +02:00
bfbe52ec39 cleaning up example docstrings 2019-07-27 20:25:39 +02:00
4cc1bf81ee typos 2019-07-27 12:08:21 +02:00
ac27548b25 fix unk_token test 2019-07-27 11:50:47 +02:00
c717d38573 dictionnary => dictionary 2019-07-26 23:30:48 +02:00
6b763d04a9 Merge pull request #911 from huggingface/small_fixes
Small fixes
2019-07-26 21:36:21 +02:00
7b6e474c9a fix #901 2019-07-26 21:26:44 +02:00
632d711411 fix #908 2019-07-26 21:14:37 +02:00
c054b5ee64 Merge pull request #896 from zijunsun/master
fix multi-gpu training bug when using fp16
2019-07-26 19:31:02 +02:00
27b0f86d36 clean up pretrained 2019-07-26 17:09:21 +02:00
57e54ec070 add unk_token to gpt2 2019-07-26 17:09:07 +02:00
ac42049c08 add auto models and auto tokenizer 2019-07-26 17:08:59 +02:00
09ecf225e9 fixed the fix. tf session madness. 2019-07-26 15:20:44 +02:00
edfd965ac8 fix convert_to_tf 2019-07-26 14:13:46 +02:00
f0aeb7a814 multi-gpu training should also come after apex fp16 (squad) 2019-07-26 15:23:29 +08:00
46cc9dd2b5 Merge pull request #899 from sukuya/master
Fixed import to use torchscript flag.
2019-07-25 15:03:21 +02:00
6219ad7216 Merge pull request #888 from rococode/patch-1
Update docs for parameter rename
2019-07-25 15:01:22 +02:00
0b6122e96a Merge pull request #882 from Liangtaiwan/squad_v1_bug
fix squad v1 error (na_prob_file should be None)
2019-07-25 14:59:59 +02:00
c244562cae Merge pull request #893 from joelgrus/patch-2
make save_pretrained do the right thing with added tokens
2019-07-25 14:58:48 +02:00
e1e2ab3482 Merge pull request #1 from sukuya/sukuya-patch-1
Update torchscript.rst
2019-07-25 16:53:11 +08:00
35c52f2f3c Update torchscript.rst
Fixed the import to use pytorch_transformers; otherwise the torchscript flag can't be used.
2019-07-25 16:51:11 +08:00
adb3ef6368 multi-gpu training should also come after apex fp16 2019-07-25 13:09:10 +08:00
ae152cec09 make save_pretrained work with added tokens
right now it's dumping the *decoder* when it should be dumping the *encoder*. this fixes that.
2019-07-24 16:54:48 -07:00
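The mix-up described above, in miniature (hypothetical dicts mirroring the tokenizer's added-token maps; not the library's actual save code):

```python
import json

added_tokens_encoder = {"<new_tok>": 50257}   # token -> id (what should be saved)
added_tokens_decoder = {50257: "<new_tok>"}   # id -> token (what was being dumped)

# save_pretrained should persist the token->id map so the file can be
# re-read as {token: id} when the tokenizer is loaded again:
with open("added_tokens.json", "w") as f:
    json.dump(added_tokens_encoder, f)
```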
66b15f73f0 Update docs for parameter rename
OpenAIGPTLMHeadModel now accepts `labels` instead of `lm_labels`
2019-07-24 11:27:08 -07:00
a7fce6d917 fix squad v1 error (na_prob_file should be None) 2019-07-24 16:11:36 +08:00
067923d326 Merge pull request #873 from huggingface/identity_replacement
Add nn.Identity replacement for old PyTorch
2019-07-23 18:16:35 +02:00
368670ac31 Merge pull request #866 from xanlsh/master
Rework how PreTrainedModel.from_pretrained handles its arguments
2019-07-23 18:05:30 +02:00
1383c7b87a Fix #869 2019-07-23 17:52:20 +02:00
6070b55443 fix #868 2019-07-23 17:46:01 +02:00
2c9a3115b7 fix #858 2019-07-23 16:45:55 +02:00
4fb56c7729 Remove unused *args parameter from PreTrainedConfig.from_pretrained 2019-07-23 10:43:01 -04:00
e179c55490 Add docs for from_pretrained functions, rename return_unused_args 2019-07-23 10:43:01 -04:00
fec76a481d Update readme 2019-07-23 16:05:29 +02:00
859c441776 Merge pull request #872 from huggingface/saving_schedules
Updating schedules for state_dict saving/loading
2019-07-23 16:03:06 +02:00
0740e63e49 updating schedules for state_dict saving 2019-07-23 15:57:18 +02:00
268c6cc160 Merge pull request #845 from rabeehk/master
fixed version issues in run_openai_gpt
2019-07-23 15:29:31 +02:00
1d7d01c080 Merge pull request #847 from lpq29743/master
typos
2019-07-23 15:28:31 +02:00
c4bc66886d Merge pull request #860 from Yiqing-Zhou/patch-1
read().splitlines() -> readlines()
2019-07-23 15:24:25 +02:00
ba52fe69d5 update breaking change section regarding from_pretrained keyword arguments 2019-07-23 15:10:02 +02:00
b1019d2a8e token[-1] -> token.rstrip('\n') 2019-07-23 20:41:26 +08:00
0227b4a940 fix #827 2019-07-23 14:06:43 +02:00
490ebbdcf7 Fix PretrainedModel.from_pretrained not passing cache_dir forward 2019-07-22 18:03:08 -04:00
b8009cb0da Make PreTrainedModel.from_pretrained pass unused arguments to model 2019-07-22 18:03:08 -04:00
bef0c629ca fix
Remove '\n' before adding token into vocab
2019-07-22 22:30:49 +08:00
897d0841be read().splitlines() -> readlines()
splitlines() does not work as we expect here for bert-base-chinese, because there is a '\u2028' (unicode line separator) token in the vocab file and splitlines() consumes it as an extra line break, shifting the vocab entries.
Perhaps we should use readlines() instead.
2019-07-22 20:49:09 +08:00
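The behavior described above, runnable in miniature (a synthetic three-token vocab standing in for bert-base-chinese's vocab.txt):

```python
import io

vocab_text = "apple\n\u2028\nbanana\n"        # second token is U+2028 itself
print(vocab_text.splitlines())                 # ['apple', '', '', 'banana'] -- token lost
print(io.StringIO(vocab_text).readlines())     # ['apple\n', '\u2028\n', 'banana\n'] -- token kept
```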
2f869dc665 Fixed typo 2019-07-21 11:05:36 -04:00
76be189b08 typos 2019-07-21 20:39:42 +08:00
f63ff536ad fixed version issues in run_openai_gpt 2019-07-20 12:43:07 +02:00
a615499076 Merge pull request #797 from yzy5630/fix-examples
fix some errors for distributed lm_finetuning
2019-07-18 23:32:33 +02:00
dbecfcf321 Merge pull request #815 from praateekmahajan/update-readme-link
Update Readme link for Fine Tune/Usage section
2019-07-18 18:30:32 +02:00
acc48a0cc9 typos 2019-07-18 09:54:04 -04:00
a1fe4ba9c9 use new API for save and load 2019-07-18 15:45:23 +08:00
0d46b17553 Update Readme
Incorrect link for `Quick tour: Fine-tuning/usage scripts`
2019-07-17 22:50:10 -07:00
a7ba27b1b4 add parser for adam 2019-07-18 08:52:51 +08:00
c4e9615691 Fix a path so that a test can run on Windows 2019-07-17 09:08:40 -07:00
9d381e7be9 Fixed incorrect links in the PretrainedModel 2019-07-17 09:25:38 -04:00
d6522e2873 change loss and optimizer to new API 2019-07-17 21:22:34 +08:00
71d597dad0 fix #800 2019-07-17 13:51:09 +02:00
4bcddf6fc8 Merge pull request #801 from bzantium/master
import sys twice
2019-07-17 12:31:26 +02:00
506ab34d0e Merge pull request #796 from stefan-it/minor-doc-updates
Minor documentation updates
2019-07-17 12:26:34 +02:00
cd8980e1f4 import sys twice 2019-07-17 18:12:01 +09:00
123da5a2fa fix errors for lm_finetuning examples 2019-07-17 09:56:07 +08:00
60a1bdcdac fix some errors for distributed lm_finetuning 2019-07-17 09:16:20 +08:00
e6cc6d237f docs: fix link to various notebooks 2019-07-16 23:42:28 +02:00
5b78400e21 docs: fix link to modeling example source (bert) 2019-07-16 23:41:57 +02:00
61cc3ee350 docs: fix link to tf checkpoint to pytorch script 2019-07-16 23:41:04 +02:00
dbbd94cb7a docs: fix link to bertology example and update dataset description 2019-07-16 23:40:04 +02:00
5fe0b378d8 adding missing docstring fix #793 2019-07-16 21:35:53 +02:00
e848b54730 fix #792 2019-07-16 21:22:19 +02:00
c5b3d86a91 Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT 2019-07-16 21:21:05 +02:00
6b70760204 typos 2019-07-16 21:21:03 +02:00
117ed92992 RestructuredText table for pretrained models. 2019-07-16 11:58:47 -04:00
b33a385091 update readme 2019-07-16 16:18:37 +02:00
ed7549bb1a release version 1.0 2019-07-16 16:10:58 +02:00
6a72d9aa52 updated examples in readme 2019-07-16 16:09:29 +02:00
b59043bf8f update readme 2019-07-16 16:03:48 +02:00
edc79acb3b simpler quick tour 2019-07-16 16:02:32 +02:00
5c82d3488f indicate default evaluation in breaking changes 2019-07-16 15:45:58 +02:00
4acaa65068 model in evaluation mode by default after from_pretrained 2019-07-16 15:41:57 +02:00
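The behavioral change above in one sketch (today's package name used): after `from_pretrained`, modules such as dropout are disabled until `train()` is called explicitly.

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
print(model.training)   # False -- evaluation mode by default after this change
model.train()           # switch back explicitly before fine-tuning
print(model.training)   # True
```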
f289e6cfe4 fix docstrings 2019-07-16 15:31:21 +02:00
9726b229cf model name typo 2019-07-16 15:17:45 +02:00
1849aa7d39 update readme and pretrained model weight files 2019-07-16 15:11:29 +02:00
43e0e8fa04 updates to readme and doc 2019-07-16 13:56:47 +02:00
f31154cb9d Merge branch 'xlnet' 2019-07-16 11:51:13 +02:00
1b35d05d4b update conversion scripts and __main__ 2019-07-16 09:41:55 +02:00
352e3ff998 added migration guide to readme 2019-07-16 09:03:49 +02:00
8ad7e5b4f2 indeed 2019-07-16 00:29:15 +02:00
064d0a0b76 update readme 2019-07-16 00:21:33 +02:00
3b8b0e01bb update readme 2019-07-16 00:12:55 +02:00
76da9765b6 fix run_generation test 2019-07-15 17:52:35 +02:00
e691fc0963 update QA models tests + run_generation 2019-07-15 17:45:24 +02:00
15d8b1266c update tokenizer - update squad example for xlnet 2019-07-15 17:30:42 +02:00
3b469cb422 updating squad for compatibility with XLNet 2019-07-15 15:28:37 +02:00
8ca767f13c clean up optimization 2019-07-15 13:49:07 +02:00
74a24f0fe9 clean up file_utils 2019-07-15 13:49:01 +02:00
ab49fafc04 update tokenization docstrings for #328 2019-07-15 12:51:23 +02:00
a9ab15174c fix #328 2019-07-15 12:42:12 +02:00
f7cd7392fd fixed tests 2019-07-15 12:32:19 +02:00
e28d8bde0d doc on base classes 2019-07-15 12:08:06 +02:00
44c985facd update doc for XLM and XLNet 2019-07-15 11:36:50 +02:00
0201d86015 added doc for transformer-xl 2019-07-15 10:11:09 +02:00
4cb489457f added doc for openai GPT 2019-07-15 09:58:01 +02:00
62b8eb43c1 fix add_start_docstrings on python 2 (removed) 2019-07-15 09:49:02 +02:00
5bc3d0cc5b added gpt2 doc 2019-07-15 09:40:05 +02:00
183fedfed5 fix doc on python2 2019-07-15 09:00:09 +02:00
0e9825e252 small fix to run_glue 2019-07-14 23:43:28 +02:00
2397f958f9 updating examples and doc 2019-07-14 23:20:10 +02:00
c490f5ce87 added generation examples in tests 2019-07-13 15:26:58 +02:00
8bb02c27e2 Merge branch 'xlnet' of https://github.com/huggingface/pytorch-pretrained-BERT into xlnet 2019-07-13 15:25:06 +02:00
7d4b200e40 good quality generation example for GPT, GPT-2, Transfo-XL, XLNet 2019-07-13 15:25:03 +02:00
69dc010936 Merge pull request #786 from huggingface/doc-sphinx
New documentation for pytorch-transformers
2019-07-13 12:08:57 +02:00
7322c314a6 remove python2 testing for examples 2019-07-12 14:24:08 +02:00
936e813c84 clean up examples - added squad example and test 2019-07-12 14:16:06 +02:00
699bc7e86e fix gpt-2 unk token test 2019-07-12 11:46:57 +02:00
762ded9b1c wip examples 2019-07-12 11:28:52 +02:00
7442956361 save config file 2019-07-12 11:26:16 +02:00
292140b921 Merge pull request #781 from huggingface/embeddings
Clean up input embeddings resizing and weights tying
2019-07-12 11:10:25 +02:00
c57e9d946f Merge branch 'xlnet' into embeddings 2019-07-12 11:10:14 +02:00
2918b7d2a0 updating tests 2019-07-12 10:57:58 +02:00
3fbceed8d2 Fix layer reference loss + previous attempted fix 2019-07-11 22:29:55 -04:00
6c2ee16c04 Test suite testing the tie_weights function as well as the resize_token_embeddings function.
Patched an issue relating to the tied weights I had introduced with the TorchScript addition.
Byte order mark management in TSV glue reading.
2019-07-11 22:09:16 -04:00
3821ecbf4a Byte order mark management in TSV glue reading. 2019-07-11 20:16:28 -04:00
e3fb4310d6 From pretrained correct initialization. Unknown token handling for gpt2. 2019-07-11 18:44:29 -04:00
bd404735a7 embeddings resizing + tie_weights 2019-07-12 00:02:49 +02:00
50e62a4cb4 fix gpt/gpt-2 from pretrained 2019-07-11 16:50:21 -04:00
273617b86d update config - fix gpt/gpt-2 from pretrained 2019-07-11 22:45:03 +02:00
6b13f4cb3a update circle-ci 2019-07-11 22:36:35 +02:00
2b644785f0 add tests on examples and large circle ci config 2019-07-11 22:31:50 +02:00
c6bf1a400d fix test examples et model pretrained 2019-07-11 22:29:08 +02:00
92a782b108 fix run_glue test 2019-07-11 22:20:10 +02:00
6491575fd5 Added TorchScript disclaimer. CSS modifications. 2019-07-11 12:38:21 -04:00
ccb6947dc1 optimization tests 2019-07-11 17:39:47 +02:00
e4f9dca018 Merge pull request #773 from huggingface/doc-sphinx
Sphinx doc, XLM Checkpoints
2019-07-11 15:46:39 +02:00
b87eb82b4f Merge branch 'xlnet' into doc-sphinx 2019-07-11 15:46:27 +02:00
d216e798af Merge pull request #777 from huggingface/examples
Working GLUE Example for XLNet (STS-B)
2019-07-11 15:43:47 +02:00
6135de2fa3 readme update 2019-07-11 15:39:49 +02:00
b21d84b027 update examples 2019-07-11 15:37:34 +02:00
ec07cf5a66 revamp optimization 2019-07-11 14:48:22 +02:00
4fef5919a5 updating examples 2019-07-11 12:03:08 +02:00
7fdbc47822 Added the two CLM XLM pretrained checkpoints.
Fixed file extensions for config/vocab/merges of XLM models.
2019-07-10 19:37:24 -04:00
dee3e45b93 Fixed XLM weights conversion script. Added 5 new checkpoints for XLM. 2019-07-10 19:04:21 -04:00
c82b74b996 Fixed Sphinx errors and warnings 2019-07-10 15:30:19 -04:00
5288913bdd All TODOs to be checked by Thom have been added. 2019-07-10 15:16:40 -04:00
f773faa258 Fixed all links. Removed TPU. Changed CLI to Converting TF models. Many minor formatting adjustments. Added "TODO Lysandre filled" where necessary. 2019-07-10 14:45:56 -04:00
50b7e52a7f WIP examples 2019-07-10 15:33:34 +02:00
3f56ad5aff Updated CircleCI's config.yml to use a large resource class. 2019-07-09 18:50:59 -04:00
c4bab2dc85 Added footer with social links. 2019-07-09 18:03:01 -04:00
331db8cc02 Added viewcode plugin for source code visualization within the static website. 2019-07-09 17:01:56 -04:00
83fb311ef7 Patched warnings + Refactored XLNet's Docstrings 2019-07-09 16:38:30 -04:00
8fe2c9d98e Refactored Docstrings of BERT, GPT2, GPT, TransfoXL, XLM and XLNet. 2019-07-09 15:55:31 -04:00
ed6c8d37f4 fix merge 2019-07-09 17:14:52 +02:00
e468192e2f Merge branch 'pytorch-transformers' into xlnet 2019-07-09 17:05:37 +02:00
4ce237c880 update run_glue 2019-07-09 17:00:32 +02:00
9dd2c86033 Merge pull request #767 from huggingface/doc
Documentation
2019-07-09 16:56:34 +02:00
e0e5c7faf5 Added requirements.txt file. 2019-07-09 10:16:09 -04:00
3b7cb7bf44 small update to run_glue 2019-07-09 16:12:15 +02:00
269e73b601 Adding example detailing how to add a new file to the documentation + adding fonts. 2019-07-09 10:11:29 -04:00
d743f2f34e updating test 2019-07-09 15:58:58 +02:00
d0efbd3cd1 update sequencesummary module 2019-07-09 15:46:43 +02:00
d5481cbe1b adding tests to examples - updating summary module - coverage update 2019-07-09 15:29:42 +02:00
c079d7ddff fix python 2 tests 2019-07-09 10:40:59 +02:00
b19786985d unified tokenizer api and serialization + tests 2019-07-09 10:25:18 +02:00
6847e30e1c New page detailing the use of TorchScript. 2019-07-08 17:34:24 -04:00
ab30651802 Hugging Face theme. 2019-07-08 16:05:26 -04:00
a60ae1a505 Docstrings best practice shown in the BERT documentation. 2019-07-08 11:50:32 -04:00
64fd986376 Tokenizers and Config classes are referenced. 2019-07-05 17:44:59 -04:00
df759114c9 Single file documentation for each model, accompanied by the Documentation overview. 2019-07-05 17:35:26 -04:00
03de9686a7 Initial folder structure for the documentation. A draft of documentation change has been made in the BertModel class. 2019-07-05 17:11:13 -04:00
3d5f291386 updates to run_glue 2019-07-05 17:22:15 +02:00
99b90edab1 cleaning up run_glue example 2019-07-05 17:09:35 +02:00
1113f97f33 clean up glue example 2019-07-05 16:31:13 +02:00
162ba383b0 fix model loading 2019-07-05 15:57:14 +02:00
6dacc79d39 fix python2 tests 2019-07-05 15:11:59 +02:00
36bca545ff tokenization abstract class - tests for examples 2019-07-05 15:02:59 +02:00
a4f980547f remove circle ci parallelism 2019-07-05 12:31:34 +02:00
eb91f6437e update readme and setup 2019-07-05 12:30:15 +02:00
78462aad61 Merge pull request #733 from ceremonious/parallel-generation
Added option to use multiple workers to create training data
2019-07-05 12:04:30 +02:00
781124b0d1 Merge pull request #620 from chrislarson1/convert-back-to-tf
Convert pytorch models back to tensorflow
2019-07-05 12:01:17 +02:00
e5fe2bb5e8 Merge pull request #745 from leimao/leimao
fix evaluation bug
2019-07-05 12:00:04 +02:00
0231ba291e circle-ci 2019-07-05 11:59:04 +02:00
0bab55d5d5 [BIG] name change 2019-07-05 11:55:36 +02:00
9113b50c96 hubs [WIP] 2019-07-05 11:31:51 +02:00
175fce0a55 Merge pull request #758 from huggingface/doc
Release 0.7 - Add tokenizer API + tests
2019-07-05 11:22:03 +02:00
e75c3f70aa standardizing tokenizers API and adding tests 2019-07-05 11:20:27 +02:00
c0239e09e6 first commit 2019-07-04 17:06:30 +02:00
cf86d23eff parallelism in circlci 2019-07-04 17:02:21 +02:00
15b70338ba adding squad model to xlnet and xlm 2019-07-04 16:50:42 +02:00
fbe04423b6 Common SequenceSummary class 2019-07-04 00:25:30 +02:00
c22545aa40 fix xlm torchscript 2019-07-03 23:03:57 +02:00
3b23a846b6 Merge branch 'xlnet' of https://github.com/huggingface/pytorch-pretrained-BERT into xlnet 2019-07-03 22:54:58 +02:00
8fa3a1f0d8 updating tests 2019-07-03 22:54:53 +02:00
c41f2bad69 WIP XLM + refactoring 2019-07-03 22:54:39 +02:00
64ce4dbd86 Merge pull request #748 from huggingface/torchscript
Release 0.7 - Add Torchscript capabilities
2019-07-03 22:52:03 +02:00
b43b130f35 TorchScript flag in config; Tied weights when not running TorchScript; tuple concatenation clean-up. 2019-07-03 16:21:17 -04:00
4703148f0c TransformerXL can't be exported to TorchScript because of control-flow. Exception added to tests. 2019-07-03 14:50:23 -04:00
971c24687f XLNET can be exported to TorchScript 2019-07-03 11:03:09 -04:00
be54b16960 GPT can be exported to TorchScript 2019-07-02 18:09:45 -04:00
d8e83de792 GPT2 can be exported to TorchScript 2019-07-02 18:01:09 -04:00
288be7b7ea xlm 2019-07-02 23:42:31 +02:00
e891bb43d5 BERT can be exported to TorchScript 2019-07-02 17:23:18 -04:00
6ce1ee04fc TorchScript testing with output_attentions and output_hidden_state 2019-07-02 17:22:59 -04:00
7ed5bf706f add tests 2019-07-02 16:42:22 +02:00
708877958a updating tests and models, adding weights initialization test 2019-07-02 16:35:29 +02:00
99ae5ab883 update config tests and circle-ci 2019-07-02 12:40:39 +02:00
1484d67de9 [LARGE] updating all tests and API 2019-07-02 12:13:17 +02:00
64b2a828c0 fix evaluation bug 2019-07-01 14:56:24 -07:00
4f8b5f687c add fix for serialization of tokenizer 2019-06-29 23:35:21 +02:00
d9184620f9 fix tests and new API 2019-06-29 23:10:40 +02:00
dad3c7a485 Merge pull request #723 from tonianelope/master
Update Adam optimizer to follow pytorch convention for betas parameter (#510)
2019-06-28 17:28:25 +02:00
e296d5bef1 Merge pull request #704 from deepset-ai/master
Adjust s3 german Bert file storage
2019-06-28 17:10:58 +02:00
c68b4eceed Merge pull request #718 from Rocketknight1/master
Incorrect docstring for BertForMaskedLM
2019-06-28 17:08:51 +02:00
213981d8cb updating bert API 2019-06-28 16:45:24 +02:00
2b56e98892 standardizing API across models - XLNetForSeqClass working 2019-06-28 16:35:09 +02:00
3a00674cbf fix imports 2019-06-27 17:18:46 +02:00
d939d6fd02 fix hidden-state extraction 2019-06-27 09:39:44 +02:00
0c2ff34815 extracting double hidden-state from xlnet 2019-06-27 09:27:50 +02:00
08ff056c43 Added option to use multiple workers to create training data for lm fine tuning 2019-06-26 16:16:12 -07:00
3deea56c07 fixing loading function 2019-06-26 13:41:12 +02:00
f56b8033f0 more versatile loading 2019-06-26 13:13:15 +02:00
4d47f4985d slight refactoring, add abstract class for model loading 2019-06-26 12:52:44 +02:00
59cefd4f98 fix #726 - get_lr in examples 2019-06-26 11:28:27 +02:00
ddc2cc61a6 fix python2 tests 2019-06-26 11:17:42 +02:00
7e3070ae4f add from_pretrained method to all configuration classes 2019-06-26 11:12:00 +02:00
93e9971c54 fix tests 2019-06-26 10:02:45 +02:00
092dacfd62 changing is_regression to unified API 2019-06-26 09:54:05 +02:00
e55d4c4ede various updates to conversion, models and examples 2019-06-26 00:57:53 +02:00
603c513b35 update main conversion script and readme 2019-06-25 10:45:07 +02:00
7de1740490 add ability to restore fine-tuned TF model 2019-06-25 10:27:58 +02:00
c9885903a1 update betas to follow pytorch convention 2019-06-25 09:23:12 +01:00
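The PyTorch convention referenced in the commit above, as a sketch: a single `(beta1, beta2)` tuple rather than separate `b1`/`b2` keyword arguments.

```python
import torch

params = torch.nn.Linear(4, 2).parameters()
# pytorch convention: one betas tuple, not b1=... / b2=...
optimizer = torch.optim.Adam(params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8)
```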
7334bf6c21 pad on left for xlnet 2019-06-24 15:05:11 +02:00
c888663f18 overwrite output directories if needed 2019-06-24 14:38:24 +02:00
62d78aa37e updating GLUE utils for compatibility with XLNet 2019-06-24 14:36:11 +02:00
24ed0b9346 updating run_xlnet_classifier 2019-06-24 12:00:09 +02:00
f6081f2255 add XLNetForSequenceClassification and a run_classifier example for xlnet 2019-06-24 10:01:07 +02:00
8d6a118aee Incorrect docstring for the head_mask argument to BertForMaskedLM 2019-06-23 18:47:05 +01:00
06716d7536 Merge pull request #3 from huggingface/master
Catch up with main repo
2019-06-23 18:46:03 +01:00
c946bb51a6 fix xlnet tokenizer and python2 2019-06-22 22:28:49 +02:00
98dc30b21e Merge pull request #714 from papower1/master
Correct a broken link on README
2019-06-22 21:29:41 +02:00
eae5d3819d Merge pull request #715 from Rocketknight1/master
Include a reference for LM finetuning
2019-06-22 21:29:19 +02:00
c7b2808ed7 Update LM finetuning README to include a literature reference 2019-06-22 15:04:01 +01:00
7c59e32d47 Merge pull request #2 from huggingface/master
Updating my fork to the latest version
2019-06-22 14:59:47 +01:00
ada0d8fec7 Merge pull request #1 from papower1/papower1-patch-1
Correct a broken link and its context.
2019-06-22 20:34:45 +09:00
fcc706343f Correct a broken link and its context.
Correct a broken link(run_lm_finetuning.py) and its context.
2019-06-22 20:33:48 +09:00
181075635d updating model loading and adding special tokens ids 2019-06-21 23:23:37 +02:00
ebd2cb8d74 update from_pretrained to load XLNetModel as well 2019-06-21 21:08:44 +02:00
483cbc36a9 test deviation with tf model: max ~1e-3 should be ok 2019-06-21 16:38:01 +02:00
24d8068982 weights loading script ok 2019-06-21 12:33:44 +02:00
32da75486b add tokenizer and tests 2019-06-21 11:09:51 +02:00
45709d7532 model running with simple inputs 2019-06-21 00:28:42 +02:00
b407972e27 update gitignore 2019-06-20 13:52:56 +02:00
c2ea5aef77 work in progress on xlnet 2019-06-20 13:52:21 +02:00
de713fa9b4 starting 2019-06-20 10:54:19 +02:00
c304593d8f BERTology details in readme 2019-06-20 10:05:06 +02:00
12e892e174 Merge pull request #697 from huggingface/updating_examples
Updating examples
2019-06-20 09:58:24 +02:00
411981a080 remove slow circle-ci 2019-06-20 08:54:18 +02:00
716cc1c4d9 added main() for programmatic call to convert pytorch->tf 2019-06-19 23:18:57 -04:00
a8e071c690 added notebook to check correctness of the pytorch->tensorflow conversion 2019-06-19 23:08:08 -04:00
0a4fb0da57 Merge remote-tracking branch 'upstream/master' into convert-back-to-tf
merging in latest changes from upstream
2019-06-19 22:56:20 -04:00
edfe91c36e first version bertology ok 2019-06-19 23:43:04 +02:00
7766ce66dd update bertology 2019-06-19 22:29:51 +02:00
7f00a36e27 pruning should keep on device 2019-06-19 22:23:12 +02:00
e4b46d86ce update head pruning 2019-06-19 22:16:30 +02:00
939cf29157 Adjust s3 german Bert file storage 2019-06-19 18:38:42 +02:00
0f40e8d6a6 debugger 2019-06-19 15:38:46 +02:00
0e1e8128bf more logging 2019-06-19 15:35:49 +02:00
909d4f1af2 cuda again 2019-06-19 15:32:10 +02:00
14f0e8e557 fix cuda 2019-06-19 15:29:28 +02:00
34d706a0e1 pruning in bertology 2019-06-19 15:25:49 +02:00
dc8e0019b7 updating examples 2019-06-19 13:23:20 +02:00
68ab9599ce small fix and updates to readme 2019-06-19 09:38:38 +02:00
f7e2ac01ea update barrier 2019-06-18 22:43:35 +02:00
4d8c4337ae test barrier in distrib training 2019-06-18 22:41:28 +02:00
3359955622 updating run_classif 2019-06-18 22:23:10 +02:00
29b7b30eaa updating evaluation on a single gpu 2019-06-18 22:20:21 +02:00
7d2001aa44 overwrite_output_dir 2019-06-18 22:13:30 +02:00
16a1f338c4 fixing 2019-06-18 17:06:31 +02:00
92e0ad5aba no numpy 2019-06-18 17:00:52 +02:00
4e6edc3274 hop 2019-06-18 16:57:15 +02:00
f55b60b9ee fixing again 2019-06-18 16:56:52 +02:00
8bd9118294 quick fix 2019-06-18 16:54:41 +02:00
3e847449ad fix out_label_ids 2019-06-18 16:53:31 +02:00
aad3a54e9c fix paths 2019-06-18 16:48:04 +02:00
40dbda6871 updating classification example 2019-06-18 16:45:52 +02:00
7388c83b60 update run_classifier for distributed eval 2019-06-18 16:32:49 +02:00
9727723243 fix pickle 2019-06-18 16:02:42 +02:00
9710b68dbc fix pickles 2019-06-18 16:01:15 +02:00
15ebd67d4e cache in run_classifier + various fixes to the examples 2019-06-18 15:58:22 +02:00
e6e5f19257 fix 2019-06-18 14:45:14 +02:00
a432b3d466 distributed traing t_total 2019-06-18 14:39:09 +02:00
c5407f343f split squad example in two 2019-06-18 14:29:03 +02:00
335f57baf8 only on main process 2019-06-18 14:03:46 +02:00
326944d627 add tensorboard to run_squad 2019-06-18 14:02:42 +02:00
d82e5deeb1 set find_unused_parameters=True in DDP 2019-06-18 12:13:14 +02:00
a59abedfb5 DDP update 2019-06-18 12:06:26 +02:00
2ef5e0de87 switch to pytorch DistributedDataParallel 2019-06-18 12:03:13 +02:00
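A minimal single-process sketch of the switch to `torch.nn.parallel.DistributedDataParallel` (gloo backend so it runs on CPU; real training launches one process per GPU), including the `find_unused_parameters=True` flag from the commit two entries above:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 2), find_unused_parameters=True)
dist.destroy_process_group()
```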
9ce37af99b oops 2019-06-18 11:47:54 +02:00
a40955f071 no need to duplicate models anymore 2019-06-18 11:46:14 +02:00
3763f8944d Merge pull request #696 from huggingface/split_config_weights
Split config weights
2019-06-18 11:42:57 +02:00
f964753090 explanation on the current location of the caching folder 2019-06-18 11:36:28 +02:00
868de8d1d7 updating weights loading 2019-06-18 10:58:20 +02:00
64e0adda81 better error message 2019-06-18 10:51:31 +02:00
382e2d1e50 spliting config and weight files for bert also 2019-06-18 10:37:16 +02:00
a6f2511811 Merge pull request #694 from huggingface/release_0.6.3
Release 0.6.3
2019-06-17 16:27:25 +02:00
4447f270b2 updating hub 2019-06-17 16:21:28 +02:00
33d3db5c43 updating head masking, readme and docstrings 2019-06-17 15:51:28 +02:00
965f172de6 output all hidden layers states in GPT/GPT-2 2019-06-17 14:34:12 +02:00
f12007e421 add head masking and pruning to openai GPT 2019-06-17 14:19:40 +02:00
b860e47cf5 add head masking and pruning to gpt-2 2019-06-17 14:12:10 +02:00
7220d47a1c adding head pruning and tests 2019-06-17 13:20:45 +02:00
8415a38b23 better error messages 2019-06-17 13:03:48 +02:00
96c4d3d988 add head masking tests 2019-06-17 12:17:26 +02:00
34858ae1d9 adding bert whole words, bertgerman and gpt-2 medium models, head masking 2019-06-17 11:02:39 +02:00
80684f6f86 Merge pull request #690 from shashwath94/projadpsftmax_fix
Transformer XL ProjectedAdaptiveLogSoftmax output fix
2019-06-15 23:14:10 +02:00
9e363703d6 Merge pull request #688 from deepset-ai/german_bert
Add German Bert model to code, update readme
2019-06-15 23:13:41 +02:00
cc6cd430f7 Merge pull request #691 from vanche/master
import class "GPT2MultipleChoiceHead"
2019-06-15 23:12:55 +02:00
8289646d4e import class "GPT2MultipleChoiceHead" 2019-06-15 22:19:30 +09:00
5076a5daa7 Fix proj adp softmax output return when n_clusters=0 2019-06-14 22:03:21 -04:00
16af9ff7b0 Add German Bert model to code, update readme 2019-06-14 17:42:46 +02:00
b3f9e9451b Merge pull request #687 from huggingface/tests_and_doc
Updating tests and doc
2019-06-14 17:23:45 +02:00
44e9ddd7fe fix num_special_tokens in GPT 2 test 2019-06-14 17:17:43 +02:00
cad88e19de Merge pull request #672 from oliverguhr/master
Add vocabulary and model config to the finetune output
2019-06-14 17:02:47 +02:00
c6de625229 Merge pull request #655 from huggingface/finish_torchhub_interfaces
Finish torchhub interfaces
2019-06-14 17:02:08 +02:00
ff276fc00c Merge branch 'master' into finish_torchhub_interfaces 2019-06-14 16:59:07 +02:00
a64736dc23 Merge pull request #646 from Colanim/patch-1
Fix link in README
2019-06-14 16:57:45 +02:00
460d9afd45 Merge pull request #640 from Barqawiz/master
Support latest multi language bert fine tune
2019-06-14 16:57:02 +02:00
277c77f1c5 Merge pull request #630 from tguens/master
Update run_squad.py
2019-06-14 16:56:26 +02:00
659af2cbd0 Merge pull request #604 from samuelbroscheit/master
Fixing issue "Training beyond specified 't_total' steps with schedule 'warmup_linear'" reported in #556
2019-06-14 16:49:24 +02:00
2d6a53490d Merge pull request #597 from huggingface/attention
GPT-2 (medium size model, special_tokens, fine-tuning, attention) + repo code coverage metric
2019-06-14 16:47:32 +02:00
35e6baab37 Merge branch 'master' into attention 2019-06-14 16:41:56 +02:00
5e1207b8ad add attention to all bert models and add test 2019-06-14 16:28:25 +02:00
bcc9e93e6f fix test 2019-06-14 15:38:20 +02:00
f9cde97b31 Merge pull request #675 from meetshah1995/patch-1
[hotfix] Fix frozen pooler parameters in SWAG example.
2019-06-12 10:01:21 +02:00
e02ce4dc79 [hotfix] Fix frozen pooler parameters in SWAG example. 2019-06-11 15:13:53 -07:00
5c08c8c273 adds the tokenizer + model config to the output 2019-06-11 13:46:33 +02:00
784c0ed89a Merge pull request #668 from jeonsworld/patch-2
apply Whole Word Masking technique
2019-06-11 11:29:10 +02:00
a3a604cefb Update pregenerate_training_data.py
Apply the Whole Word Masking technique,
referring to [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py).
2019-06-10 12:17:23 +09:00
ee0308f79d fix typo 2019-06-06 17:30:49 +02:00
2d07f945ad fix error with torch.no_grad and loss computation 2019-06-06 17:10:24 +02:00
6b8d227092 some cleaning 2019-06-06 17:07:03 +02:00
122d5c52ac distinguish what is not trained 2019-06-06 17:02:51 +02:00
2647ac3294 forgot bertForPreTraining 2019-06-06 16:57:40 +02:00
cf44d98392 Add more examples to BERT models for torchhub 2019-06-06 16:36:02 +02:00
a3274ac40b adding attention outputs in bert 2019-06-03 16:11:45 -05:00
826496580b Revert "add output_attentions for BertModel"
This reverts commit de5e5682a12463465a9eda4d2b13efad9c50d0dd.
2019-06-03 17:10:25 -04:00
de5e5682a1 add output_attentions for BertModel 2019-06-03 17:05:24 -04:00
312fdd7752 fix doc error 2019-06-01 17:43:26 -04:00
cdf0f2fec3 fix typo/presentation 2019-06-01 17:42:00 -04:00
8f97f6c57f fix typo
cc @thomwolf
2019-06-01 17:29:07 -04:00
466a96543a fix bug/typos 2019-06-01 17:28:56 -04:00
c198ff5f1f fix typos/bugs 2019-06-01 16:28:42 -04:00
592d1e3aae fix typos 2019-06-01 16:19:32 -04:00
f836130bff update hubconf 2019-06-01 16:08:29 -04:00
c0c7ff5751 add transformer xl compatibility for torchhub 2019-06-01 16:08:24 -04:00
48a58646e8 small fix in doc 2019-06-01 16:06:50 -04:00
2576a5c6db update hubconf for gpt2 torchhub compatibility 2019-06-01 15:28:01 -04:00
a92b6dc3c1 add GPT2 torchhub compatibility 2019-06-01 15:27:43 -04:00
2a329c6186 Merge pull request #651 from huggingface/gpt_torchhub
Add GPT* compatibility to torchhub
2019-05-31 14:44:52 +02:00
45d21502f0 update doc 2019-05-31 01:04:16 -04:00
98f5c7864f decorrelate dependencies + fix bug 2019-05-31 01:00:29 -04:00
c8bd026ef6 move dependencies list to hubconf 2019-05-31 00:36:58 -04:00
19ef2b0a66 Fix typo in hubconf 2019-05-31 00:33:33 -04:00
d0f591051c gpt_hubconf 2019-05-31 00:28:10 -04:00
4a210c9fc6 Move bert_hubconf to hubconfs 2019-05-31 00:28:00 -04:00
0c5a4fe9c9 modify from_pretrained for OpenAIGPT 2019-05-31 00:27:18 -04:00
372a5c1cee Hubconf doc - Specia case loading 2019-05-30 16:06:21 -04:00
96592b544b default in __init__s for classification BERT models (#650) 2019-05-30 15:53:13 -04:00
4cda86b08f Update hubconf for torchhub: paths+examples+doc 2019-05-30 18:38:00 +00:00
1eba8b9d96 Fix link in README 2019-05-30 14:01:46 +09:00
314bc6bb4e added transposes to attention.self.[query,key,value] 2019-05-27 09:47:59 -04:00
c4fe56dcc0 support latest multilingual bert fine-tuning
fix the issue with bert-base-multilingual and add support for uncased multilingual
2019-05-27 11:27:41 +02:00
8de1faea6f update to hf->tf args 2019-05-22 20:38:16 -04:00
d0adab2c39 fn change; pytorch_model_dir required=False 2019-05-22 20:24:04 -04:00
a309459b92 fn change; pytorch_model_dir required=False 2019-05-22 20:17:27 -04:00
9e7bc51b95 Update run_squad.py
Indentation change so that the output "nbest_predictions.json" is not empty.
2019-05-22 17:27:59 +08:00
69749f3fc3 update to hf->tf args 2019-05-18 17:16:01 -04:00
f1433db4f1 update to hf->tf args 2019-05-18 17:09:08 -04:00
077a5b0dc4 Merge remote-tracking branch 'upstream/master' into convert-back-to-tf
merging
2019-05-18 16:06:08 -04:00
2bcda8d00c update 2019-05-18 15:55:11 -04:00
94247ad6cb Make num_train_optimization_steps int 2019-05-13 12:38:22 +02:00
49a77ac16f Clean up a little bit 2019-05-12 00:31:10 +02:00
3bf3f9596f Fixing the issues reported in https://github.com/huggingface/pytorch-pretrained-BERT/issues/556
The reason for the issue was that optimization steps were computed from the example count, which differs from the actual size of the dataloader when an example is chunked into multiple instances.

The solution in this pull request is to compute num_optimization_steps directly from len(data_loader).
2019-05-12 00:13:45 +02:00
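The fix described above as a sketch (helper name `num_optimization_steps` taken from the commit; signature hypothetical): derive the schedule length from the dataloader itself rather than from the raw example count.

```python
import math

def num_optimization_steps(data_loader, num_epochs, gradient_accumulation_steps=1):
    # len(data_loader) already reflects how examples were chunked into batches,
    # so the schedule's t_total matches the optimizer steps actually taken.
    return math.ceil(len(data_loader) / gradient_accumulation_steps) * num_epochs
```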
3fc63f126d Merge pull request #598 from burcturkoglu/master
Updating learning rate with special warm up in examples
2019-05-10 13:48:12 +02:00
00c7fd2b79 Removed the division of global_step by num_train_optimizer in lr_this_step. 2019-05-09 10:57:03 +03:00
fa37b4da77 Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT 2019-05-09 10:55:24 +03:00
5289b4b9e0 Removed the division of global_step by num_train_optimizer in lr_this_step. 2019-05-09 10:51:38 +03:00
275179a003 output attentions in GPT-2 2019-05-08 22:24:42 +02:00
366a3b0285 clean up in tokenization 2019-05-08 21:43:51 +02:00
701bd59b8b Merge pull request #585 from huntzhan/master
Make the epsilon of LayerNorm configurable.
2019-05-08 16:56:38 +02:00
303b5e2b92 Merge pull request #545 from ailzhang/cache_dir
move pytorch_pretrained_bert cache folder under the same path as torch
2019-05-08 16:55:27 +02:00
0198399d84 Merge pull request #570 from MottoX/fix-1
Create optimizer only when args.do_train is True
2019-05-08 16:07:50 +02:00
50fa92c026 Merge pull request #571 from MottoX/patch-1
Fix documentation typo
2019-05-08 16:06:13 +02:00
0efc4ab632 adding dropout to GPT-2 and embedding dropout to GPT 2019-05-08 10:41:35 +02:00
ea9dbea9d5 update GPT2 loss computation for more flexibility 2019-05-07 23:27:18 +02:00
ce86336545 add predict_special_tokens option to GPT also 2019-05-07 16:47:22 +02:00
d1b6979aa5 GPT-2 option to avoid predicting special tokens 2019-05-07 16:25:53 +02:00
101ab4dd8e Make the epsilon of LayerNorm configurable. 2019-05-06 00:26:21 +08:00
41089bc7d3 added file to convert pytorch->tf 2019-05-02 13:26:22 -04:00
0a8b4d65be added file to convert pytorch->tf 2019-05-02 13:20:59 -04:00
968c1b44cb added file to convert pytorch->tf 2019-05-02 13:19:56 -04:00
96c2b77f0f added file to convert pytorch->tf 2019-05-02 13:14:25 -04:00
e211785ada extract attention weights from GPT 2019-05-02 18:31:26 +02:00
18c8aef9d3 Fix documentation typo 2019-05-02 19:23:36 +08:00
74dbba64bc Prepare optimizer only when args.do_train is True 2019-05-02 19:09:29 +08:00
db98a4a48b gpt-2 tokenizer 2019-05-01 11:40:48 +02:00
3ae8c8be1e Merge pull request #562 from apappu97/roc_stories_lmlabels_fix
Small fix to remove shifting of lm labels during preprocessing of RocStories.
2019-05-01 11:20:17 +02:00
e89520175d Merge pull request #564 from 8enmann/patch-2
Fix #537
2019-05-01 11:18:46 +02:00
74f7906db4 Fix #537 2019-04-30 19:48:22 -07:00
365fb34c6c small fix to remove shifting of lm labels during preprocessing of roc stories, as this shifting happens internally in the model 2019-04-30 13:53:04 -07:00
cd110835a0 coverage in circle-ci 2019-04-30 11:35:40 +02:00
2dee86319d Merge pull request #527 from Mathieu-Prouveur/fix_value_training_loss
Update example files so that tr_loss is not affected by args.gradient…
2019-04-30 11:12:55 +02:00
80f53f7380 gpt-2 from_pretrained can use special tokens 2019-04-30 11:10:22 +02:00
e79ceb1533 gpt-2 special tokens 2019-04-30 11:05:54 +02:00
1f5fc95b68 add code coverage 2019-04-30 11:05:26 +02:00
c30139a013 add special tokens to gpt-2 2019-04-30 10:45:26 +02:00
87b9ec3843 Fix tr_loss rescaling factor using global_step 2019-04-29 12:58:29 +02:00
3963d57c89 move pytorch_pretrained_bert cache folder under the same path as torch 2019-04-27 11:09:11 -07:00
b832d5bb8a Release: 0.6.2 2019-04-25 21:37:47 +02:00
e6cf62d499 Merge pull request #488 from dhpollack/fix_multichoice
fixed BertForMultipleChoice model init and forward pass
2019-04-25 21:04:16 +02:00
1cc1c3c344 Merge pull request #533 from lukovnikov/master
Docs for new learning rate code
2019-04-25 21:02:35 +02:00
dee8af4e46 Merge pull request #518 from huggingface/schedules_in_examples
Fix training schedules in examples to match new API
2019-04-25 21:01:04 +02:00
56a47ce2b7 - replaced OpenAIGPTAdam with OpenAIAdam in docs 2019-04-25 16:05:28 +02:00
331a46ff04 - replaced OpenAIGPTAdam with OpenAIAdam in docs 2019-04-25 16:04:37 +02:00
704037ad51 - updated docs for new LR API
- added some images for illustration
- updated comments in optimization
2019-04-25 15:59:39 +02:00
d76a57b0ba Merge pull request #506 from ailzhang/hubconf
Hubconf
2019-04-24 20:59:21 +02:00
80f995a141 revert BertForMultipleChoice linear classifier 2019-04-24 16:51:54 +02:00
ed8fad7390 Update example files so that tr_loss is not affected by args.gradient_accumulation_step 2019-04-24 14:07:00 +02:00
d94c6b0144 fix training schedules in examples to match new API 2019-04-23 11:17:06 +02:00
c36cca075a Merge pull request #515 from Rocketknight1/master
Fix --reduce_memory in finetune_on_pregenerated
2019-04-23 10:30:23 +02:00
99e02c3415 Merge pull request #512 from cynthia/master
Fix indentation weirdness in GPT-2 example.
2019-04-23 10:29:01 +02:00
98cb7b2c51 Merge pull request #445 from lukovnikov/master
Learning rate schedules improvement + extension
2019-04-23 10:27:38 +02:00
b8e2a9c584 Made --reduce_memory actually do something in finetune_on_pregenerated 2019-04-22 14:01:48 +01:00
af8a0384fc Merge pull request #1 from huggingface/master
Pulling commits from main repo
2019-04-22 13:56:47 +01:00
14b1f719f4 Fix indentation weirdness in GPT-2 example. 2019-04-22 02:20:22 +09:00
69850b4011 python 2 compat 2019-04-21 14:02:38 +02:00
bb7557d3ab - removed __all__ in optimization
- removed unused plotting code
- using ABC for LRSchedule
- added some schedule object init tests
2019-04-21 13:48:33 +02:00
34ccc8ebf4 Merge remote-tracking branch 'upstream/master' 2019-04-21 13:16:15 +02:00
bfd6f6b257 fix from_pretrained positional args 2019-04-17 16:31:40 -07:00
ae4c9fee73 add hubconf 2019-04-17 13:34:34 -07:00
68a889ee43 Merge pull request #500 from huggingface/network
Updating network handling
2019-04-17 15:22:14 +02:00
34ae5bf838 small clean up in tests 2019-04-17 14:52:12 +02:00
23d4554ec0 is python 2 happy now 2019-04-17 14:48:34 +02:00
265550ec34 relax network connection requirements 2019-04-17 14:22:35 +02:00
fa76520240 fix file_utils on python 2 2019-04-17 13:32:22 +02:00
bcde2c61cb fix #497 2019-04-17 12:35:38 +02:00
929579f3b5 fix #497 2019-04-17 12:35:08 +02:00
31d387604c adding s3 model tests with --runslow 2019-04-17 11:58:27 +02:00
8407429d74 Merge pull request #494 from SudoSharma/patch-1
Fix indentation for unconditional generation
2019-04-17 11:11:36 +02:00
2e153930cf Merge pull request #495 from SudoSharma/patch-2
Fix gradient overflow issue during attention mask
2019-04-17 11:10:36 +02:00
46078e1b46 Merge pull request #496 from 8enmann/patch-1
[run_gpt2.py] temperature should be a float, not int
2019-04-17 11:08:54 +02:00
b8686130ca Merge pull request #498 from huggingface/GPT2_tokenization
Gpt2 tokenization
2019-04-17 11:06:41 +02:00
5afa497cbf fix GPT-2 tokenization to work also on python 3... 2019-04-17 11:04:41 +02:00
bc70779bf0 fixed GPT-2 tokenization on python 2 2019-04-17 10:56:15 +02:00
87677fcc4d [run_gpt2.py] temperature should be a float, not int 2019-04-16 15:23:21 -07:00
9e666aaa29 Fix gradient overflow issue during attention mask
This fix is in reference to issue #382. GPT2 can now be trained in mixed precision, which I've confirmed with testing. I also tested unconditional generation on multiple seeds before and after changing 1e10 to 1e4 and there was no difference. Please let me know if there is anything else I can do to make this pull request better. Thanks for all your work!
2019-04-16 11:42:34 -07:00
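A minimal sketch of the change described above, not the repo's exact code: fp16's largest finite value is about 65504, so an additive mask of -1e10 overflows in half precision, while -1e4 stays representable and still zeroes out masked positions after the softmax.

```python
import torch

def extended_attention_mask(attention_mask, dtype=torch.float16):
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    mask = attention_mask[:, None, None, :].to(dtype)
    # -1e4 instead of -1e10: 1e10 is not representable in fp16
    # (max ~65504), while -10000 is still effectively -inf after softmax.
    return (1.0 - mask) * -10000.0

print(extended_attention_mask(torch.tensor([[1, 1, 0]])))
```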
07154dadb4 Fix indentation for unconditional generation 2019-04-16 11:11:49 -07:00
bdaba1897c updating GPT tokenization 2019-04-16 17:44:06 +02:00
18a8a15f78 improving GPT2 tokenization and adding tests 2019-04-16 17:00:55 +02:00
3d78e226e6 Merge pull request #489 from huggingface/tokenization_serialization
Better serialization for Tokenizers and Configuration classes - Also fix #466
2019-04-16 08:49:54 +02:00
3571187ef6 fix saving models in distributed setting examples 2019-04-15 16:43:56 +02:00
64b6ef4db0 Merge pull request #490 from huggingface/better_finetuning_GPT_GPT-2
Clean up GPT and GPT-2 losses computation
2019-04-15 16:14:50 +02:00
d616022455 fix openai special tokens loading 2019-04-15 16:07:45 +02:00
df5d9c3551 load all models on cpu 2019-04-15 15:43:01 +02:00
2499b0a5fc add ptvsd to run_squad 2019-04-15 15:33:04 +02:00
7816f7921f clean up distributed training logging in run_squad example 2019-04-15 15:27:10 +02:00
1135f2384a clean up logger in examples for distributed case 2019-04-15 15:22:40 +02:00
cc43307023 update readme 2019-04-15 15:06:10 +02:00
60ea6c59d2 added best practices for serialization in README and examples 2019-04-15 15:00:33 +02:00
179a2c2ff6 update example to work with new serialization semantic 2019-04-15 14:33:23 +02:00
b3c6ee0ac1 tokenization updates 2019-04-15 14:24:52 +02:00
20577d8a7c add configuration serialization to readme 2019-04-15 14:21:41 +02:00
9761aa4845 add to_json_file method to configuration classes 2019-04-15 14:12:08 +02:00
b17963d82f update readme 2019-04-15 13:44:30 +02:00
e8568a3b17 fixing tests 2019-04-15 12:55:38 +02:00
870b734bfd added tokenizers serialization tests 2019-04-15 12:03:56 +02:00
3e65f255dc add serialization semantics to tokenizers - fix transfo-xl tokenizer 2019-04-15 11:47:25 +02:00
6b35cfd28f Merge pull request #423 from dhanajitb/master
making unconditional generation work
2019-04-15 11:01:53 +02:00
aff44f0c08 Merge branch 'master' into master 2019-04-15 10:58:34 +02:00
7e7e4753c8 Merge pull request #480 from mboyanov/docs/cls_token_info
Extend the BertForSequenceClassification docs to mention the special CLS token.
2019-04-15 10:57:25 +02:00
bb61b747df Merge pull request #474 from jiesutd/master
Fix tsv read error in Windows
2019-04-15 10:56:48 +02:00
7873d76464 Merge pull request #478 from Rocketknight1/master
Added a helpful error for users with single-document corpora - fixes #452

2019-04-15 10:55:57 +02:00
38ba7b439b fixed BertForMultipleChoice model init and forward pass 2019-04-15 10:38:01 +02:00
fe2756ff41 update double head model 2019-04-15 10:04:05 +02:00
34cf67fd6c Extend the BertForSequenceClassification docs to mention the special CLS token. 2019-04-12 21:30:28 +03:00
dbbd6c7500 Replaced some randints with cleaner randranges, and added a helpful
error for users whose corpus is just one giant document.
2019-04-12 15:07:58 +01:00
b509bf7655 updating loss computation 2019-04-12 12:12:33 +02:00
1d203a34c0 back to simple indexing 2019-04-11 23:51:03 +02:00
616743330e Merge pull request #462 from 8enmann/master
fix run_gpt2.py
2019-04-11 21:54:46 +02:00
2cdfb8b254 Merge pull request #467 from yaroslavvb/patch-2
Update README.md
2019-04-11 21:53:23 +02:00
c49ce3c722 fix tsv read error in Windows 2019-04-11 15:40:19 -04:00
074c869bbe fix OpenAIGPTMultipleChoiceHead 2019-04-11 20:53:50 +02:00
724eb45cef add stale bot 2019-04-11 17:12:00 +02:00
4bc4c69af9 finetuning any BERT model - fixes #455 2019-04-11 16:57:59 +02:00
a05fad8dce fix typo 2019-04-11 13:16:17 +02:00
4a82f4f856 update special token addition 2019-04-11 13:11:22 +02:00
991b8e65f4 Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT 2019-04-11 11:43:15 +02:00
e99b2014cc fixes #471 2019-04-11 11:43:13 +02:00
8fffba5f47 Update README.md
Fix for

```
04/09/2019 21:39:38 - INFO - __main__ -   device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
Traceback (most recent call last):
  File "/home/ubuntu/pytorch-pretrained-BERT/examples/lm_finetuning/simple_lm_finetuning.py", line 642, in <module>
    main()
  File "/home/ubuntu/pytorch-pretrained-BERT/examples/lm_finetuning/simple_lm_finetuning.py", line 502, in main
    raise ValueError("Training is currently the only implemented execution option. Please set `do_train`.")
ValueError: Training is currently the only implemented execution option. Please set `do_train`.
```
2019-04-09 14:45:47 -07:00
fd8a3556f0 fix run_gpt2.py 2019-04-08 17:20:35 -07:00
f4fc9c6152 Merge branch 'master' of https://github.com/dhanajitb/pytorch-pretrained-BERT 2019-04-07 17:52:35 +05:30
6c4c7be282 Merge remote-tracking branch 'upstream/master' 2019-04-07 16:59:36 +05:30
4d3cf0d602 removing some redundant lines 2019-04-07 16:59:07 +05:30
0d6a882f63 Cleaned some redundant lines
```while not args.unconditional:
   if not args.unconditional:
```
These lines have been updated
2019-04-07 16:54:38 +05:30
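A hedged sketch of the shape of that cleanup (function names illustrative): the loop condition already guarantees conditional mode, so the inner test can simply go.

```python
def interactive_loop(args, generate):
    # The `while` condition already ensures we are in conditional mode,
    # so the old inner `if not args.unconditional:` test was redundant.
    while not args.unconditional:
        prompt = input("Model prompt >>> ")
        print(generate(prompt))
```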
fc7693adc3 schedule fix 2019-04-03 18:16:47 +02:00
20686b78fc schedule fix 2019-04-03 18:13:52 +02:00
1b4ce76c38 schedule fix 2019-04-03 17:40:12 +02:00
5fed5bb3d6 schedule fix 2019-04-03 17:20:29 +02:00
23bd2eebf5 schedule fix 2019-04-03 17:10:34 +02:00
91a073f804 schedule fix 2019-04-03 17:10:08 +02:00
b64cc63a77 optimization schedule test update 2019-04-03 16:42:40 +02:00
d164867d90 - updated docs for optimization 2019-04-03 16:13:51 +02:00
1758c8fc72 - updated docs for optimization 2019-04-03 16:08:34 +02:00
725a56329d Merge remote-tracking branch 'upstream/master' into optim
# Conflicts:
#	pytorch_pretrained_bert/optimization.py

- updated docs for optimization
2019-04-03 16:07:50 +02:00
94980b529f Merge pull request #404 from CatalinVoss/fix_lm_loss
Fix Language Modeling Loss
2019-04-03 11:35:30 +02:00
9ca25ce828 Merge pull request #427 from jeonsworld/patch-1
fix sample_doc
2019-04-03 11:26:58 +02:00
db4dccd1b5 Merge pull request #389 from lukovnikov/master
Fix cosine schedule
2019-04-03 11:21:43 +02:00
19666dcb3b Should fix #438 2019-04-03 11:01:01 +02:00
1d8c232324 Fix #436 2019-04-03 10:51:03 +02:00
846b1fd6f8 Fix #419 2019-04-03 10:50:38 +02:00
404adcdabf Merge pull request #437 from MottoX/fix-link
Fix links in README
2019-04-02 11:40:46 +02:00
f26ce6992e Fix links in README 2019-04-02 17:20:32 +08:00
2f80dbbc0d Merge pull request #430 from MottoX/master
Fix typo in example code
2019-04-02 10:41:56 +02:00
94adad6be3 Merge pull request #435 from marpaia/training-fixes
Fixes to the TensorFlow conversion tool
2019-04-02 10:41:40 +02:00
8b5c63e4de Fixes to the TensorFlow conversion tool 2019-04-01 13:17:54 -06:00
d07db28f52 Fix typo in example code
Modify 'unambigiously' to 'unambiguously'
2019-03-31 01:20:18 +08:00
60005f464d Update pregenerate_training_data.py
If randint returns the value of rand_end itself, searchsorted returns a sampled_doc_index equal to current_idx.

example:
cumsum_max = {int64} 30
doc_cumsum = {ndarray} [ 5  7 11 19 30]
doc_lengths = {list} [5, 2, 4, 8, 11]
if current_idx = 1,
rand_start = 7
rand_end = 35
sentence_index = randint(7, 35) % cumsum_max
if randint returns 35, sentence_index becomes 5.
if sentence_index is 5, np.searchsorted returns 1, equal to current_idx.
2019-03-30 14:50:17 +09:00
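A runnable sketch of the off-by-one described above (assuming the script's searchsorted call uses side='right'): Python's random.randint includes its upper bound, so rand_end itself can be drawn and wrap back onto the current document; random.randrange excludes it.

```python
import numpy as np
from random import randrange

doc_lengths = [5, 2, 4, 8, 11]
doc_cumsum = np.cumsum(doc_lengths)   # array([ 5,  7, 11, 19, 30])
cumsum_max = int(doc_cumsum[-1])      # 30

current_idx = 1
rand_start = int(doc_cumsum[current_idx])  # 7
rand_end = rand_start + cumsum_max         # 35

# Buggy: randint(7, 35) can return 35, and 35 % 30 == 5 falls back
# inside the current document:
assert np.searchsorted(doc_cumsum, 35 % cumsum_max, side="right") == current_idx

# Fixed: randrange excludes the upper bound, so the wrap-around value
# can never be drawn.
sentence_index = randrange(rand_start, rand_end) % cumsum_max
sampled_doc_index = np.searchsorted(doc_cumsum, sentence_index, side="right")
```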
4d3721f9bc Just updating
Merge remote-tracking branch 'upstream/master'
2019-03-29 21:56:47 +05:30
ec5c1d6134 Merge pull request #425 from Separius/patch-1
fix lm_finetuning's link
2019-03-29 09:14:11 +01:00
b588ff362a fix lm_finetuning's link 2019-03-29 12:39:24 +04:30
f872eb98c2 making unconditional generation work
The unconditional generation works now, but if the seed is fixed, the sample is the same every time.
n_samples > 1 will give different samples, though.
I am giving the start token as '<|endoftext|>' for the unconditional generation.
2019-03-28 22:46:15 +05:30
694e2117f3 Merge pull request #388 from ananyahjha93/master
Added remaining GLUE tasks to 'run_classifier.py'
2019-03-28 09:06:53 +01:00
01520d5412 Remove my unhelpful comments :) 2019-03-27 10:45:28 -07:00
f7c9dc8c99 Merge pull request #409 from ikuyamada/master
Remove padding_idx from position_embeddings and token_type_embeddings
2019-03-27 12:30:03 +01:00
cc8c2d2332 Merge pull request #396 from IndexFziQ/IndexFziQ
add tqdm to the process of eval in examples/run_swag.py
2019-03-27 12:03:26 +01:00
bbff03fbfc Merge pull request #394 from desireevl/master
Minor change in README
2019-03-27 12:03:00 +01:00
2fb8ddeeff Merge pull request #392 from Rocketknight1/master
Add full language model fine-tuning
2019-03-27 12:02:36 +01:00
34561e61a5 update main readme also 2019-03-27 12:00:04 +01:00
361aff6de5 typos 2019-03-27 11:54:59 +01:00
cea8ba1d59 adjusted formatting and some wording in the readme 2019-03-27 11:53:44 +01:00
0401317b23 Remove padding_idx from position_embeddings and token_type_embeddings 2019-03-26 21:56:35 +09:00
24e67fbf75 Minor README update 2019-03-25 12:33:30 +00:00
8d1d1ffde2 Corrected the displayed loss when gradient_accumulation_steps > 1 2019-03-25 12:15:19 +00:00
fda2f62395 Fix test failures due to old torch issue with non-contiguous view 2019-03-24 14:37:13 -07:00
0dd796e359 Also fix loss function issue with the double head models 2019-03-24 14:35:55 -07:00
472857c47f Fix typo syntax err (sorry, c/p from my repo) 2019-03-24 14:14:49 -07:00
2e6f5ffb96 Fix GPT language model loss here as well 2019-03-24 14:14:44 -07:00
5938f31fa7 Fix c/p typo from my experiment code 2019-03-24 14:14:40 -07:00
7797d21b8d Fix GPT2 language modeling loss computation 2019-03-24 14:14:35 -07:00
f471979167 added GLUE dev set results and details on how to run GLUE tasks 2019-03-21 15:38:30 -04:00
abb7d1ff6d Added proper context management to ensure cleanup happens in the right
order.
2019-03-21 17:50:03 +00:00
06a30cfdf3 Added a --reduce_memory option to the training script to keep training
data on disc as a memmap rather than in memory
2019-03-21 17:04:12 +00:00
7d1ae644ef Added a --reduce_memory option to the training script to keep training
data on disc as a memmap rather than in memory
2019-03-21 17:02:18 +00:00
2bba7f810e Added a --reduce_memory option to shelve docs to disc instead of keeping them in memory. 2019-03-21 16:50:16 +00:00
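A minimal sketch of the memmap idea behind `--reduce_memory` (shapes and filename hypothetical): token ids live in an on-disk array that the OS pages in on demand, so the working set stays small however large the pregenerated corpus is.

```python
import numpy as np

num_examples, seq_len = 100_000, 128  # hypothetical corpus dimensions

# mode="w+" creates the file; reopening later with mode="r" reads it
# back without ever loading the whole array into RAM.
input_ids = np.memmap("train_input_ids.memmap", dtype=np.int32,
                      mode="w+", shape=(num_examples, seq_len))
input_ids[0] = 0     # write one (dummy) example
input_ids.flush()    # push dirty pages to disk
```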
8733ffcb5e Removing a couple of other old unnecessary comments 2019-03-21 14:09:57 +00:00
8a861048dd Fixed up the notes on a possible future low-memory path 2019-03-21 14:08:39 +00:00
a8a577ba93 Reduced memory usage for pregenerating the data a lot by writing it
out on the fly without shuffling - the Sampler in the finetuning script
will shuffle for us.
2019-03-21 14:05:52 +00:00
0ae59e662d Reduced memory usage for pregenerating the data a lot by writing it
out on the fly without shuffling - the Sampler in the finetuning script
will shuffle for us.
2019-03-21 14:04:17 +00:00
6a9038ba53 Removed an old irrelevant comment 2019-03-21 13:36:41 +00:00
77944d1b31 add tqdm to the process of eval
Maybe better.
2019-03-21 20:59:33 +08:00
d52f914e24 weigths to weights 2019-03-21 15:02:59 +10:00
29a392fbcf Small README changes 2019-03-20 17:35:17 +00:00
832b2b0058 Adding README 2019-03-20 17:31:49 +00:00
934d3f4d2f Syncing up argument names between the scripts 2019-03-20 17:23:23 +00:00
f19ba35b2b Move old finetuning script into the new folder 2019-03-20 16:47:06 +00:00
7de5c6aa5e PEP8 and formatting cleanups 2019-03-20 16:44:04 +00:00
1798e98e5a Added final TODOs 2019-03-20 16:42:37 +00:00
c64c2fc4c2 Fixed embarrassing indentation problem 2019-03-20 15:42:57 +00:00
0540d360f2 Fixed logging 2019-03-20 15:36:51 +00:00
976554a472 First commit of the new LM finetuning 2019-03-20 14:23:51 +00:00
262a9992d7 class weights 2019-03-18 18:29:12 +01:00
19cc2c084e same 2019-03-18 15:13:35 +01:00
2283dcca5e import revert 2019-03-18 13:40:12 +01:00
b6c1cae67b branches, optim cosine fix 2019-03-18 13:32:04 +01:00
ef28b2c747 branches, optim cosine fix 2019-03-18 13:18:07 +01:00
90430ae7ec Merge remote-tracking branch 'origin/master'
# Conflicts:
#	pytorch_pretrained_bert/optimization.py
2019-03-18 13:15:29 +01:00
bed6408dcc branches, optim cosine fix 2019-03-18 13:09:55 +01:00
e5b63fb542 Merge branch 'master' of https://github.com/ananyahjha93/pytorch-pretrained-BERT
pull current master to local
2019-03-17 08:30:13 -04:00
8a4e90ff40 corrected folder creation error for MNLI-MM, verified GLUE results 2019-03-17 08:16:50 -04:00
e0bf01d9a9 added hack for mismatched MNLI 2019-03-16 14:10:48 -04:00
4c721c6b6a added eval time metrics for GLUE tasks 2019-03-15 23:21:24 -04:00
f3e5404880 Merge pull request #381 from tseretelitornike/master
Added missing imports.
2019-03-15 12:54:40 +01:00
83857ffeaa Added missing imports. 2019-03-15 12:45:48 +01:00
d5c037c3ed Merge pull request #380 from yongbowin/patch-3
typo in annotation
2019-03-14 15:56:40 +01:00
d1e4fa98a9 typo in annotation
modify `heruistic` to `heuristic` in line 660, `charcter` to `character` in line 661.
2019-03-14 17:32:15 +08:00
59e2bdd086 Merge pull request #379 from yongbowin/patch-2
typo
2019-03-14 10:17:18 +01:00
3d6452163d typo
modify `mull` to `null` in line 474 annotation.
2019-03-14 17:03:38 +08:00
76906372b0 Merge pull request #378 from huggingface/absolute_imports
Add absolute imports to GPT, GPT-2, Transfo-XL and fix empty nbest_predictions.json
2019-03-14 10:00:47 +01:00
a98dfe4ced fixing #377 (empty nbest_predictions.json) 2019-03-14 09:57:06 +01:00
e5f2d9122c adding absolute imports to gpt2, openai and transfo-xl 2019-03-14 09:55:01 +01:00
043c8781ef added code for all glue task processors 2019-03-14 04:24:04 -04:00
eecaaa734a Merge pull request #371 from yongbowin/patch-1
Simplify code, delete redundant line
2019-03-14 09:03:32 +01:00
20e652209c relation classification: replacing entity mention with mask token 2019-03-13 16:13:37 +01:00
22a465a91f Simplify code, delete redundant line
delete redundant line `if args.train`, simplify code.
2019-03-13 09:42:06 +08:00
eac039d21f changing docker 2019-03-12 13:45:12 +01:00
471daf1b6c changing docker 2019-03-12 13:32:42 +01:00
9024613337 changing docker 2019-03-12 13:23:58 +01:00
baf66d1419 restart cosine lr schedule 2019-03-12 13:22:23 +01:00
9b03d67b83 Merge pull request #362 from Bharat123rox/patch-1
Make the hyperlink of NVIDIA Apex clickable
2019-03-11 09:08:51 +01:00
8435d78f0c Merge pull request #361 from junjieqian/jqian/updateReadme
Correct line number in README for classes
2019-03-11 09:08:27 +01:00
80790705e0 Merge pull request #359 from elonmuskceo/fix-typo
Update run_gpt2.py
2019-03-11 09:07:56 +01:00
13aa13dbc0 Merge pull request #358 from cdjhz/patch-1
add 'padding_idx=0' for BertEmbeddings
2019-03-11 09:06:55 +01:00
c0660df5dd Merge pull request #357 from pglock/feature/354-use-dropout-layer-gpt
Use Dropout Layer in OpenAIGPTMultipleChoiceHead
2019-03-11 09:06:27 +01:00
f91ce0b803 Make the hyperlink of NVIDIA Apex clickable 2019-03-09 20:05:39 +05:30
51efde54a9 cos fix 2019-03-09 02:45:25 +01:00
f113a2dfdc readme de 2019-03-09 02:29:57 +01:00
90a41dbe14 BertAdam schedule objects 2019-03-09 02:23:20 +01:00
d648a02203 Correct line number in README for classes 2019-03-08 16:28:03 -08:00
88874f6cf0 BertAdam schedule objects 2019-03-08 19:08:30 +01:00
66d8206809 Update run_gpt2.py 2019-03-08 11:59:08 -05:00
72fa8d03a7 add 'padding_idx=0' for BertEmbeddings 2019-03-07 20:02:55 +08:00
6190e8ce4c Fix: use dropout layer 2019-03-07 10:12:45 +01:00
7cc35c3104 fix openai gpt example and updating readme 2019-03-06 11:43:21 +01:00
906b638efa updating readme 2019-03-06 10:24:19 +01:00
994d86609b fixing PYTORCH_PRETRAINED_BERT_CACHE use in examples 2019-03-06 10:21:24 +01:00
2dd8f524f5 removing test for long sequences error following #337 2019-03-06 10:10:41 +01:00
5c85fc3977 fix typo - logger info 2019-03-06 10:05:21 +01:00
8e36da7acb Merge pull request #347 from jplehmann/feature/sst2-processor
Processor for SST-2 task
2019-03-06 09:48:27 +01:00
21c88a07b7 Merge pull request #341 from potatochip/patch-1
catch exception if pathlib not install
2019-03-06 09:48:01 +01:00
3c01dfb775 Merge pull request #338 from CatalinVoss/patch-3
Fix top k generation for k != 0
2019-03-06 09:47:33 +01:00
477ec4b6cc Merge pull request #337 from CatalinVoss/patch-2
Allow tokenization of sequences > 512 for caching
2019-03-06 09:45:49 +01:00
7b9e5a54b5 Merge pull request #327 from lukovnikov/master
Issue#324: warmup linear fixes
2019-03-06 09:44:56 +01:00
4784b04f47 Merge pull request #325 from john-hewitt/master
add BertTokenizer flag to skip basic tokenization
2019-03-06 09:37:11 +01:00
4a49c22584 Warn instead of raising in BERT and GPT-2 tokenizers as well, to allow for pre-caching of tokens 2019-03-05 12:31:45 -08:00
e99bc87e4d Merge branch 'patch-1' into patch-2 2019-03-05 12:24:18 -08:00
0f96d4b1f7 Run classifier processor for SST-2. 2019-03-05 13:38:28 -06:00
0c970caa4a catch exception if pathlib not install 2019-03-04 14:30:19 -08:00
4b4b079272 Fix top k generation for k != 0 2019-03-02 21:54:44 -08:00
9775b2eb27 Allow tokenization of sequences > 512 for caching
For many applications requiring randomized data access, it's easier to cache the tokenized representations than the words. So why not turn this into a warning?
2019-03-02 16:30:21 -08:00
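A sketch of the relaxed check (logger name and constant assumed): warn at tokenization time instead of raising, so callers can cache full-document token ids and slice to the model's limit later.

```python
import logging

logger = logging.getLogger(__name__)
MAX_LEN = 512  # BERT's position-embedding limit

def check_sequence_length(token_ids, max_len=MAX_LEN):
    # Warning instead of raising: caching workflows tokenize whole
    # documents up front and only truncate when feeding the model.
    if len(token_ids) > max_len:
        logger.warning(
            "Token sequence length %d exceeds the model maximum (%d). "
            "Running this sequence through the model will cause "
            "indexing errors.", len(token_ids), max_len,
        )
    return token_ids
```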
c0cf0a04d5 Fix typo 2019-02-27 18:01:06 -08:00
4d1ad83236 update docstring of BERT tokenizer to reflect do_wordpiece_only 2019-02-27 14:50:41 -08:00
35410da758 added warning 2019-02-27 17:11:42 +01:00
4d79e0d386 added warning 2019-02-27 16:50:05 +01:00
66a84b63b0 added warning 2019-02-27 16:38:00 +01:00
070f3b21d8 added warning 2019-02-27 16:26:45 +01:00
46ef646016 added warning 2019-02-27 16:22:27 +01:00
9bc3773c84 added warning 2019-02-27 16:10:31 +01:00
60a372387f added warning 2019-02-27 15:54:09 +01:00
e14c6b52e3 add BertTokenizer flag to skip basic tokenization 2019-02-26 20:11:24 -08:00
da2d8ca265 fix for negative learning rate with warmup_linear in BertAdam (happens when t_total is specified incorrectly)
+ copied BERT optimization warmup functions to OpenAI optimization file + added comments
2019-02-26 17:16:06 +01:00
e04bab59e1 fix for negative learning rate with warmup_linear in BertAdam (happens when t_total is specified incorrectly)
+ copied BERT optimization warmup functions to OpenAI optimization file + added comments
2019-02-26 16:22:52 +01:00
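A sketch of the failure mode and a clamped schedule (the exact formula in the PR may differ): with x = global_step / t_total, an underestimated t_total pushes x past 1.0, and the unclamped linear decay returns a negative multiplier, silently flipping the sign of every update.

```python
def warmup_linear_buggy(x, warmup=0.002):
    if x < warmup:
        return x / warmup
    return 1.0 - x          # negative once x > 1.0

def warmup_linear_fixed(x, warmup=0.002):
    if x < warmup:
        return x / warmup
    # Linear decay from 1.0 at x == warmup to 0.0 at x == 1.0, clamped
    # so a misconfigured t_total cannot make the learning rate negative.
    return max((x - 1.0) / (warmup - 1.0), 0.0)

print(warmup_linear_buggy(1.2))   # -0.2 -> negative learning rate
print(warmup_linear_fixed(1.2))   # 0.0
```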
2152bfeae8 Merge pull request #316 from joelgrus/gpt2docs
update documentation for gpt-2
2019-02-24 09:38:29 +01:00
8722e9eb3b finish updating docstrings 2019-02-23 06:31:59 -08:00
33aa7a80ca update documentation 2019-02-22 15:37:59 -08:00
a5b3a89545 Merge pull request #310 from spolu/spolu-nits_gpt2
Few small nits in GPT-2's README code examples
2019-02-21 10:23:27 +01:00
ff22b3acc0 Few small nits in GPT-2's code examples 2019-02-21 09:15:27 +00:00
cbb7fad319 Merge pull request #307 from guotong1988/patch-1
Update README.md
2019-02-21 09:25:19 +01:00
09efcece75 Update README.md 2019-02-21 11:25:33 +08:00
97c815dae2 Merge pull request #305 from bkj/patch-1
Update run_openai_gpt.py
2019-02-20 21:24:06 +01:00
8607233679 Update run_openai_gpt.py 2019-02-20 13:58:54 -05:00
f50b82af04 Merge pull request #302 from yongbowin/master
typo
2019-02-20 14:14:02 +01:00
2fdab323d1 typo 2019-02-20 21:11:06 +08:00
813e4d18ba typo 2019-02-20 21:10:07 +08:00
8337740754 Merge pull request #295 from tnlin/master
fix broken link in readme
2019-02-19 14:00:28 +01:00
5b0e0b61f0 fix typo in readme 2019-02-19 20:34:18 +08:00
3ca35b99ba Merge pull request #293 from davidefiocco/patch-2
Minor README typos corrected
2019-02-19 09:00:01 +01:00
0ae8eece55 Minor README typos corrected 2018-12-18 21:28:28 +01:00
07ebe0fd06 Merge pull request #292 from sam-qordoba/patch-3
Fix typo in `GPT2Model` code sample
2019-02-18 21:07:39 +01:00
1cb9c76ec5 Fix typo in GPT2Model code sample
Typo prevented code from running
2019-02-18 09:27:26 -08:00
a25d056b7a update readme 2019-02-18 15:30:11 +01:00
517d7c8624 update readme 2019-02-18 14:39:55 +01:00
ada22a1c9e more details in GPT-2 usage example 2019-02-18 14:37:41 +01:00
522733f6cb readme typo fixes 2019-02-18 14:32:10 +01:00
0202da0271 remove unnecessary example 2019-02-18 13:51:42 +01:00
8f46cd1057 Merge pull request #288 from huggingface/gpt2
forgot to add regex to requirements.txt :(
2019-02-18 12:00:11 +01:00
e0855e8929 forgot to add regex to requirements :( 2019-02-18 11:54:51 +01:00
0856a231c0 Merge pull request #287 from huggingface/gpt2
Gpt2
2019-02-18 11:38:05 +01:00
ab7f5d2943 simple 2019-02-18 11:33:54 +01:00
b450a7faf2 clean up tokenization - fix python 2 tests 2019-02-18 11:27:18 +01:00
d44db1145c update readme 2019-02-18 11:12:09 +01:00
690a0dbf36 fix example - masking 2019-02-18 10:50:30 +01:00
fbb248a2e4 examples testing 2019-02-18 01:28:18 +01:00
5ff0c60505 language update 2019-02-18 00:55:47 +01:00
210d407245 updating init 2019-02-18 00:55:39 +01:00
b65f07d8c0 adding examples 2019-02-18 00:55:33 +01:00
009ee86a19 fix tests - bump up version 2019-02-17 23:57:23 +01:00
ffd623823d adding gpt2 2019-02-17 23:38:51 +01:00
3a2f97db6f Merge pull request #286 from hendrycks/patch-1
Update activation function docstring
2019-02-17 15:30:46 +01:00
434d15da8e Update activation function docstring 2019-02-16 12:17:52 -08:00
5faf386652 Merge pull request #282 from wlhgtc/master
Fix some bug about SQuAD code
2019-02-15 10:06:51 +01:00
8efaf8f176 fix 'best_non_null_entry' is None error 2019-02-15 15:57:25 +08:00
0e774e57a6 Update readme
Adding details on how to extract a full list of hidden states for the Transformer-XL
2019-02-14 08:39:58 +01:00
c35d9d48d9 Merge pull request #275 from davidefiocco/patch-1
--do_lower_case is duplicated in parser args
2019-02-13 16:32:21 +01:00
65df0d78ed --do_lower_case is duplicated in parser args
Deleting one repetition (please review!)
2019-02-13 15:30:05 +01:00
4e56da38d9 Merge pull request #268 from wangxiaodiu/master
fixed a minor bug in README.md
2019-02-13 10:19:25 +01:00
cdcb206e10 Merge pull request #273 from huggingface/update_to_fifth_release
Update to fifth release
2019-02-13 10:19:08 +01:00
321d70a7a9 bump up to 0.5.1 2019-02-13 10:11:20 +01:00
67376c02e2 update readme for tokenizers 2019-02-13 10:11:11 +01:00
c6bea08448 OpenAI GPT Tokenizer can fallback on using BERT BasicTokenizer 2019-02-13 10:11:00 +01:00
e7cfc46fc1 fix TransfoXLModel loading 2019-02-13 09:32:46 +01:00
e1b3cfb504 fixed a minor bug in README.md 2019-02-12 15:54:23 +04:00
3c33499f87 fix typo in readme 2019-02-12 10:22:54 +01:00
03cdb2a390 Merge pull request #254 from huggingface/python_2
Adding OpenAI GPT and Transformer-XL models, compatibility with Python 2
2019-02-11 14:19:26 +01:00
1e71f11dec Release: 0.5.0 2019-02-11 14:16:27 +01:00
d38caba169 typo in run_squad 2019-02-11 14:10:27 +01:00
af62cc5f20 fix run_squad example 2019-02-11 14:06:32 +01:00
eebc8abbe2 clarify and unify model saving logic in examples 2019-02-11 14:04:19 +01:00
81c7e3ec9f fix typo in readme 2019-02-11 13:37:12 +01:00
e8fe6b7140 adapting transfo tokenizer to transposed inputs 2019-02-11 13:30:04 +01:00
884ca81d87 transposing the inputs of Transformer-XL to have a unified interface 2019-02-11 13:19:59 +01:00
32fea876bb add distant debugging to run_transfo_xl 2019-02-11 12:53:32 +01:00
b31ba23913 cuda on in the examples by default 2019-02-11 12:15:43 +01:00
0a9860daa7 tests pass on python 2 and 3 2019-02-11 10:47:52 +01:00
2071a9b86e fix python 2.7 imports 2019-02-11 10:35:36 +01:00
8197eb9f10 update Circle CI config 2019-02-11 10:22:10 +01:00
525eba68ab update Circle CI 2019-02-11 10:19:25 +01:00
b514a60c36 added tests for OpenAI GPT and Transformer-XL tokenizers 2019-02-11 10:17:16 +01:00
9bdcba53fd fix tests 2019-02-09 17:07:12 +01:00
f0bf81e141 back compatibility with Path inputs in file_utils 2019-02-09 17:05:23 +01:00
9f9909ea2f update readme 2019-02-09 16:59:21 +01:00
6cd769957e update transfo xl example 2019-02-09 16:59:17 +01:00
1320e4ec0c mc_token_mask => mc_token_ids 2019-02-09 16:58:53 +01:00
f4a07a392c mems not split 2019-02-09 16:14:31 +01:00
43b9af0cac mems initialized to None in run_transfo 2019-02-09 16:12:19 +01:00
cfcb95417c fix hasattr 2019-02-08 23:08:53 +01:00
0c1a6f9b1d update readme 2019-02-08 22:32:25 +01:00
1756b5e956 fix loading from Transfo-XL LM model 2019-02-08 22:32:17 +01:00
dadd0c1b13 updating __main__ 2019-02-08 22:31:57 +01:00
102c6b238c adding file cache to __init__ 2019-02-08 22:31:46 +01:00
b80684b23f fixing run openai gpt example 2019-02-08 22:31:32 +01:00
80607874c1 fix layer norm epsilon in OpenAI GPT 2019-02-08 21:49:05 +01:00
7b4b0cf966 logging 2019-02-08 11:16:29 +01:00
4bbb9f2d68 log loss - helpers 2019-02-08 11:14:29 +01:00
5d7e845712 fix model on cuda 2019-02-08 11:08:43 +01:00
eccb2f0163 hot fix 2019-02-08 11:05:20 +01:00
5adc20723b add distant debugging 2019-02-08 11:03:59 +01:00
5ee4f17234 adding option to load on cpu 2019-02-08 10:37:40 +01:00
2dfaf2f227 Merge pull request #261 from deepset-ai/rm_arg_lm_finetuning
removing unused argument eval_batch_size from LM finetuning #256
2019-02-08 10:36:03 +01:00
777459b471 run openai example running 2019-02-08 10:33:14 +01:00
edcb56fd96 more explicit variable name 2019-02-08 09:54:49 +01:00
6bc082da0a updating examples 2019-02-08 00:02:26 +01:00
eb8fda51f4 update docstrings 2019-02-07 23:15:20 +01:00
e77721e4fe renamed examples 2019-02-07 23:15:15 +01:00
009b581316 updated readme 2019-02-07 23:15:05 +01:00
f99f2fb661 docstrings 2019-02-07 17:07:22 +01:00
438db43d46 update adaptive softmax head 2019-02-07 17:07:15 +01:00
c306869ea2 add two transformer xl models 2019-02-07 17:07:03 +01:00
d482e3d79d adding examples for openai and transformer-xl 2019-02-07 17:06:41 +01:00
9c3c24800b split saved model in config & weights 2019-02-07 17:06:17 +01:00
2df41663f1 added test 2019-02-07 17:05:49 +01:00
9aebc711c9 adjust error message related to args.do_eval 2019-02-07 11:49:38 +01:00
4a450b25d5 removing unused argument eval_batch_size from LM finetuning #256 2019-02-07 10:06:38 +01:00
58f0a2745c Merge pull request #258 from BoeingX/master
Fix the undefined variable in squad example
2019-02-06 20:33:18 +01:00
7ac3311e48 Fix the undefined variable in squad example 2019-02-06 19:36:08 +01:00
ed47cb6cba fixing transfo eval script 2019-02-06 16:22:17 +01:00
973926431e fix differences with tensorflow version (mem cells and adaptive softmax clusters) 2019-02-06 15:42:29 +01:00
ba9e4eb354 fix unicode in tokenization tests 2019-02-06 00:28:00 +01:00
34bdb7f9cb update circle-ci for python 2.7 and 3.5 2019-02-06 00:25:12 +01:00
848aae49e1 Merge branch 'master' into python_2 2019-02-06 00:13:20 +01:00
448937c00d python 2 compatibility 2019-02-06 00:07:46 +01:00
ba37ddc5ce fix run_lm_modeling example command line 2019-02-06 00:07:08 +01:00
822915142b fix docstring 2019-02-05 16:34:32 +01:00
bd74632687 Merge pull request #251 from Iwontbecreative/active_loss_tok_classif
Only keep the active part of the loss for token classification
2019-02-05 16:33:45 +01:00
fd223374f0 Merge pull request #208 from Liangtaiwan/mergesquad
Merge run_squad.py and run_squad2.py
2019-02-05 16:15:03 +01:00
d609ba24cb resolving merge conflicts 2019-02-05 16:14:25 +01:00
bde1eeebe0 rename 2019-02-05 16:11:22 +01:00
3ea3b00e59 merge squad example in single example 2019-02-05 16:10:27 +01:00
d8e3bdbb4c moved up to current master 2019-02-05 16:09:39 +01:00
64ce900974 Merge pull request #248 from JoeDumoulin/squad1.1-fix
fix prediction on run-squad.py example
2019-02-05 16:00:51 +01:00
0ad9b239a1 gitignore 2019-02-05 15:43:11 +01:00
e9e77cd3c4 Merge pull request #218 from matej-svejda/master
Fix learning rate problems in run_classifier.py
2019-02-05 15:40:44 +01:00
1579c53635 more explicit notation: num_train_step => num_train_optimization_steps 2019-02-05 15:36:33 +01:00
f3bda2352a Only keep the active part of the loss for token classification 2019-02-04 11:46:36 -05:00
6179f537a3 clean up tokenization spaces 2019-02-04 17:41:22 +01:00
850da1cc36 strip decoded outputs 2019-02-04 17:35:05 +01:00
01a3966bc6 more options on special tokens 2019-02-04 17:26:25 +01:00
05f961840b logging 2019-02-04 13:06:19 +01:00
aa90e0c36a fix prediction on run-squad.py example 2019-02-01 10:15:44 -08:00
8f8bbd4a4c Merge pull request #244 from deepset-ai/prettify_lm_masking
Avoid confusion of inplace LM masking
2019-02-01 12:17:50 +01:00
e2d53d95b0 Merge pull request #242 from ksurya/argparse
Fix argparse type error
2019-02-01 12:14:55 +01:00
7e0b415ab4 Merge pull request #240 from girishponkiya/patch-1
Minor update in README
2019-02-01 12:14:05 +01:00
ce75b169bd avoid confusion of inplace masking of tokens_a / tokens_b 2019-01-31 11:42:06 +01:00
9bf528877e Update run_squad.py 2019-01-30 15:09:31 -05:00
af2b78601b Update run_squad2.py 2019-01-30 15:08:56 -05:00
0dd2b750ca Minor update in README
Update links to classes in `modeling.py`
2019-01-30 23:49:15 +05:30
5169069997 make examples consistent, revert error in num_train_steps calculation 2019-01-30 11:47:25 +01:00
3a848111e6 update config, docstrings and readme to switch to separated tokens and position embeddings 2019-01-29 11:00:11 +01:00
98c96fb1a7 splitting position and tokens embeddings in OpenAI GPT - updating tf imports - tests 2019-01-29 10:31:42 +01:00
5456d82311 more versatile model loading 2019-01-29 09:54:18 +01:00
9b2540b5a7 update __init__ 2019-01-29 09:54:08 +01:00
bd3b3aee9c update 2019-01-28 17:47:29 +01:00
a45a9cc0e1 update tests 2019-01-28 17:16:02 +01:00
b12616fd8e updating code organization to fix imports 2019-01-28 17:03:39 +01:00
d77dd62ff8 directly load from TF checkpoints + code cleanup 2019-01-28 16:50:23 +01:00
9c6a48c8c3 fix learning rate/fp16 and warmup problem for all examples 2019-01-27 14:07:24 +01:00
01ff4f82ba learning rate problems in run_classifier.py 2019-01-22 23:40:06 +01:00
4eb2a49d41 Merge run_squad.py and run_squad2.py 2019-01-19 10:18:10 +08:00
0a9d7c7edb Merge pull request #201 from Liangtaiwan/squad2_save_bug
run_squad2 Don't save model if do not train
2019-01-18 09:28:11 +01:00
be9fa192f0 don't save if do not train 2019-01-18 00:41:55 +08:00
9c35c132fa apex LayerNorm 2019-01-17 09:19:19 +01:00
b9c77b98d5 fix transposition in model conversion and memory initialization 2019-01-17 00:33:21 +01:00
f040a43cb3 Merge pull request #199 from davidefiocco/patch-1
(very) minor update to README
2019-01-16 23:51:52 +01:00
35115eaf93 (very) minor update to README 2019-01-16 21:05:24 +01:00
009101de12 fix loading bug and check full conversion of model 2019-01-16 12:16:20 +01:00
fea15cc9f5 update model conversion 2019-01-16 11:54:54 +01:00
a28dfc8659 fix eval for wt103 2019-01-16 11:18:19 +01:00
c03c12687f fix __main__ entry script 2019-01-16 10:55:22 +01:00
8831c68803 fixing various parts of model conversion, loading and weights sharing 2019-01-16 10:31:16 +01:00
bcd4aa8fe0 update evaluation example 2019-01-15 23:32:34 +01:00
a69ec2c722 improved corpus and tokenization conversion - added evaluation script 2019-01-15 23:17:46 +01:00
7d03c53718 conversion working 2019-01-15 16:07:25 +01:00
3a9c88377f adding Transformer XL 2019-01-15 12:59:38 +01:00
647c983530 Merge pull request #193 from nhatchan/20190113_global_step
Fix importing unofficial TF models
2019-01-14 09:44:01 +01:00
4e0cba1053 Merge pull request #191 from nhatchan/20190113_py35_finetune
lm_finetuning compatibility with Python 3.5
2019-01-14 09:40:07 +01:00
c94455651e Merge pull request #190 from nhatchan/20190113_finetune_doc
Fix documentation (missing backslashes)
2019-01-14 09:39:03 +01:00
25eae7b0ae Merge pull request #189 from donglixp/patch-1
[bug fix] args.do_lower_case is always True
2019-01-14 09:38:37 +01:00
cd30565aed Fix importing unofficial TF models
Importing unofficial TF models seems to be working well, at least for me.
This PR resolves #50.
2019-01-14 13:35:40 +09:00
8edc898f63 Fix documentation (missing backslashes)
This PR adds missing backslashes in LM Fine-tuning subsection in README.md.
2019-01-13 21:23:19 +09:00
6c65cb2492 lm_finetuning compatibility with Python 3.5
dicts are not ordered in Python 3.5 or prior, which is a cause of #175.
This PR replaces one with a list, to keep its order.
2019-01-13 21:09:13 +09:00
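A tiny illustration of the fix (names hypothetical): in Python ≤ 3.5, plain-dict iteration order is arbitrary, so a list of pairs keeps the order deterministic across runs.

```python
# Python <= 3.5 does not guarantee dict iteration order, so building
# per-example features as a dict could reorder them between runs.
# A list of (name, values) pairs is deterministic everywhere.
features = [
    ("input_ids", [101, 2023, 102]),
    ("segment_ids", [0, 0, 0]),
]
for name, values in features:
    print(name, values)
```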
a2da2b4109 [bug fix] args.do_lower_case is always True
The "default=True" makes args.do_lower_case always True.

```python
parser.add_argument("--do_lower_case",
                        default=True,
                        action='store_true')
```
2019-01-13 19:51:11 +08:00
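The fix (see also "remove default when action is store_true in arguments" further down this log) is to drop the explicit default, since store_true already defaults to False:

```python
import argparse

parser = argparse.ArgumentParser()
# action='store_true' already implies default=False; the explicit
# default=True made the flag impossible to switch off from the CLI.
parser.add_argument("--do_lower_case", action="store_true")

print(parser.parse_args([]).do_lower_case)                    # False
print(parser.parse_args(["--do_lower_case"]).do_lower_case)   # True
```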
35becc6d84 Merge pull request #182 from deepset-ai/fix_lowercase_and_saving
add do_lower_case arg and adjust model saving for lm finetuning.
2019-01-11 08:50:13 +01:00
506e5bb0c8 add do_lower_case arg and adjust model saving for lm finetuning. 2019-01-11 08:32:46 +01:00
e485829a41 Merge pull request #174 from abeljim/master
Added Squad 2.0
2019-01-10 23:40:45 +01:00
7e60205bd3 Merge pull request #179 from likejazz/patch-2
Fix it to run properly even without the `--do_train` param.
2019-01-10 23:39:10 +01:00
64326dccfb Fix it to run properly even without the --do_train param.
It was modified similarly to `run_classifier.py`, and fixed to run properly even without the `--do_train` param.
2019-01-10 21:51:39 +09:00
e5c78c6684 update readme and few typos 2019-01-10 01:40:00 +01:00
fa5222c296 update readme 2019-01-10 01:25:28 +01:00
0dd5f55ac8 Merge pull request #172 from WrRan/never_split
Never split some texts.
2019-01-09 13:44:09 +01:00
b3628f117e Added Squad 2.0 2019-01-08 15:13:13 -08:00
ab90d4cddd adding docs and example for OpenAI GPT 2019-01-09 00:12:43 +01:00
dc5df92fa8 added LM head for OpenAI 2019-01-08 17:18:47 +01:00
3cf12b235a added tests + fixed losses 2019-01-08 16:24:23 +01:00
eed51c5bdf add OpenAI GPT 2019-01-08 12:26:58 +01:00
3f60a60eed text in never_split should not lowercase 2019-01-08 13:33:57 +08:00
751beb9e73 never split some text 2019-01-08 10:54:51 +08:00
793dcd236b Merge branch 'master' of https://github.com/huggingface/pytorch-pretrained-BERT into fifth-release 2019-01-07 13:37:55 +01:00
2e4db64cab add do_lower_case tokenizer loading option in run_squad and fine-tuning examples 2019-01-07 13:06:42 +01:00
c9fd350567 remove default when action is store_true in arguments 2019-01-07 13:01:54 +01:00
93f563b8a8 adding OpenAI GPT 2019-01-07 12:55:36 +01:00
e048c7f1c8 Merge pull request #171 from donglixp/patch-1
LayerNorm initialization
2019-01-07 12:44:46 +01:00
d3d56f9a0b Merge pull request #166 from likejazz/patch-1
Fix error when `bert_model` param is path or url.
2019-01-07 12:40:55 +01:00
766c6b2ce3 Merge pull request #159 from jaderabbit/master
Allow do_eval to be used without do_train and to use the pretrained model in the output folder
2019-01-07 12:31:06 +01:00
77966a43a4 Merge pull request #156 from rodgzilla/cl_args_doc
Adding new pretrained model to the help of the `bert_model` argument.
2019-01-07 12:27:16 +01:00
bcd607542c Merge pull request #145 from wlhgtc/master
Correct the  wrong note
2019-01-07 12:23:05 +01:00
2e8c5c00ec Merge pull request #141 from SinghJasdeep/patch-1
loading saved model when n_classes != 2
2019-01-07 12:21:13 +01:00
2860377021 Merge pull request #134 from rodgzilla/update_doc_pretrained_models
Fixing various class documentations.
2019-01-07 12:06:06 +01:00
c18bdb4433 Merge pull request #124 from deepset-ai/master
Add example for fine tuning BERT language model
2019-01-07 12:03:51 +01:00
d0d9b384f2 LayerNorm initialization
The LayerNorm gamma and beta should be initialized by .fill_(1.0) and .zero_().

reference links:

989e78c412/tensorflow/contrib/layers/python/layers/layers.py (L2298)

989e78c412/tensorflow/contrib/layers/python/layers/layers.py (L2308)
2019-01-07 15:51:33 +08:00
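A sketch of the scheme the PR describes, using the era's gamma/beta naming (the repo's exact class may differ):

```python
import torch
import torch.nn as nn

class BertLayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-12):
        super().__init__()
        self.gamma = nn.Parameter(torch.empty(hidden_size))
        self.beta = nn.Parameter(torch.empty(hidden_size))
        # Match the TF reference: scale (gamma) starts at 1, shift (beta) at 0.
        self.gamma.data.fill_(1.0)
        self.beta.data.zero_()
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        var = (x - mean).pow(2).mean(-1, keepdim=True)
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta
```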
ca4e7aaa72 Fix error when bert_model param is path or url.
An error occurs when the `bert_model` param is a path or URL. Therefore, if it is a path, use only the last path component to prevent the error.
2019-01-05 11:42:54 +09:00
193e2df8ba Remove rogue comment 2019-01-03 13:13:06 +02:00
c64de50ea4 nb_tr_steps is not initialized 2019-01-03 12:34:57 +02:00
b96149a19b Training loss is not initialized if only do_eval is specified 2019-01-03 10:32:10 +02:00
be3b9bcf4d Allow one to use the pretrained model in evaluation when do_train is not selected 2019-01-03 09:02:33 +02:00
186f75342e Adding new pretrained model to the help of the bert_model argument. 2019-01-02 14:00:59 +01:00
e626eecc25 Update modeling.py 2018-12-22 20:26:05 +08:00
99709ee61d loading saved model when n_classes != 2
Required for: Assertion `t >= 0 && t < n_classes` failed, if your default number of classes is not 2.
2018-12-20 13:55:47 -08:00
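A hedged sketch, assuming the era's pytorch-pretrained-bert API: sizing the classification head at load time is what lets a saved model with n_classes != 2 load and train without tripping the assertion above.

```python
from pytorch_pretrained_bert import BertForSequenceClassification

# num_labels sizes the classifier head; the library default of 2 is
# what triggered `t >= 0 && t < n_classes` on tasks with more classes.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)
```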
8da280ebbe Setup CI 2018-12-20 16:33:39 -05:00
e5fc98c542 add exemplary training data. update to nvidia apex. refactor 'item -> line in doc' mapping. add warning for unknown word. 2018-12-20 18:30:52 +01:00
7176674849 Fixing various class documentations. 2018-12-20 13:11:17 +01:00
7fb94ab934 Merge pull request #127 from patrick-s-h-lewis/tokenizer-error-on-long-seqs
raises value error for bert tokenizer for long sequences
2018-12-19 10:29:17 +01:00
2feb29c0ff Merge pull request #130 from sodre/use-entry-points
Use entry-points instead of scripts
2018-12-19 10:18:24 +01:00
2c9991496b Merge pull request #128 from sodre/add-license
Add license to source distribution
2018-12-19 10:15:53 +01:00
17595ef2de Merge branch 'master' of https://github.com/deepset-ai/pytorch-pretrained-BERT 2018-12-19 09:22:53 +01:00
67f4dd56a3 update readme for run_lm_finetuning 2018-12-19 09:22:37 +01:00
ecf3ea197e Remove original script 2018-12-19 02:26:08 +00:00
87c1244c7d Convert scripts into entry_points
The recommended approach to create launch scripts is to use entry_points
and console_scripts.

xref: https://packaging.python.org/guides/distributing-packages-using-setuptools/#scripts
2018-12-19 02:26:08 +00:00
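An illustrative setup.py fragment for that change (the exact command name is an assumption):

```python
from setuptools import setup

setup(
    name="pytorch_pretrained_bert",
    # console_scripts generates a platform-appropriate launcher that
    # calls the named function, replacing the old scripts= approach.
    entry_points={
        "console_scripts": [
            "pytorch_pretrained_bert=pytorch_pretrained_bert.__main__:main",
        ],
    },
)
```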
b3d86162b0 Add license to source distribution 2018-12-19 01:41:18 +00:00
d57763f582 Fix typos 2018-12-18 19:23:22 -05:00
78cf7b4ab4 added code to raise value error for bert tokenizer for convert_tokens_to_indices 2018-12-18 14:41:30 +00:00
a58361f197 Add example for fine tuning BERT language model (#1)
Adds an example for loading a pre-trained BERT model and fine-tuning it as a language model (masked tokens & nextSentence) on your target corpus.
2018-12-18 10:32:25 +01:00
786cc41299 Typos in readme 2018-12-17 09:22:18 +01:00
ecc0b54bec Merge pull request #119 from danyaljj/patch-1
Minor README fix
2018-12-14 23:29:47 +01:00
8b1b93947f Minor fix. 2018-12-14 14:10:36 -05:00
8809eb6c93 update readme with information on NVIDIA's apex 2018-12-14 16:59:39 +01:00
e1bfad4846 Merge pull request #112 from huggingface/fourth-release
Fourth release
2018-12-14 15:15:47 +01:00
d821358884 update readme 2018-12-14 15:15:17 +01:00
37378898a2 adding DockerFile 2018-12-14 15:02:32 +01:00
4a4b0e5783 remove logging. basicConfig from library code 2018-12-14 14:46:25 +01:00
ae88eb88a4 set encoding to 'utf-8' in calls to open 2018-12-14 13:48:58 +01:00
e1eab59aac no fp16 on evaluation 2018-12-13 14:54:02 +01:00
087798b7fa fix reloading model for evaluation in examples 2018-12-13 14:48:12 +01:00
0f544625f4 fix swag example for work with apex 2018-12-13 13:35:59 +01:00
0cf88ff084 make examples work without apex 2018-12-13 13:28:00 +01:00
52c53f39d0 clean up apex integration 2018-12-13 13:02:17 +01:00
4946c2c500 run_swag example in readme 2018-12-13 13:02:07 +01:00
d23eed85bb model loading apex modification 2018-12-13 12:53:17 +01:00
1cbb32a542 include version number + comment in setup.py 2018-12-13 12:50:44 +01:00
ce52177638 added version in __init__.py 2018-12-13 12:50:44 +01:00
d3fcec1a3e add saving and loading model in examples 2018-12-13 12:50:44 +01:00
93f335ef86 add pretrained loading from state_dict 2018-12-13 12:48:13 +01:00
b3caec5a56 adding save checkpoint and loading in examples 2018-12-13 12:48:13 +01:00
85fff78c2d compatibility PT 1.0 and 0.4.1 2018-12-13 12:48:13 +01:00
13bf0d4659 fixing Adam weights skip in TF convert script 2018-12-13 12:48:13 +01:00
91aab2a6d3 Merge pull request #116 from FDecaYed/deyuf/fp16_with_apex
Change to use apex for better fp16 and multi-gpu support
2018-12-13 12:32:37 +01:00
32a227f507 Merge pull request #113 from hzhwcmhf/master
fix compatibility with python 3.5.2
2018-12-13 12:15:15 +01:00
ffe9075f48 Merge pull request #96 from rodgzilla/multiple-choice-code
BertForMultipleChoice and Swag dataset example.
2018-12-13 12:05:11 +01:00
3b0a14b761 add fallback path for apex used in modeling.py 2018-12-12 15:05:45 -08:00
dcb50eaa4b Swag example readme section update with gradient accumulation run. 2018-12-12 18:17:46 +01:00
c8ea286048 change to apex for better fp16 and multi-gpu support 2018-12-11 17:13:58 -08:00
485adde742 add pathlib support for file_utils.py on python 3.5 2018-12-11 22:49:19 +08:00
bc659f86ad fix compatibility with python 3.5.2; convert path to str 2018-12-11 20:18:56 +08:00
1df6f26214 Merge branch 'fourth-release' of https://github.com/huggingface/pytorch-pretrained-BERT into fourth-release 2018-12-11 12:20:31 +01:00
770f805ae5 include version number + comment in setup.py 2018-12-11 12:20:22 +01:00
ed3b62cd3b added version in __init__.py 2018-12-11 12:12:08 +01:00
632f2d2df9 Merge branch 'master' into fourth-release 2018-12-11 06:00:53 -05:00
b13abfa9fe add saving and loading model in examples 2018-12-11 11:58:07 +01:00
270fa2f20b add pretrained loading from state_dict 2018-12-11 11:50:38 +01:00
a3a3180c86 Bump up requirements to Python 3.6 2018-12-11 11:29:45 +01:00
e7c0a8ddce Merge pull request #107 from lliimsft/master
Fix optimizer to work with horovod
2018-12-11 05:18:00 -05:00
e622790a93 Merge pull request #91 from rodgzilla/convert-examples-code-improvement
run_classifier.py improvements
2018-12-11 05:12:04 -05:00
df34f22854 Removing the dependency to pandas and using the csv module to load data. 2018-12-10 17:45:23 +01:00
0876b77f7f Change to the README file to add SWAG results. 2018-12-10 15:34:19 +01:00
81e1e2489f Fix optimizer to work with horovod 2018-12-10 02:08:38 -08:00
174cdbccde adding save checkpoint and loading in examples 2018-12-09 17:04:23 -05:00
1db916b5be compatibility PT 1.0 and 0.4.1 2018-12-09 16:57:51 -05:00
68f77303b2 fixing Adam weights skip in TF convert script 2018-12-09 16:17:11 -05:00
a2b6918a11 Merge pull request #101 from davidefiocco/patch-1
Adding --do_lower_case for all uncased BERTs examples
2018-12-09 15:29:31 -05:00
5c858448d3 Merge pull request #94 from rodgzilla/fixing-squad-commentary
Fixing the commentary of the `SquadExample` class.
2018-12-09 15:27:30 -05:00
c9f67e037c Adding --do_lower_case for all uncased BERTs
I had missed those, it should make sense to use them
2018-12-07 20:40:56 +01:00
150f3cd9fa Few typos in README.md 2018-12-06 19:22:07 +01:00
d429c15f25 Removing old code from copy-paste. 2018-12-06 19:19:21 +01:00
4fa7892d64 Wrong line number link to modeling file. 2018-12-06 19:18:29 +01:00
6a26e19ea3 Updating README.md with SWAG example informations. 2018-12-06 19:15:08 +01:00
63c45056aa Finishing the code for the Swag task. 2018-12-06 18:53:05 +01:00
fc5a38ac92 Adding the BertForMultipleChoiceClass. 2018-12-06 18:42:23 +01:00
c45d8ac554 Storing the feature of each choice as a dict for readability. 2018-12-06 16:01:28 +01:00
0812aee2c3 Fixing problems in convert_examples_to_features. 2018-12-06 15:53:07 +01:00
f2b873e995 convert_examples_to_features code and small improvements. 2018-12-06 15:40:47 +01:00
83fdbd6043 Adding read_swag_examples to load the dataset. 2018-12-06 14:02:46 +01:00
7183cded4e SwagExample class. 2018-12-06 13:39:44 +01:00
fa7daa247d Fixing the commentary of the SquadExample class. 2018-12-06 13:14:33 +01:00
a994bf4076 Fixing related to issue #83. 2018-12-05 18:16:30 +01:00
c6d9d5394e Simplifying code for easier understanding. 2018-12-05 17:53:09 +01:00
793262e8ec Removing trailing whitespaces. 2018-12-05 17:52:39 +01:00
3ba5470eb8 Merge pull request #87 from rodgzilla/readme-file-links
Readme file links
2018-12-05 10:41:05 -05:00
0a7c8bdcac Fixing badly formatted links. 2018-12-04 13:43:56 +01:00
3113e967db Adding links to examples files. 2018-12-04 13:40:38 +01:00
04826b0f2c Merge pull request #77 from davidefiocco/patch-1
Correct assignement for logits in classifier example
2018-12-02 13:01:04 +01:00
e60e8a6068 Correct assignement for logits in classifier example
I tried to address https://github.com/huggingface/pytorch-pretrained-BERT/issues/76
should be correct, but there's likely a more efficient way.
2018-12-02 12:38:26 +01:00
063be09b71 Merge pull request #75 from davidefiocco/patch-2
Point typo fix
2018-12-01 01:15:43 +01:00
4450f5ef6b Merge pull request #74 from davidefiocco/patch-1
Update finetuning example in README adding --do_lower_case
2018-12-01 01:15:31 +01:00
dc13e276ee Point typo fix 2018-12-01 01:02:16 +01:00
8a8aa59d8c Update finetuning example adding --do_lower_case
Should be consistent with the fact that an uncased model is used
2018-12-01 01:00:05 +01:00
836b40be82 Merge pull request #72 from NirantK/patch-1
Fix internal hyperlink typo
2018-11-30 23:33:53 +01:00
66d50ca6ae Merge pull request #73 from huggingface/third-release
Third release
2018-11-30 23:10:30 +01:00
f9f3bdd60b update readme 2018-11-30 23:05:18 +01:00
52ff0590ff tup => tpu 2018-11-30 23:01:10 +01:00
511bce58bd update new token classification model 2018-11-30 22:56:02 +01:00
258eb50086 bump up version 2018-11-30 22:55:33 +01:00
d787c6be8c improve docstrings and fix new token classification model 2018-11-30 22:55:26 +01:00
ed302a73f4 add new token classification model 2018-11-30 22:55:03 +01:00
89d47230d7 clean up classification model output 2018-11-30 22:54:53 +01:00
7f7c41b0c1 tests for all model classes with and without labels 2018-11-30 22:54:33 +01:00
be57c8eeef Fix internal hyperlink typo 2018-12-01 02:43:25 +05:30
8c7267f1cf Merge pull request #70 from deepset-ai/fix_lm_loss
fix typo in input for masked lm loss function
2018-11-30 18:23:46 +01:00
7b3bb8c00f fix typo in input for masked lm loss function 2018-11-30 16:52:50 +01:00
257a35134a fix pickle dump in run_squad example 2018-11-30 14:23:09 +01:00
c588453a0f fix run_squad 2018-11-30 14:22:40 +01:00
d6f06c03f4 fixed loading pre-trained tokenizer from directory 2018-11-30 14:09:06 +01:00
532a81d3d6 fixed doc_strings 2018-11-30 13:57:01 +01:00
296f006132 added BertForTokenClassification model 2018-11-30 13:56:53 +01:00
298107fed7 Added new bert models 2018-11-30 13:56:02 +01:00
0541442558 add do_lower_case in examples 2018-11-30 13:47:33 +01:00
3951c2c189 Merge pull request #60 from davidefiocco/patch-1
Updated quick-start example with `BertForMaskedLM`
2018-11-28 14:59:08 +01:00
ec2c339b53 Updated quick-start example with BertForMaskedLM
As `convert_ids_to_tokens` returns a list, the code in the README currently throws an `AssertionError`, so I propose a quick fix.
2018-11-28 14:53:46 +01:00
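A condensed, hedged version of the quick-start as it stood after this fix (model and tokenizer calls follow the era's pytorch-pretrained-bert API): the key line indexes into the list that convert_ids_to_tokens returns instead of asserting on the list itself.

```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "Who was Jim Henson ? Jim Henson was a puppeteer"
tokens = tokenizer.tokenize(text)
masked_index = 6                 # mask "henson" in the second mention
tokens[masked_index] = "[MASK]"
ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    predictions = model(ids)

predicted_index = torch.argmax(predictions[0, masked_index]).item()
# convert_ids_to_tokens returns a list, so take its first element.
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
print(predicted_token)  # expected: "henson"
```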
21f0196412 Merge pull request #58 from lliimsft/master
Bug fix in examples;correct t_total for distributed training;run pred…
2018-11-28 12:39:45 +01:00
0aaedcc02f Bug fix in examples; correct t_total for distributed training; run prediction for full dataset 2018-11-27 01:08:37 -08:00
32167cdf4b remove convert_to_unicode and printable_text from examples 2018-11-26 23:33:22 +01:00
843 changed files with 159261 additions and 12287 deletions

.circleci/config.yml Normal file

@@ -0,0 +1,139 @@
version: 2
jobs:
  run_tests_torch_and_tf:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.6
    environment:
      OMP_NUM_THREADS: 1
    resource_class: xlarge
    parallelism: 1
    steps:
      - checkout
      - run: sudo pip install .[sklearn,tf-cpu,torch,testing]
      - run: sudo pip install codecov pytest-cov
      - run: python -m pytest -n 8 --dist=loadfile -s ./tests/ --cov | tee output.txt
      - run: codecov
      - store_artifacts:
          path: ~/transformers/output.txt
          destination: test_output.txt
  run_tests_torch:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.7
    environment:
      OMP_NUM_THREADS: 1
    resource_class: xlarge
    parallelism: 1
    steps:
      - checkout
      - run: sudo pip install .[sklearn,torch,testing]
      - run: python -m pytest -n 8 --dist=loadfile -s ./tests/ | tee output.txt
      - store_artifacts:
          path: ~/transformers/output.txt
          destination: test_output.txt
  run_tests_tf:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.7
    environment:
      OMP_NUM_THREADS: 1
    resource_class: xlarge
    parallelism: 1
    steps:
      - checkout
      - run: sudo pip install .[sklearn,tf-cpu,testing]
      - run: python -m pytest -n 8 --dist=loadfile -s ./tests/ | tee output.txt
      - store_artifacts:
          path: ~/transformers/output.txt
          destination: test_output.txt
  run_tests_custom_tokenizers:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.6
    environment:
      RUN_CUSTOM_TOKENIZERS: yes
    steps:
      - checkout
      - run: sudo pip install .[mecab,testing]
      - run: python -m pytest -sv ./tests/test_tokenization_bert_japanese.py
  run_examples_torch:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.6
    environment:
      OMP_NUM_THREADS: 1
    resource_class: xlarge
    parallelism: 1
    steps:
      - checkout
      - run: sudo pip install .[sklearn,torch,testing]
      - run: sudo pip install -r examples/requirements.txt
      - run: python -m pytest -n 8 --dist=loadfile -s ./examples/ | tee output.txt
      - store_artifacts:
          path: ~/transformers/output.txt
          destination: test_output.txt
  build_doc:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.6
    steps:
      - checkout
      - run: sudo pip install .[tf,torch,docs]
      - run: cd docs && make html SPHINXOPTS="-W"
      - store_artifacts:
          path: ./docs/_build
  deploy_doc:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.6
    steps:
      - add_ssh_keys:
          fingerprints:
            - "5b:7a:95:18:07:8c:aa:76:4c:60:35:88:ad:60:56:71"
      - checkout
      - run: sudo pip install .[tf,torch,docs]
      - run: ./.circleci/deploy.sh
  check_code_quality:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.6
    resource_class: medium
    parallelism: 1
    steps:
      - checkout
      # we need a version of isort with https://github.com/timothycrosley/isort/pull/1000
      - run: sudo pip install git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
      - run: sudo pip install .[tf,torch,quality]
      - run: black --check --line-length 119 --target-version py35 examples templates tests src utils
      - run: isort --check-only --recursive examples templates tests src utils
      - run: flake8 examples templates tests src utils
  check_repository_consistency:
    working_directory: ~/transformers
    docker:
      - image: circleci/python:3.6
    resource_class: small
    parallelism: 1
    steps:
      - checkout
      - run: sudo pip install requests
      - run: python ./utils/link_tester.py
workflow_filters: &workflow_filters
  filters:
    branches:
      only:
        - master
workflows:
  version: 2
  build_and_test:
    jobs:
      - check_code_quality
      - check_repository_consistency
      - run_examples_torch
      - run_tests_custom_tokenizers
      - run_tests_torch_and_tf
      - run_tests_torch
      - run_tests_tf
      - build_doc
      - deploy_doc: *workflow_filters

.circleci/deploy.sh Executable file

@@ -0,0 +1,49 @@
cd docs

function deploy_doc(){
    echo "Creating doc at commit $1 and pushing to folder $2"
    git checkout $1
    if [ ! -z "$2" ]
    then
        if [ "$2" == "master" ]; then
            echo "Pushing master"
            make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir/$2/
            cp -r _build/html/_static .
        elif ssh -oStrictHostKeyChecking=no $doc "[ -d $dir/$2 ]"; then
            echo "Directory" $2 "already exists"
            scp -r -oStrictHostKeyChecking=no _static/* $doc:$dir/$2/_static/
        else
            echo "Pushing version" $2
            make clean && make html
            rm -rf _build/html/_static
            cp -r _static _build/html
            scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
        fi
    else
        echo "Pushing stable"
        make clean && make html
        rm -rf _build/html/_static
        cp -r _static _build/html
        scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
    fi
}
# You can find the commit for each tag on https://github.com/huggingface/transformers/tags
deploy_doc "master" master
deploy_doc "b33a385" v1.0.0
deploy_doc "fe02e45" v1.1.0
deploy_doc "89fd345" v1.2.0
deploy_doc "fc9faa8" v2.0.0
deploy_doc "3ddce1d" v2.1.1
deploy_doc "3616209" v2.2.0
deploy_doc "d0f8b9a" v2.3.0
deploy_doc "6664ea9" v2.4.0
deploy_doc "fb560dc" v2.5.0
deploy_doc "b90745c" v2.5.1
deploy_doc "fbc5bf1" v2.6.0
deploy_doc "6f5a12a" v2.7.0
deploy_doc "11c3257" v2.8.0
deploy_doc "e7cfc1a" v2.9.0
deploy_doc "7cb203f" v2.9.1
deploy_doc "10d7239" v2.10.0
deploy_doc "b42586e" #v2.11.0 Latest stable release

.coveragerc Normal file

@@ -0,0 +1,12 @@
[run]
source=transformers
omit =
    # skip conversion scripts from testing for now
    */convert_*
    */__main__.py

[report]
exclude_lines =
    pragma: no cover
    raise
    except
    register_parameter

@@ -0,0 +1,22 @@
---
name: "\U0001F5A5 New benchmark"
about: Benchmark a part of this library and share your results
title: "[Benchmark]"
labels: ''
assignees: ''
---
# 🖥 Benchmarking `transformers`
## Benchmark
Which part of `transformers` did you benchmark?
## Set-up
What did you run your benchmarks on? Please include details such as CPU or GPU, and if you used multiple GPUs, which parallelization you used.
## Results
Put your results here!

@@ -0,0 +1,20 @@
---
name: "\U0001F31F New model addition"
about: Submit a proposal/request to implement a new Transformer-based model
title: ''
labels: New model
assignees: ''
---
# 🌟 New model addition
## Model description
<!-- Important information -->
## Open source status
* [ ] the model implementation is available: (give details)
* [ ] the model weights are available: (give details)
* [ ] who are the authors: (mention them, if possible by @gh-username)

.github/ISSUE_TEMPLATE/bug-report.md vendored Normal file

@@ -0,0 +1,52 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve transformers
title: ''
labels: ''
assignees: ''
---
# 🐛 Bug
## Information
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The task I am working on is:
* [ ] an official GLUE/SQuAD task: (give the name)
* [ ] my own task or dataset: (give details below)
## To reproduce
Steps to reproduce the behavior:
1.
2.
3.
<!-- If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.-->
## Expected behavior
<!-- A clear and concise description of what you would expect to happen. -->
## Environment info
<!-- You can run the command `transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:

@@ -0,0 +1,25 @@
---
name: "\U0001F680 Feature request"
about: Submit a proposal/request for a new transformers feature
title: ''
labels: ''
assignees: ''
---
# 🚀 Feature request
<!-- A clear and concise description of the feature proposal.
Please provide a link to the paper and code in case they exist. -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request
related to a problem? e.g., I'm always frustrated when [...]. If this is related
to another GitHub issue, please link here too. -->
## Your contribution
<!-- Is there any way that you could help, e.g. by submitting a PR?
Make sure to read the CONTRIBUTING.MD readme:
https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md -->

.github/ISSUE_TEMPLATE/migration.md vendored Normal file

@@ -0,0 +1,58 @@
---
name: "\U0001F4DA Migration from pytorch-pretrained-bert or pytorch-transformers"
about: Report a problem when migrating from pytorch-pretrained-bert or pytorch-transformers to transformers
title: ''
labels: Migration
assignees: ''
---
# 📚 Migration
## Information
<!-- Important information -->
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The task I am working on is:
* [ ] an official GLUE/SQuAD task: (give the name)
* [ ] my own task or dataset: (give details below)
## Details
<!-- A clear and concise description of the migration issue.
If you have code snippets, please provide it here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
-->
## Environment info
<!-- You can run the command `python transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
<!-- IMPORTANT: which version of the former library do you use? -->
* `pytorch-transformers` or `pytorch-pretrained-bert` version (or branch):
## Checklist
- [ ] I have read the migration guide in the readme.
([pytorch-transformers](https://github.com/huggingface/transformers#migrating-from-pytorch-transformers-to-transformers);
[pytorch-pretrained-bert](https://github.com/huggingface/transformers#migrating-from-pytorch-pretrained-bert-to-transformers))
- [ ] I checked if a related official extension example runs on my machine.

29
.github/ISSUE_TEMPLATE/question-help.md vendored Normal file
View File

@ -0,0 +1,29 @@
---
name: "❓ Questions & Help"
about: Post your general questions on Stack Overflow tagged huggingface-transformers
title: ''
labels: ''
assignees: ''
---
# ❓ Questions & Help
<!-- The GitHub issue tracker is primarily intended for bugs, feature requests,
new models and benchmarks, and migration questions. For all other questions,
we direct you to Stack Overflow (SO), where a whole community of PyTorch and
TensorFlow enthusiasts can help you out. Make sure to tag your question with the
right deep learning framework as well as the huggingface-transformers tag:
https://stackoverflow.com/questions/tagged/huggingface-transformers
If your question wasn't answered after a period of time on Stack Overflow, you
can always open a question on GitHub. You should then link to the SO question
that you posted.
-->
## Details
<!-- Description of your issue -->
<!-- You should first ask your question on SO, and only if
you didn't get an answer ask it here on GitHub. -->
**A link to original question on Stack Overflow**:

17
.github/stale.yml vendored Normal file
View File

@ -0,0 +1,17 @@
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 60
# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7
# Issues with these labels will never be considered stale
exemptLabels:
- pinned
- security
# Label to use when marking an issue as stale
staleLabel: wontfix
# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
This issue has been automatically marked as stale because it has not had
recent activity. It will be closed if no further activity occurs. Thank you
for your contributions.
# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: false

19
.github/workflows/github-push.yml vendored Normal file
View File

@ -0,0 +1,19 @@
name: GitHub-hosted runner
on: push
jobs:
check_code_quality:
runs-on: ubuntu-18.04
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.7
# - name: Install dependencies
# run: |
# pip install .[tf,torch,quality]

32
.github/workflows/github-torch-hub.yml vendored Normal file
View File

@ -0,0 +1,32 @@
name: Torch hub integration
on:
push:
branches:
- "*"
jobs:
torch_hub_integration:
runs-on: ubuntu-latest
steps:
# no checkout necessary here.
- name: Extract branch name
run: echo "::set-env name=BRANCH::${GITHUB_REF#refs/heads/}"
- name: Check branch name
run: echo $BRANCH
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.7
- name: Install dependencies
run: |
pip install torch
pip install numpy tokenizers filelock requests tqdm regex sentencepiece sacremoses packaging
- name: Torch hub list
run: |
python -c "import torch; print(torch.hub.list('huggingface/transformers:$BRANCH'))"
- name: Torch hub help
run: |
python -c "import torch; print(torch.hub.help('huggingface/transformers:$BRANCH', 'modelForSequenceClassification'))"
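For context, here is a minimal sketch of what this workflow exercises, i.e. loading `transformers` through the Torch Hub API without a local install. The `bert-base-uncased` identifier below is an illustrative assumption, not something the workflow itself pins:

```python
# Sketch of the Torch Hub integration checked above; assumes torch is
# installed and the machine has network access.
import torch

# List the entry points exposed by the repository's hubconf.py
print(torch.hub.list("huggingface/transformers"))

# Load a tokenizer and a sequence-classification model through Torch Hub
tokenizer = torch.hub.load("huggingface/transformers", "tokenizer", "bert-base-uncased")
model = torch.hub.load("huggingface/transformers", "modelForSequenceClassification", "bert-base-uncased")
```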

54
.github/workflows/self-push.yml vendored Normal file
View File

@ -0,0 +1,54 @@
name: Self-hosted runner (push)
on:
push:
branches:
- master
paths:
- "src/**"
- "tests/**"
- ".github/**"
# pull_request:
repository_dispatch:
jobs:
run_tests_torch_and_tf_gpu:
runs-on: self-hosted
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install torch
pip install .[sklearn,testing]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print(torch.cuda.is_available())"
- name: Run all non-slow tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
# TF_GPU_MEMORY_LIMIT: 4096
OMP_NUM_THREADS: 1
USE_CUDA: yes
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s -v ./tests/

50
.github/workflows/self-scheduled.yml vendored Normal file
View File

@ -0,0 +1,50 @@
name: Self-hosted runner (scheduled)
on:
push:
branches:
- ci_*
repository_dispatch:
schedule:
- cron: "0 0 * * *"
jobs:
run_all_tests_torch_and_tf_gpu:
runs-on: self-hosted
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install .[sklearn,torch,testing]
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print(torch.cuda.is_available())"
- name: Run all tests on GPU
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
USE_CUDA: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -v ./tests/

35
.gitignore vendored
View File

@ -8,6 +8,10 @@ __pycache__/
# C extensions
*.so
# tests and logs
tests/fixtures
logs/
# Distribution / packaging
.Python
build/
@ -116,7 +120,36 @@ dmypy.json
.pyre/
# vscode
.vs
.vscode
# Pycharm
.idea
# TF code
tensorflow_code
# Models
models
proc_data
# examples
runs
/runs_old
/wandb
/examples/runs
/examples/**/*.args
# data
/data
serialization_dir
# emacs
*.*~
debug.env
# vim
.*.swp
#ctags
tags

277
CONTRIBUTING.md Normal file
View File

@ -0,0 +1,277 @@
# How to contribute to transformers?
Everyone is welcome to contribute, and we value everybody's contribution. Code
is not the only way to help the community. Answering questions, helping
others, reaching out, and improving the documentation are immensely valuable to
the community.
It also helps us if you spread the word: reference the library from blog posts
on the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply star the repo to say "thank you".
## You can contribute in so many ways!
There are 4 ways you can contribute to transformers:
* Fixing outstanding issues with the existing code;
* Implementing new models;
* Contributing to the examples or to the documentation;
* Submitting issues related to bugs or desired new features.
*All are equally valuable to the community.*
## Submitting a new issue or feature request
Do your best to follow these guidelines when submitting an issue or a feature
request. It will make it easier for us to come back to you quickly and with good
feedback.
### Did you find a bug?
The `transformers` library is robust and reliable thanks to the users who notify us of
the problems they encounter, so thank you for reporting an issue.
First, we would really appreciate it if you could **make sure the bug was not
already reported** (use the search bar on GitHub under Issues).
Did not find it? :( So that we can act on it quickly, please follow these steps:
* Include your **OS type and version** and the versions of **Python**, **PyTorch** and
**TensorFlow** when applicable;
* Include a short, self-contained code snippet that allows us to reproduce the bug in
less than 30s;
* Provide the *full* traceback if an exception is raised.
To get the OS and software versions automatically, you can run the following command:
```bash
transformers-cli env
```
or, from the root of the repository, the following command:
```bash
python src/transformers/commands/transformers_cli.py env
```
### Do you want to implement a new model?
Awesome! Please provide the following information:
* Short description of the model and link to the paper;
* Link to the implementation if it is open-source;
* Link to the model weights if they are available.
If you are willing to contribute the model yourself, let us know so we can best
guide you.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates) folder.
### Do you want a new feature (that is not a model)?
A world-class feature request addresses the following points:
1. Motivation first:
* Is it related to a problem/frustration with the library? If so, please explain
why. Providing a code snippet that demonstrates the problem is best.
* Is it related to something you would need for a project? We'd love to hear
about it!
* Is it something you worked on and think could benefit the community?
Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.
If your issue is well written we're already 80% of the way there by the time you
post it.
We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates)
folder.
## Start contributing! (Pull Requests)
Before writing code, we strongly advise you to search through the existing PRs or
issues to make sure that nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.
You will need basic `git` proficiency to be able to contribute to
`transformers`. `git` is not the easiest tool to use but it has the greatest
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing:
1. Fork the [repository](https://github.com/huggingface/transformers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code
under your GitHub user account.
2. Clone your fork to your local disk, and add the base repository as a remote:
```bash
$ git clone git@github.com:<your Github handle>/transformers.git
$ cd transformers
$ git remote add upstream https://github.com/huggingface/transformers.git
```
3. Create a new branch to hold your development changes:
```bash
$ git checkout -b a-descriptive-name-for-my-changes
```
**Do not** work on the `master` branch.
4. Set up a development environment by running the following command in a virtual environment:
```bash
$ pip install -e ".[dev]"
```
(If transformers was already installed in the virtual environment, remove
it with `pip uninstall transformers` before reinstalling it in editable
mode with the `-e` flag.)
Right now, we need an unreleased version of `isort` to avoid a
[bug](https://github.com/timothycrosley/isort/pull/1000):
```bash
$ pip install -U git+git://github.com/timothycrosley/isort.git@e63ae06ec7d70b06df9e528357650281a3d3ec22#egg=isort
```
5. Develop the features on your branch.
As you work on the features, you should make sure that the test suite
passes:
```bash
$ make test
```
`transformers` relies on `black` and `isort` to format its source code
consistently. After you make changes, format them with:
```bash
$ make style
```
`transformers` also uses `flake8` to check for coding mistakes. Quality
control runs in CI; however, you can also run the same checks locally with:
```bash
$ make quality
```
Once you're happy with your changes, add changed files using `git add` and
make a commit with `git commit` to record your changes locally:
```bash
$ git add modified_file.py
$ git commit
```
Please write [good commit
messages](https://chris.beams.io/posts/git-commit/).
It is a good idea to sync your copy of the code with the original
repository regularly. This way you can quickly account for changes:
```bash
$ git fetch upstream
$ git rebase upstream/master
```
Push the changes to your account using:
```bash
$ git push -u origin a-descriptive-name-for-my-changes
```
6. Once you are satisfied (**and the checklist below is happy too**), go to the
webpage of your fork on GitHub. Click on 'Pull request' to send your changes
to the project maintainers for review.
7. It's OK if maintainers ask you for changes. It happens to core contributors
too! So that everyone can see the changes in the pull request, work in your local
branch and push the changes to your fork. They will automatically appear in
the pull request.
### Checklist
1. The title of your pull request should be a summary of its contribution;
2. If your pull request addresses an issue, please mention the issue number in
the pull request description to make sure they are linked (and people
consulting the issue know you are working on it);
3. To indicate a work in progress please prefix the title with `[WIP]`. These
are useful to avoid duplicated work, and to differentiate it from PRs ready
to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.
- If you are adding a new model, make sure that you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests (see the sketch after this checklist).
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests, and make sure
`RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
CircleCI does not run the slow tests.
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
example.
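As a minimal sketch of the common-test wiring mentioned in the checklist (the model names are hypothetical, and the exact imports should be checked against an existing file such as `tests/test_modeling_bert.py`):

```python
# Hypothetical sketch only: MyModel and MyModelWithLMHead are placeholders.
import unittest

from transformers import is_torch_available

from .test_modeling_common import ModelTesterMixin
from .utils import require_torch

if is_torch_available():
    from transformers import MyModel, MyModelWithLMHead  # placeholders


@require_torch
class MyModelTest(ModelTesterMixin, unittest.TestCase):
    # Listing the classes here is what triggers the common tests.
    all_model_classes = (MyModel, MyModelWithLMHead) if is_torch_available() else ()
```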
### Tests
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
[examples folder](https://github.com/huggingface/transformers/tree/master/examples).
We like `pytest` and `pytest-xdist` because they make test runs faster. From the root of the
repository, here's how to run tests with `pytest` for the library:
```bash
$ python -m pytest -n auto --dist=loadfile -s -v ./tests/
```
and for the examples:
```bash
$ pip install -r examples/requirements.txt # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented!
You can specify a smaller set of tests in order to test only the feature
you're working on.
By default, slow tests are skipped. Set the `RUN_SLOW` environment variable to
`yes` to run them. This will download many gigabytes of models — make sure you
have enough disk space and a good Internet connection, or a lot of patience!
```bash
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
Likewise, set the `RUN_CUSTOM_TOKENIZERS` environment variable to `yes` to run
tests for custom tokenizers, which don't run by default either.
🤗 Transformers uses `pytest` as a test runner only. It doesn't use any
`pytest`-specific features in the test suite itself.
This means `unittest` is fully supported. Here's how to run tests with
`unittest`:
```bash
$ python -m unittest discover -s tests -t . -v
$ python -m unittest discover -s examples -t examples -v
```
### Style guide
For documentation strings, `transformers` follows the [google style](https://google.github.io/styleguide/pyguide.html).
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification)
for more information.
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)

1
MANIFEST.in Normal file
View File

@ -0,0 +1 @@
include LICENSE

24
Makefile Normal file
View File

@ -0,0 +1,24 @@
.PHONY: quality style test test-examples
# Check that source code meets quality standards
quality:
black --check --line-length 119 --target-version py35 examples templates tests src utils
isort --check-only --recursive examples templates tests src utils
flake8 examples templates tests src utils
# Format source code automatically
style:
black --line-length 119 --target-version py35 examples templates tests src utils
isort --recursive examples templates tests src utils
# Run tests for the library
test:
python -m pytest -n auto --dist=loadfile -s -v ./tests/
# Run tests for examples
test-examples:
python -m pytest -n auto --dist=loadfile -s -v ./examples/

1023
README.md

File diff suppressed because it is too large Load Diff

View File

@ -1,2 +0,0 @@
#!/bin/sh
python -m pytorch_pretrained_bert "$@"

6
codecov.yml Normal file
View File

@ -0,0 +1,6 @@
coverage:
status:
project:
default:
informational: true
patch: off

View File

@ -0,0 +1,23 @@
cd docs
function deploy_doc(){
echo "Creating doc at commit $1 and pushing to folder $2"
git checkout $1
if [ ! -z "$2" ]
then
echo "Pushing version" $2
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
else
echo "Pushing master"
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
fi
}
deploy_doc "master"
deploy_doc "b33a385" v1.0.0
deploy_doc "fe02e45" v1.1.0
deploy_doc "89fd345" v1.2.0
deploy_doc "fc9faa8" v2.0.0
deploy_doc "3ddce1d" v2.1.1
deploy_doc "f2f3294" v2.2.0
deploy_doc "d0f8b9a" v2.3.0

View File

@ -0,0 +1,26 @@
FROM ubuntu:18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow-cpu \
torch
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,26 @@
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow \
torch
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,25 @@
FROM ubuntu:18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
torch
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,25 @@
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
mkl \
torch
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,25 @@
FROM ubuntu:18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
mkl \
tensorflow-cpu
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,25 @@
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
mkl \
tensorflow
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

19
docs/Makefile Normal file
View File

@ -0,0 +1,19 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

211
docs/README.md Normal file
View File

@ -0,0 +1,211 @@
# Generating the documentation
To generate the documentation, you first have to build it. Several packages are necessary to build the doc,
you can install them with the following command, at the root of the code repository:
```bash
pip install -e ".[docs]"
```
---
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing, for instance). You don't have to commit the built documentation.
---
## Packages installed
Here's an overview of all the packages installed. If you ran the previous command, which installs all the
required packages, you do not need to run the following commands.
Building it requires the package `sphinx` that you can
install using:
```bash
pip install -U sphinx
```
You will also need the custom [theme](https://github.com/readthedocs/sphinx_rtd_theme) by
[Read The Docs](https://readthedocs.org/). You can install it using the following command:
```bash
pip install sphinx_rtd_theme
```
The third necessary package is `recommonmark`, which accepts Markdown as well as reStructuredText:
```bash
pip install recommonmark
```
## Building the documentation
Once you have set up `sphinx`, you can build the documentation by running the following command in the `/docs` folder:
```bash
make html
```
A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your
browser.
---
**NOTE**
If you are adding/removing elements from the toc-tree or from any structural item, it is recommended to clean the build
directory before rebuilding. Run the following command to clean and build:
```bash
make clean && make html
```
---
This builds the static pages, which will be available under `/docs/_build/html`.
## Adding a new element to the tree (toc-tree)
Accepted files are reStructuredText (.rst) and Markdown (.md). Create a file with one of those extensions and put it
in the source directory. You can then link it in the toc-tree by putting the filename without the extension.
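For example, for a new file `my_new_page.rst`, a hypothetical toc-tree entry in `./source/index.rst` would look like this:

```
.. toctree::
    :maxdepth: 2

    my_new_page
```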
## Preview the documentation in a pull request
Once you have made your pull request, you can check what the documentation will look like after it's merged by
following these steps:
- Look at the checks at the bottom of the conversation page of your PR (you may need to click on "show all checks" to
expand them).
- Click on "details" next to the `ci/circleci: build_doc` check.
- In the new window, click on the "Artifacts" tab.
- Locate the file "docs/_build/html/index.html" (or any specific page you want to check) and click on it to get a
preview.
## Writing Documentation - Specification
The `huggingface/transformers` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style. It is
mostly written in reStructuredText
([Sphinx simple documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html),
[Sourceforge complete documentation](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html))
### Adding a new section
A section is a page held in the `Notes` toc-tree on the documentation. Adding a new section is done in two steps:
- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md).
- Link that file in `./source/index.rst` on the correct toc-tree.
### Adding a new model
When adding a new model:
- Create a file `xxx.rst` under `./source/model_doc`.
- Link that file in `./source/index.rst` on the `model_doc` toc-tree.
- Write a short overview of the model:
- Overview with paper & authors
- Paper abstract
- Tips and tricks and how to use it best
- Add the classes that should be linked in the model. This generally includes the configuration, the tokenizer, and
every model of that class (the base model, alongside models with additional heads), both in PyTorch and TensorFlow.
The order is generally:
- Configuration,
- Tokenizer
- PyTorch base model
- PyTorch head models
- TensorFlow base model
- TensorFlow head models
These classes should be added using the RST syntax. Usually as follows:
```
XXXConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXConfig
:members:
```
This will include every public method of the configuration. If for some reason you wish for a method not to be
displayed in the documentation, you can do so by specifying which methods should be in the docs:
```
XXXTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
```
### Writing source documentation
Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as
an object using the :obj: syntax: :obj:\`like so\`.
When mentioning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
linked by Sphinx: :class:\`transformers.XXXClass\`
When mentioning a function, it is recommended to use the :func: syntax as the mentioned method will be automatically
linked by Sphinx: :func:\`transformers.XXXClass.method\`
Links should be done as so (note the double underscore at the end): \`text for the link <./local-link-or-global-link#loc>\`__
#### Defining arguments in a method
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
The argument should be followed by its type, with its shape if it is a tensor, and a line return.
Another indentation is necessary before writing the description of the argument.
Here's an example showcasing everything so far:
```
Args:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`transformers.AlbertTokenizer`.
See :func:`transformers.PreTrainedTokenizer.encode` and
:func:`transformers.PreTrainedTokenizer.__call__` for details.
`What are input IDs? <../glossary.html#input-ids>`__
```
#### Writing a multi-line code block
Multi-line code blocks can be useful for displaying examples. They are done like so:
```
Example::
# first line of code
# second line
# etc
```
The `Example` string at the beginning can be replaced by anything as long as it is followed by two colons.
#### Writing a return block
Return blocks should be introduced with the `Returns:` prefix, followed by a line return and an indentation.
The first line should be the type of the return, followed by a line return. No need to indent further for the elements
building the return.
Here's an example for tuple return, comprising several objects:
```
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
loss (`optional`, returned when ``masked_lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```
Here's an example for a single value return:
```
Returns:
A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
```

Binary file not shown.

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@ -0,0 +1,16 @@
.highlight .c1, .highlight .sd{
color: #999
}
.highlight .nn, .highlight .k, .highlight .s1, .highlight .nb, .highlight .bp, .highlight .kc {
color: #FB8D68;
}
.highlight .kn, .highlight .nv, .highlight .s2, .highlight .ow {
color: #6670FF;
}
.highlight .gp {
color: #FB8D68;
}

View File

@ -0,0 +1,304 @@
/* Our DOM objects */
/* Version control */
.version-button {
background-color: #6670FF;
color: white;
border: none;
padding: 5px;
font-size: 15px;
cursor: pointer;
}
.version-button:hover, .version-button:focus {
background-color: #A6B0FF;
}
.version-dropdown {
display: none;
background-color: #6670FF;
min-width: 160px;
overflow: auto;
font-size: 15px;
}
.version-dropdown a {
color: white;
padding: 3px 4px;
text-decoration: none;
display: block;
}
.version-dropdown a:hover {
background-color: #A6B0FF;
}
.version-show {
display: block;
}
/* Framework selector */
.framework-selector {
display: flex;
flex-direction: row;
justify-content: flex-end;
margin-right: 30px;
}
.framework-selector > button {
background-color: white;
color: #6670FF;
border: 1px solid #6670FF;
padding: 5px;
}
.framework-selector > button.selected{
background-color: #6670FF;
color: white;
border: 1px solid #6670FF;
padding: 5px;
}
/* Copy button */
a.copybtn {
margin: 3px;
}
/* The literal code blocks */
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
color: #6670FF;
}
/* To keep the logo centered */
.wy-side-scroll {
width: auto;
font-size: 20px;
}
/* The div that holds the Hugging Face logo */
.HuggingFaceDiv {
width: 100%
}
/* The research field on top of the toc tree */
.wy-side-nav-search{
padding-top: 0;
background-color: #6670FF;
}
/* The toc tree */
.wy-nav-side{
background-color: #6670FF;
}
/* The selected items in the toc tree */
.wy-menu-vertical li.current{
background-color: #A6B0FF;
}
/* When a list item that does belong to the selected block from the toc tree is hovered */
.wy-menu-vertical li.current a:hover{
background-color: #B6C0FF;
}
/* When a list item that does NOT belong to the selected block from the toc tree is hovered. */
.wy-menu-vertical li a:hover{
background-color: #A7AFFB;
}
/* The text items on the toc tree */
.wy-menu-vertical a {
color: #FFFFDD;
font-family: Calibre-Light, sans-serif;
}
.wy-menu-vertical header, .wy-menu-vertical p.caption{
color: white;
font-family: Calibre-Light, sans-serif;
}
/* The color inside the selected toc tree block */
.wy-menu-vertical li.toctree-l2 a, .wy-menu-vertical li.toctree-l3 a, .wy-menu-vertical li.toctree-l4 a {
color: black;
}
/* Inside the depth-2 selected toc tree block */
.wy-menu-vertical li.toctree-l2.current>a {
background-color: #B6C0FF
}
.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a {
background-color: #C6D0FF
}
/* Inside the depth-3 selected toc tree block */
.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{
background-color: #D6E0FF
}
/* Inside code snippets */
.rst-content dl:not(.docutils) dt{
font-size: 15px;
}
/* Links */
a {
color: #6670FF;
}
/* Content bars */
.rst-content dl:not(.docutils) dt {
background-color: rgba(251, 141, 104, 0.1);
border-right: solid 2px #FB8D68;
border-left: solid 2px #FB8D68;
color: #FB8D68;
font-family: Calibre-Light, sans-serif;
border-top: none;
font-style: normal !important;
}
/* Expand button */
.wy-menu-vertical li.toctree-l2 span.toctree-expand,
.wy-menu-vertical li.on a span.toctree-expand, .wy-menu-vertical li.current>a span.toctree-expand,
.wy-menu-vertical li.toctree-l3 span.toctree-expand{
color: black;
}
/* Max window size */
.wy-nav-content{
max-width: 1200px;
}
/* Mobile header */
.wy-nav-top{
background-color: #6670FF;
}
/* Source spans */
.rst-content .viewcode-link, .rst-content .viewcode-back{
color: #6670FF;
font-size: 110%;
letter-spacing: 2px;
text-transform: uppercase;
}
/* It would be better for table to be visible without horizontal scrolling */
.wy-table-responsive table td, .wy-table-responsive table th{
white-space: normal;
}
.footer {
margin-top: 20px;
}
.footer__Social {
display: flex;
flex-direction: row;
}
.footer__CustomImage {
margin: 2px 5px 0 0;
}
/* class and method names in doc */
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) code.descclassname{
font-family: Calibre, sans-serif;
font-size: 20px !important;
}
/* class name in doc*/
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname{
margin-right: 10px;
font-family: Calibre-Medium, sans-serif;
}
/* Method and class parameters */
.sig-param{
line-height: 23px;
}
/* Class introduction "class" string at beginning */
.rst-content dl:not(.docutils) .property{
font-size: 18px;
color: black;
}
/* FONTS */
body{
font-family: Calibre, sans-serif;
font-size: 16px;
}
h1 {
font-family: Calibre-Thin, sans-serif;
font-size: 70px;
}
h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend{
font-family: Calibre-Medium, sans-serif;
}
@font-face {
font-family: Calibre-Medium;
src: url(./Calibre-Medium.otf);
font-weight:400;
}
@font-face {
font-family: Calibre;
src: url(./Calibre-Regular.otf);
font-weight:400;
}
@font-face {
font-family: Calibre-Light;
src: url(./Calibre-Light.ttf);
font-weight:400;
}
@font-face {
font-family: Calibre-Thin;
src: url(./Calibre-Thin.otf);
font-weight:400;
}
/**
* Nav Links to other parts of huggingface.co
*/
div.menu {
position: absolute;
top: 0;
right: 0;
padding-top: 20px;
padding-right: 20px;
z-index: 1000;
}
div.menu a {
font-size: 14px;
letter-spacing: 0.3px;
text-transform: uppercase;
color: white;
-webkit-font-smoothing: antialiased;
background: linear-gradient(0deg, #6671ffb8, #9a66ffb8 50%);
padding: 10px 16px 6px 16px;
border-radius: 3px;
margin-left: 12px;
position: relative;
}
div.menu a:active {
top: 1px;
}
@media (min-width: 768px) and (max-width: 1750px) {
.wy-breadcrumbs {
margin-top: 32px;
}
}
@media (max-width: 768px) {
div.menu {
display: none;
}
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 7.6 KiB

322
docs/source/benchmarks.rst Normal file
View File

@ -0,0 +1,322 @@
Benchmarks
==========
Let's take a look at how 🤗 Transformer models can be benchmarked, at best practices, and at the benchmarks that are already available.
A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found `here <https://github.com/huggingface/transformers/blob/master/notebooks/05-benchmark.ipynb>`__.
How to benchmark 🤗 Transformer models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` allow you to flexibly benchmark 🤗 Transformer models.
The benchmark classes allow us to measure the `peak memory usage` and `required time` for both
`inference` and `training`.
.. note::
Here, `inference` is defined as a single forward pass, and `training` as a single forward pass and backward pass.
The benchmark classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` expect an object of type :class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments`, respectively, for instantiation. :class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments` are data classes and contain all relevant configurations for their corresponding benchmark class.
In the following example, it is shown how a BERT model of type `bert-base-uncased` can be benchmarked.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
>>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = PyTorchBenchmark(args)
>>> ## TENSORFLOW CODE
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
>>> args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = TensorFlowBenchmark(args)
Here, three arguments are given to the benchmark argument data classes, namely ``models``, ``batch_sizes``, and ``sequence_lengths``. The argument ``models`` is required and expects a :obj:`list` of model identifiers from the `model hub <https://huggingface.co/models>`__.
The :obj:`list` arguments ``batch_sizes`` and ``sequence_lengths`` define the size of the ``input_ids`` on which the model is benchmarked.
There are many more parameters that can be configured via the benchmark argument data classes. For more detail on these, one can directly consult the files
``src/transformers/benchmark/benchmark_args_utils.py``, ``src/transformers/benchmark/benchmark_args.py`` (for PyTorch) and ``src/transformers/benchmark/benchmark_args_tf.py`` (for TensorFlow).
Alternatively, running the following shell commands from the root of the repository will print out a descriptive list of all configurable parameters for PyTorch and TensorFlow, respectively.
.. code-block::
>>> ## PYTORCH CODE
python examples/benchmarking/run_benchmark.py --help
>>> ## TENSORFLOW CODE
python examples/benchmarking/run_benchmark_tf.py --help
An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
.. code-block::
>>> ## PYTORCH CODE
>>> results = benchmark.run()
>>> print(results)
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base-uncased 8 8 0.006
bert-base-uncased 8 32 0.006
bert-base-uncased 8 128 0.018
bert-base-uncased 8 512 0.088
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base-uncased 8 8 1227
bert-base-uncased 8 32 1281
bert-base-uncased 8 128 1307
bert-base-uncased 8 512 1539
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 08:58:43.371351
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
>>> ## TENSORFLOW CODE
>>> results = benchmark.run()
>>> print(results)
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base-uncased 8 8 0.005
bert-base-uncased 8 32 0.008
bert-base-uncased 8 128 0.022
bert-base-uncased 8 512 0.105
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base-uncased 8 8 1330
bert-base-uncased 8 32 1330
bert-base-uncased 8 128 1330
bert-base-uncased 8 512 1770
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:26:35.617317
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
By default, the `time` and the `required memory` for `inference` are benchmarked.
In the example output above, the first two sections show the results corresponding to `inference time` and `inference memory`.
In addition, all relevant information about the computing environment, `e.g.` the GPU type, the system, the library versions, etc., is printed out in the third section under `ENVIRONMENT INFORMATION`.
This information can optionally be saved in a `.csv` file when adding the argument :obj:`save_to_csv=True` to :class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments` respectively.
In this case, every section is saved in a separate `.csv` file. The path to each `.csv` file can optionally be defined via the argument data classes.
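As a rough sketch (the path-argument names below follow ``src/transformers/benchmark/benchmark_args_utils.py`` and should be double-checked there):
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
>>> args = PyTorchBenchmarkArguments(
...     models=["bert-base-uncased"],
...     batch_sizes=[8],
...     sequence_lengths=[8, 32],
...     save_to_csv=True,
...     inference_time_csv_file="inference_time.csv",
...     inference_memory_csv_file="inference_memory.csv",
... )
>>> results = PyTorchBenchmark(args).run()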
Instead of benchmarking pre-trained models via their model identifier, `e.g.` `bert-base-uncased`, the user can alternatively benchmark an arbitrary configuration of any available model class.
In this case, a :obj:`list` of configurations must be inserted with the benchmark args as follows.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
>>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)
>>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
>>> benchmark.run()
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base 8 8 0.006
bert-base 8 32 0.006
bert-base 8 128 0.018
bert-base 8 512 0.088
bert-384-hid 8 8 0.006
bert-384-hid 8 32 0.006
bert-384-hid 8 128 0.011
bert-384-hid 8 512 0.054
bert-6-lay 8 8 0.003
bert-6-lay 8 32 0.004
bert-6-lay 8 128 0.009
bert-6-lay 8 512 0.044
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base 8 8 1277
bert-base 8 32 1281
bert-base 8 128 1307
bert-base 8 512 1539
bert-384-hid 8 8 1005
bert-384-hid 8 32 1027
bert-384-hid 8 128 1035
bert-384-hid 8 512 1255
bert-6-lay 8 8 1097
bert-6-lay 8 32 1101
bert-6-lay 8 128 1127
bert-6-lay 8 512 1359
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:35:25.143267
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
>>> ## TENSORFLOW CODE
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
>>> args = TensorFlowBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)
>>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
>>> benchmark.run()
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base 8 8 0.005
bert-base 8 32 0.008
bert-base 8 128 0.022
bert-base 8 512 0.106
bert-384-hid 8 8 0.005
bert-384-hid 8 32 0.007
bert-384-hid 8 128 0.018
bert-384-hid 8 512 0.064
bert-6-lay 8 8 0.002
bert-6-lay 8 32 0.003
bert-6-lay 8 128 0.0011
bert-6-lay 8 512 0.074
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base 8 8 1330
bert-base 8 32 1330
bert-base 8 128 1330
bert-base 8 512 1770
bert-384-hid 8 8 1330
bert-384-hid 8 32 1330
bert-384-hid 8 128 1330
bert-384-hid 8 512 1540
bert-6-lay 8 8 1330
bert-6-lay 8 32 1330
bert-6-lay 8 128 1330
bert-6-lay 8 512 1540
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
Again, `inference time` and `required memory` for `inference` are measured, but this time for customized configurations of the :obj:`BertModel` class. This feature can be especially helpful when
deciding which configuration the model should be trained with.
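The same machinery can, as a sketch, also benchmark `training` for a candidate configuration by passing the ``training`` flag to the benchmark args (the flag name follows ``src/transformers/benchmark/benchmark_args_utils.py`` and should be double-checked there):
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
>>> args = PyTorchBenchmarkArguments(models=["bert-6-lay"], batch_sizes=[8], sequence_lengths=[128], training=True)
>>> benchmark = PyTorchBenchmark(args, configs=[BertConfig(num_hidden_layers=6)])
>>> results = benchmark.run()  # adds TRAIN - SPEED and TRAIN - MEMORY sections to the report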
Benchmark best practices
~~~~~~~~~~~~~~~~~~~~~~~~
This section lists a couple of best practices one should be aware of when benchmarking a model.
- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
specifies on which device the code should be run by setting the ``CUDA_VISIBLE_DEVICES`` environment variable in the shell, `e.g.` ``export CUDA_VISIBLE_DEVICES=0`` before running the code.
- The option :obj:`no_multi_processing` should only be set to :obj:`True` for testing and debugging. To ensure accurate memory measurement, it is recommended to run each memory benchmark in a separate process, i.e. to leave :obj:`no_multi_processing` set to :obj:`False`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary heavily between different GPU devices, library versions, etc., so benchmark results on their own are not very useful for the community.
Sharing your benchmark
~~~~~~~~~~~~~~~~~~~~~~
Previously, all available core models (10 at the time) were benchmarked for `inference time`, across many different settings: using PyTorch, with
and without TorchScript, using TensorFlow, with and without XLA. All of those tests were done across CPUs (except for
TensorFlow XLA) and GPUs.
The approach is detailed in the `following blogpost <https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2>`__ and the results are available `here <https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing>`__.
With the new `benchmark` tools, it is easier than ever to share your benchmark results with the community `here <https://github.com/huggingface/transformers/blob/master/examples/benchmarking/README.md>`__.

18
docs/source/bertology.rst Normal file
View File

@ -0,0 +1,18 @@
BERTology
---------
There is a growing field of study concerned with investigating the inner workings of large-scale transformers like BERT (that some call "BERTology"). Some good examples of this field are:
* BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick: https://arxiv.org/abs/1905.05950
* Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
* What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning: https://arxiv.org/abs/1906.04341
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to help people access the inner representations, mainly adapted from the great work of Paul Michel (https://arxiv.org/abs/1905.10650):
* accessing all the hidden-states of BERT/GPT/GPT-2,
* accessing all the attention weights for each head of BERT/GPT/GPT-2,
* retrieving the output values and gradients of the heads, in order to compute head importance scores and prune heads, as explained in https://arxiv.org/abs/1905.10650.
To help you understand and use these features, we have added a specific example script, `bertology.py <https://github.com/huggingface/transformers/blob/master/examples/bertology/run_bertology.py>`_, which extracts information from, and prunes heads of, a model pre-trained on GLUE.
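As a rough sketch of how the first two features are typically accessed (assuming a PyTorch BERT model; the exact position of the extra outputs should be checked against the model's docstring):
.. code-block::
>>> from transformers import BertModel, BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True, output_attentions=True)
>>> inputs = tokenizer("Hello, world!", return_tensors="pt")
>>> outputs = model(**inputs)
>>> hidden_states, attentions = outputs[-2], outputs[-1]  # one tuple entry per layer
>>> model.prune_heads({0: [0, 1]})  # prune heads 0 and 1 of the first layer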

197
docs/source/conf.py Normal file
View File

@ -0,0 +1,197 @@
# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
sys.path.insert(0, os.path.abspath('../../src'))
# -- Project information -----------------------------------------------------
project = u'transformers'
copyright = u'2020, huggingface'
author = u'huggingface'
# The short X.Y version
version = u''
# The full version, including alpha/beta/rc tags
release = u'3.0.0'
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.coverage',
'sphinx.ext.napoleon',
'recommonmark',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'sphinx_copybutton'
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
# source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = [u'_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
# Remove the prompt when copying examples
copybutton_prompt_text = ">>> "
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {
'analytics_id': 'UA-83738774-2'
}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}
# This must be the name of an image file (path relative to the configuration
# directory) that is the favicon of the docs. Modern browsers use this as
# the icon for tabs, windows and bookmarks. It should be a Windows-style
# icon file (.ico).
html_favicon = 'favicon.ico'
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'transformersdoc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'transformers.tex', u'transformers Documentation',
u'huggingface', 'manual'),
]
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'transformers', u'transformers Documentation',
[author], 1)
]
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'transformers', u'transformers Documentation',
author, 'transformers', 'One line description of project.',
'Miscellaneous'),
]
# -- Options for Epub output -------------------------------------------------
# Bibliographic Dublin Core info.
epub_title = project
# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''
# A unique identification for the text.
#
# epub_uid = ''
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
def setup(app):
app.add_css_file('css/huggingface.css')
app.add_css_file('css/code-snippets.css')
app.add_js_file('js/custom.js')
# -- Extension configuration -------------------------------------------------

docs/source/contributing.md Symbolic link

@ -0,0 +1 @@
../../CONTRIBUTING.md

docs/source/converting_tensorflow_models.rst Normal file

@ -0,0 +1,133 @@
Converting Tensorflow Checkpoints
================================================
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints into models that can be loaded using the ``from_pretrained`` methods of the library.
.. note::
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**)
available in any transformers >= 2.3.0 installation.
The documentation below reflects the **transformers-cli convert** command format.
BERT
^^^^
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google <https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the `convert_bert_original_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py>`_ script.
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can be imported using ``torch.load()`` (see examples in `run_bert_extract_features.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_extract_features.py>`_\ , `run_bert_classifier.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_classifier.py>`_ and `run_bert_squad.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_squad.py>`_\ ).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\ ``bert_config.json``\ ) and the vocabulary file (\ ``vocab.txt``\ ) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (\ ``pip install tensorflow``\ ). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained ``BERT-Base Uncased`` model:
.. code-block:: shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here <https://github.com/google-research/bert#pre-trained-models>`__.
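The resulting dump is a plain PyTorch state dict, so you can sanity-check it directly. A minimal sketch using the paths from the example above:

.. code-block:: python

    import torch

    # Load the converted weights on CPU and peek at a few parameter names.
    state_dict = torch.load(
        "/path/to/bert/uncased_L-12_H-768_A-12/pytorch_model.bin", map_location="cpu"
    )
    print(list(state_dict.keys())[:5])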
ALBERT
^^^^^^
Convert TensorFlow model checkpoints of ALBERT to PyTorch using the `convert_albert_original_tf_checkpoint_to_pytorch.py <https://github.com/huggingface/transformers/blob/master/src/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py>`_ script.
The CLI takes as input a TensorFlow checkpoint (three files starting with ``model.ckpt-best``\ ) and the accompanying configuration file (\ ``albert_config.json``\ ), then creates and saves a PyTorch model. To run this conversion you will need to have TensorFlow and PyTorch installed.
Here is an example of the conversion process for the pre-trained ``ALBERT Base`` model:
.. code-block:: shell
export ALBERT_BASE_DIR=/path/to/albert/albert_base
transformers-cli convert --model_type albert \
--tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
--config $ALBERT_BASE_DIR/albert_config.json \
--pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here <https://github.com/google-research/albert#pre-trained-models>`__.
OpenAI GPT
^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint is saved in the same format as the OpenAI pretrained model (see `here <https://github.com/openai/finetune-transformer-lm>`__\ ):
.. code-block:: shell
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
transformers-cli convert --model_type gpt \
--tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT_CONFIG] \
[--finetuning_task_name OPENAI_GPT_FINETUNED_TASK]
OpenAI GPT-2
^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see `here <https://github.com/openai/gpt-2>`__\ ):
.. code-block:: shell
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
Transformer-XL
^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here <https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__\ ):
.. code-block:: shell
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
transformers-cli convert --model_type transfo_xl \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config TRANSFO_XL_CONFIG] \
[--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
XLNet
^^^^^
Here is an example of the conversion process for a pre-trained XLNet model:
.. code-block:: shell
export XLNET_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
export XLNET_CONFIG_PATH=/path/to/xlnet/config
transformers-cli convert --model_type xlnet \
--tf_checkpoint $XLNET_CHECKPOINT_PATH \
--config $XLNET_CONFIG_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--finetuning_task_name XLNET_FINETUNED_TASK]
XLM
^^^
Here is an example of the conversion process for a pre-trained XLM model:
.. code-block:: shell
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
transformers-cli convert --model_type xlm \
--tf_checkpoint $XLM_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config XLM_CONFIG] \
[--finetuning_task_name XLM_FINETUNED_TASK]

docs/source/examples.md Symbolic link

@ -0,0 +1 @@
../../examples/README.md

BIN docs/source/favicon.ico Normal file (binary file not shown; 47 KiB)

docs/source/glossary.rst Normal file

@ -0,0 +1,238 @@
Glossary
^^^^^^^^
General terms
-------------
- autoencoding models: see MLM
- autoregressive models: see CLM
- CLM: causal language modeling, a pretraining task where the model reads the texts in order and has to predict the
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
tokens at a certain timestep.
- MLM: masked language modeling, a pretraining task where the model sees a corrupted version of the texts, usually done
by masking some tokens randomly, and has to predict the original text.
- multimodal: a task that combines text with other kinds of inputs (for instance images).
- NLG: natural language generation, all tasks related to generating text (for instance talking with transformers,
translation)
- NLP: natural language processing, a generic way to say "deal with texts".
- NLU: natural language understanding, all tasks related to understanding what is in a text (for instance classifying
the whole text, individual words)
- pretrained model: a model that has been pretrained on some data (for instance all of Wikipedia). Pretraining methods
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
masking some words and trying to predict them (see MLM).
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
summarization models (such as :doc:`Bart </model_doc/bart>` or :doc:`T5 </model_doc/t5>`).
- token: a part of a sentence, usually a word, but can also be a subword (non-common words are often split into
subwords) or a punctuation symbol.
Model inputs
------------
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
.. _input-ids:
Input IDs
~~~~~~~~~
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence = "A Titan RTX has 24GB of VRAM"
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
::
>>> tokenized_sequence = tokenizer.tokenize(sequence)
The tokens are either words or subwords. Here, for instance, "VRAM" wasn't in the model vocabulary, so it's been split
into "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash
prefix is added to "RA" and "M":
::
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding
the sentence to the tokenizer, which leverages the Rust implementation of
`huggingface/tokenizers <https://github.com/huggingface/tokenizers>`__ for peak performance.
::
>>> encoded_sequence = tokenizer(sequence)["input_ids"]
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
token indices are under the key "input_ids":
::
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them), which are special
IDs the model sometimes uses. If we decode the previous sequence of ids,
::
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
we will see
::
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
because this is how a :class:`~transformers.BertModel` expects its inputs.
.. _attention-mask:
Attention mask
~~~~~~~~~~~~~~
The attention mask is an optional argument used when batching sequences together. This argument indicates to the
model which tokens should be attended to, and which should not.
For example, consider these two sequences:
::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
The encoded versions have different lengths:
::
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length
of the second one, or the second one needs to be truncated down to the length of the first one.
In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask
it to pad like this:
::
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
::
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating
the position of the padded indices so that the model does not attend to them. For the
:class:`~transformers.BertTokenizer`, :obj:`1` indicates a value that should be attended to while :obj:`0` indicates
a padded value. This attention mask is in the dictionary returned by the tokenizer under the key "attention_mask":
::
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
.. _token-type-ids:
Token Type IDs
~~~~~~~~~~~~~~
Some models are designed for sequence classification or question answering. These require two different sequences to
be encoded in the same input IDs. They are usually separated by special tokens, such as the classifier and separator
tokens. For example, the BERT model builds its two sequence input as such:
::
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
We can use our tokenizer to automatically generate such a sentence by passing the two sequences as two arguments (and
not a list like before) like this:
::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"
>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
which will return:
::
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
This is enough for some models to understand where one sequence ends and where another begins. However, other models,
such as BERT, have an additional mechanism: the token type IDs (also called segment IDs). They are a binary
mask identifying the two types of sequence in the model.
The tokenizer returns this mask in the dictionary under the key "token_type_ids":
::
>>> encoded_dict['token_type_ids']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
The first sequence, the "context" used for the question, has all its tokens represented by :obj:`0`, whereas the
question has all its tokens represented by :obj:`1`. Some models, like :class:`~transformers.XLNetModel`, use an
additional token represented by a :obj:`2`.
.. _position-ids:
Position IDs
~~~~~~~~~~~~
The position IDs are used by the model to identify which token is at which position. Contrary to RNNs that have the
position of each token embedded within them, transformers are unaware of the position of each token. The position
IDs are created for this purpose.
They are an optional parameter. If no position IDs are passed to the model, they are automatically created as absolute
positional embeddings.
Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models
use other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
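For example, passing explicit position IDs reproduces what the model would otherwise build internally (a minimal sketch, assuming PyTorch and a BERT checkpoint):

::

    >>> import torch
    >>> from transformers import BertModel, BertTokenizer

    >>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    >>> model = BertModel.from_pretrained("bert-base-cased")
    >>> inputs = tokenizer("A Titan RTX has 24GB of VRAM", return_tensors="pt")

    >>> # Absolute positions 0..seq_len-1, the same values the model would
    >>> # create by default if position_ids were omitted.
    >>> position_ids = torch.arange(inputs["input_ids"].shape[1]).unsqueeze(0)
    >>> outputs = model(**inputs, position_ids=position_ids)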
.. _feed-forward-chunking:
Feed Forward Chunking
~~~~~~~~~~~~~~~~~~~~~
In transformer models, two feed forward layers usually follow the self-attention layer in each residual attention block.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g.,
3072 vs. a hidden size of 768 for ``bert-base-uncased``).
For an input of size ``[batch_size, sequence_length]``, the memory required to store the intermediate feed forward
embeddings ``[batch_size, sequence_length, config.intermediate_size]`` can account for a large fraction of the memory
use. The authors of `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ noticed that since the
computation is independent of the ``sequence_length`` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers ``[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n``
individually and concat them afterward to ``[batch_size, sequence_length, config.hidden_size]`` with
``n = sequence_length``, which trades increased computation time against reduced memory use, but yields a
mathematically **equivalent** result.
For models employing the function :func:`~.transformers.apply_chunking_to_forward`, the ``chunk_size`` defines the
number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time
complexity. If ``chunk_size`` is set to 0, no feed forward chunking is done.
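The trade-off can be illustrated with plain PyTorch (a toy sketch, not the library's implementation; the layer sizes are chosen to match ``bert-base-uncased``):

::

    >>> import torch
    >>> import torch.nn as nn

    >>> dense_in, dense_out = nn.Linear(768, 3072), nn.Linear(3072, 768)
    >>> feed_forward = lambda x: dense_out(torch.relu(dense_in(x)))

    >>> x = torch.randn(2, 128, 768)
    >>> full = feed_forward(x)  # materializes a [2, 128, 3072] intermediate tensor

    >>> # Chunked: only [2, 32, 3072] is materialized at any one time.
    >>> chunked = torch.cat([feed_forward(c) for c in x.split(32, dim=1)], dim=1)
    >>> bool(torch.allclose(full, chunked, atol=1e-6))
    True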

BIN 7 binary image files (not shown; 8.7–27 KiB; filenames absent from this view)

docs/source/index.rst Normal file

@ -0,0 +1,199 @@
Transformers
================================================================================================================================================
State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with 32+ pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
Features
---------------------------------------------------
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners
State-of-the-art NLP for everyone:
- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators
Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages
Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
Contents
---------------------------------
The documentation is organized in five parts:
- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library and more with general research in
transformer models.
- **PACKAGE REFERENCE** contains the documentation of each public class and function.
The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and
conversion utilities for the following models:
1. `BERT <https://github.com/google-research/bert>`_ (from Google) released with the paper `BERT: Pre-training of Deep
Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_ by Jacob Devlin, Ming-Wei
Chang, Kenton Lee, and Kristina Toutanova.
2. `GPT <https://github.com/openai/finetune-transformer-lm>`_ (from OpenAI) released with the paper `Improving Language
Understanding by Generative Pre-Training <https://blog.openai.com/language-unsupervised>`_ by Alec Radford, Karthik
Narasimhan, Tim Salimans, and Ilya Sutskever.
3. `GPT-2 <https://blog.openai.com/better-language-models>`_ (from OpenAI) released with the paper `Language Models are
Unsupervised Multitask Learners <https://blog.openai.com/better-language-models>`_ by Alec Radford, Jeffrey Wu,
Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.
4. `Transformer-XL <https://github.com/kimiyoung/transformer-xl>`_ (from Google/CMU) released with the paper
`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_ by
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov.
5. `XLNet <https://github.com/zihangdai/xlnet>`_ (from Google/CMU) released with the paper `XLNet: Generalized
Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_ by Zhilin Yang, Zihang
Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le.
6. `XLM <https://github.com/facebookresearch/XLM>`_ (from Facebook) released together with the paper `Cross-lingual
Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_ by Guillaume Lample and Alexis Conneau.
7. `RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_ (from Facebook), released together with
the paper a `Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_ by Yinhan Liu, Myle
Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin
Stoyanov.
8. `DistilBERT <https://huggingface.co/transformers/model_doc/distilbert.html>`_ (from HuggingFace) released together
with the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
<https://arxiv.org/abs/1910.01108>`_ by Victor Sanh, Lysandre Debut, and Thomas Wolf. The same method has been
applied to compress GPT2 into
`DistilGPT2 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
9. `CTRL <https://github.com/pytorch/fairseq/tree/master/examples/ctrl>`_ (from Salesforce), released together with the
paper `CTRL: A Conditional Transformer Language Model for Controllable Generation
<https://www.github.com/salesforce/ctrl>`_ by Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong,
and Richard Socher.
10. `CamemBERT <https://huggingface.co/transformers/model_doc/camembert.html>`_ (from FAIR, Inria, Sorbonne Université)
released together with the paper `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`_ by
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suarez, Yoann Dupont, Laurent Romary, Eric Villemonte de la
Clergerie, Djame Seddah, and Benoît Sagot.
11. `ALBERT <https://github.com/google-research/ALBERT>`_ (from Google Research), released together with the paper
`ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut.
12. `T5 <https://github.com/google-research/text-to-text-transfer-transformer>`_ (from Google) released with the paper
`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
<https://arxiv.org/abs/1910.10683>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu.
13. `XLM-RoBERTa <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_ (from Facebook AI), released together
with the paper `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_ by
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard
Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov.
14. `MMBT <https://github.com/facebookresearch/mmbt/>`_ (from Facebook), released together with the paper a `Supervised
Multimodal Bitransformers for Classifying Images and Text <https://arxiv.org/pdf/1909.02950.pdf>`_ by Douwe Kiela,
Suvrat Bhooshan, Hamed Firooz, and Davide Testuggine.
15. `FlauBERT <https://github.com/getalp/Flaubert>`_ (from CNRS) released with the paper `FlauBERT: Unsupervised
Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_ by Hang Le, Loïc Vial, Jibril Frej,
Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, and
Didier Schwab.
16. `BART <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_ (from Facebook) released with the paper
`BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
<https://arxiv.org/pdf/1910.13461.pdf>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer.
17. `ELECTRA <https://github.com/google-research/electra>`_ (from Google Research/Stanford University) released with
the paper `ELECTRA: Pre-training text encoders as discriminators rather than generators
<https://arxiv.org/abs/2003.10555>`_ by Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning.
18. `DialoGPT <https://github.com/microsoft/DialoGPT>`_ (from Microsoft Research) released with the paper `DialoGPT:
Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_ by
Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu,
and Bill Dolan.
19. `Reformer <https://github.com/google/trax/tree/master/trax/models/reformer>`_ (from Google Research) released with
the paper `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ by Nikita Kitaev, Łukasz
Kaiser, and Anselm Levskaya.
20. `MarianMT <https://marian-nmt.github.io/>`_ (developed by the Microsoft Translator Team) machine translation models
    trained using `OPUS <http://opus.nlpl.eu/>`_ data by Jörg Tiedemann.
21. `Longformer <https://github.com/allenai/longformer>`_ (from AllenAI) released with the paper `Longformer: The
Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_ by Iz Beltagy, Matthew E. Peters, and Arman Cohan.
22. `Other community models <https://huggingface.co/models>`_, contributed by the `community
<https://huggingface.co/users>`_.
.. toctree::
:maxdepth: 2
:caption: Get started
quicktour
installation
philosophy
glossary
.. toctree::
:maxdepth: 2
:caption: Using 🤗 Transformers
task_summary
model_summary
preprocessing
training
model_sharing
multilingual
.. toctree::
:maxdepth: 2
:caption: Advanced guides
pretrained_models
examples
notebooks
converting_tensorflow_models
migration
torchscript
contributing
.. toctree::
:maxdepth: 2
:caption: Research
bertology
benchmarks
.. toctree::
:maxdepth: 2
:caption: Package Reference
main_classes/configuration
main_classes/model
main_classes/tokenizer
main_classes/pipelines
main_classes/optimizer_schedules
main_classes/processors
model_doc/auto
model_doc/encoderdecoder
model_doc/bert
model_doc/gpt
model_doc/transformerxl
model_doc/gpt2
model_doc/xlm
model_doc/xlnet
model_doc/roberta
model_doc/distilbert
model_doc/ctrl
model_doc/camembert
model_doc/albert
model_doc/xlmroberta
model_doc/flaubert
model_doc/bart
model_doc/t5
model_doc/electra
model_doc/dialogpt
model_doc/reformer
model_doc/marian
model_doc/longformer
model_doc/retribert
model_doc/mobilebert

docs/source/installation.md Normal file

@ -0,0 +1,102 @@
# Installation
🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Create a virtual environment with the version of Python you're going
to use and activate it.
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
must install it from source.
## Installation with pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available)
and/or [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific
install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with
```bash
pip install transformers[torch]
```
or 🤗 Transformers and TensorFlow 2.0 in one line with
```bash
pip install transformers[tf-cpu]
```
To check 🤗 Transformers is properly installed, run the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```
It should download a pretrained model then print something like
```bash
[{'label': 'NEGATIVE', 'score': 0.9991129040718079}]
```
(Note that TensorFlow will print additional stuff before that last statement.)
## Installing from source
To install from source, clone the repository and install with the following commands:
``` bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```
Again, you can run
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```
to check 🤗 Transformers is properly installed.
## Caching models
This library provides pretrained models that will be downloaded and cached locally. Unless you specify a location with
`cache_dir=...` when you use methods like `from_pretrained`, these models will automatically be downloaded in the
folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the PyTorch
cache home followed by ``/transformers/`` (even if you don't have PyTorch installed). This is (by order of priority):
* shell environment variable ``TORCH_HOME``
* shell environment variable ``XDG_CACHE_HOME`` + ``/torch/``
* default: ``~/.cache/torch/``
So if you don't have any specific environment variable set, the cache directory will be at
``~/.cache/torch/transformers/``.
**Note:** If you have set a shell environment variable for one of the predecessors of this library
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
environment variable for ``TRANSFORMERS_CACHE``.
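You can also override the cache location for a single call by passing `cache_dir` explicitly. A minimal sketch (the directory path is just an example):

```python
from transformers import BertModel

# Weights are downloaded to (or reused from) the given directory instead of
# the default ~/.cache/torch/transformers/.
model = BertModel.from_pretrained("bert-base-uncased", cache_dir="/path/to/my/cache")
```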
### Note on model downloads (Continuous Integration or large-scale deployments)
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through
your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
faster, and cheaper. Feel free to contact us privately if you need any help.
## Do you want to run a Transformer model on a mobile device?
You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`,
`DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch or
TensorFlow 2.0 to productizing them in CoreML, or prototype a model or an app in CoreML then research its
hyperparameters or architecture from PyTorch or TensorFlow 2.0. Super exciting!

docs/source/main_classes/configuration.rst Normal file

@ -0,0 +1,10 @@
Configuration
----------------------------------------------------
The base class ``PretrainedConfig`` implements the common methods for loading/saving a configuration either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
``PretrainedConfig``
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PretrainedConfig
:members:

docs/source/main_classes/model.rst Normal file

@ -0,0 +1,27 @@
Models
----------------------------------------------------
The base class ``PreTrainedModel`` implements the common methods for loading/saving a model either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS S3 repository).
``PreTrainedModel`` also implements a few methods which are common among all the models to:
- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model (both operations are shown in the sketch below).
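A minimal sketch of both operations (assuming a BERT checkpoint; the added token is illustrative)::

    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    model = BertModel.from_pretrained("bert-base-cased")

    # Grow the embedding matrix to cover the newly added token.
    tokenizer.add_tokens(["[NEW_TOKEN]"])
    model.resize_token_embeddings(len(tokenizer))

    # Remove attention heads 1 and 2 of layer 0.
    model.prune_heads({0: [1, 2]})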
``PreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedModel
:members:
``Helper Functions``
~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.apply_chunking_to_forward
``TFPreTrainedModel``
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFPreTrainedModel
:members:

docs/source/main_classes/optimizer_schedules.rst Normal file

@ -0,0 +1,71 @@
Optimizer
----------------------------------------------------
The ``.optimization`` module provides:
- an optimizer with a weight decay fix that can be used to fine-tune models (see the sketch below),
- several schedules in the form of schedule objects that inherit from ``_LRSchedule``, and
- a gradient accumulation class to accumulate the gradients of multiple batches.
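A minimal sketch pairing the optimizer with a warmup schedule (assuming an existing PyTorch ``model``; the step counts are illustrative)::

    from transformers import AdamW, get_linear_schedule_with_warmup

    optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=100, num_training_steps=1000
    )

    # Call scheduler.step() after each optimizer.step() during training.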
``AdamW``
~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamW
:members:
``AdamWeightDecay``
~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamWeightDecay
.. autofunction:: transformers.create_optimizer
Schedules
----------------------------------------------------
Learning Rate Schedules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.get_constant_schedule
.. autofunction:: transformers.get_constant_schedule_with_warmup
.. image:: /imgs/warmup_constant_schedule.png
:target: /imgs/warmup_constant_schedule.png
:alt:
.. autofunction:: transformers.get_cosine_schedule_with_warmup
.. image:: /imgs/warmup_cosine_schedule.png
:target: /imgs/warmup_cosine_schedule.png
:alt:
.. autofunction:: transformers.get_cosine_with_hard_restarts_schedule_with_warmup
.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
:target: /imgs/warmup_cosine_hard_restarts_schedule.png
:alt:
.. autofunction:: transformers.get_linear_schedule_with_warmup
.. image:: /imgs/warmup_linear_schedule.png
:target: /imgs/warmup_linear_schedule.png
:alt:
``Warmup``
~~~~~~~~~~~~~~~~
.. autoclass:: transformers.WarmUp
:members:
Gradient Strategies
----------------------------------------------------
``GradientAccumulator``
~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GradientAccumulator

docs/source/main_classes/pipelines.rst Normal file

@ -0,0 +1,73 @@
Pipelines
----------------------------------------------------
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most
of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering.
There are two categories of pipeline abstractions to be aware of:
- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
- The other task-specific pipelines, such as :class:`~transformers.TokenClassificationPipeline`
or :class:`~transformers.QuestionAnsweringPipeline`
The pipeline abstraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated like any
other pipeline but requires an additional `task` argument.
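For example, a minimal sketch (the default model for the task is downloaded on first use)::

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    print(classifier("We are very happy to show you the 🤗 Transformers library."))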
.. autofunction:: transformers.pipeline
The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Parent class: Pipeline
=========================================
.. autoclass:: transformers.Pipeline
:members: predict, transform, save_pretrained
TokenClassificationPipeline
==========================================
.. autoclass:: transformers.TokenClassificationPipeline
NerPipeline
==========================================
This class is an alias of the :class:`~transformers.TokenClassificationPipeline` defined above. Please refer to that pipeline for
documentation and usage examples.
FillMaskPipeline
==========================================
.. autoclass:: transformers.FillMaskPipeline
FeatureExtractionPipeline
==========================================
.. autoclass:: transformers.FeatureExtractionPipeline
TextClassificationPipeline
==========================================
.. autoclass:: transformers.TextClassificationPipeline
QuestionAnsweringPipeline
==========================================
.. autoclass:: transformers.QuestionAnsweringPipeline
SummarizationPipeline
==========================================
.. autoclass:: transformers.SummarizationPipeline
TextGenerationPipeline
==========================================
.. autoclass:: transformers.TextGenerationPipeline

docs/source/main_classes/processors.rst Normal file

@ -0,0 +1,153 @@
Processors
----------------------------------------------------
This library includes processors for several traditional tasks. These processors can be used to process a dataset into
examples that can be fed to a model.
Processors
~~~~~~~~~~~~~~~~~~~~~
All processors follow the same architecture which is that of the
:class:`~transformers.data.processors.utils.DataProcessor`. The processor returns a list
of :class:`~transformers.data.processors.utils.InputExample`. These
:class:`~transformers.data.processors.utils.InputExample` can be converted to
:class:`~transformers.data.processors.utils.InputFeatures` in order to be fed to the model.
.. autoclass:: transformers.data.processors.utils.DataProcessor
:members:
.. autoclass:: transformers.data.processors.utils.InputExample
:members:
.. autoclass:: transformers.data.processors.utils.InputFeatures
:members:
GLUE
~~~~~~~~~~~~~~~~~~~~~
`General Language Understanding Evaluation (GLUE) <https://gluebenchmark.com/>`__ is a benchmark that evaluates
the performance of models across a diverse set of existing NLU tasks. It was released together with the paper
`GLUE: A multi-task benchmark and analysis platform for natural language understanding <https://openreview.net/pdf?id=rJ4km2R5t7>`__
This library hosts a total of 10 processors for the following tasks: MRPC, MNLI, MNLI (mismatched),
CoLA, SST2, STSB, QQP, QNLI, RTE and WNLI.
Those processors are:
- :class:`~transformers.data.processors.utils.MrpcProcessor`
- :class:`~transformers.data.processors.utils.MnliProcessor`
- :class:`~transformers.data.processors.utils.MnliMismatchedProcessor`
- :class:`~transformers.data.processors.utils.ColaProcessor`
- :class:`~transformers.data.processors.utils.Sst2Processor`
- :class:`~transformers.data.processors.utils.StsbProcessor`
- :class:`~transformers.data.processors.utils.QqpProcessor`
- :class:`~transformers.data.processors.utils.QnliProcessor`
- :class:`~transformers.data.processors.utils.RteProcessor`
- :class:`~transformers.data.processors.utils.WnliProcessor`
Additionally, the following method can be used to load values from a data file and convert them to a list of
:class:`~transformers.data.processors.utils.InputExample`.
.. automethod:: transformers.data.processors.glue.glue_convert_examples_to_features
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
An example using these processors is given in the `run_glue.py <https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
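Below is a minimal sketch pairing the MRPC processor with the conversion method (the data directory path is hypothetical)::

    from transformers import BertTokenizer
    from transformers.data.processors.glue import MrpcProcessor, glue_convert_examples_to_features

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    examples = MrpcProcessor().get_dev_examples("/path/to/MRPC")  # hypothetical path

    features = glue_convert_examples_to_features(
        examples, tokenizer, max_length=128, task="mrpc"
    )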
XNLI
~~~~~~~~~~~~~~~~~~~~~
`The Cross-Lingual NLI Corpus (XNLI) <https://www.nyu.edu/projects/bowman/xnli/>`__ is a benchmark that evaluates
the quality of cross-lingual text representations.
XNLI is a crowd-sourced dataset based on `MultiNLI <http://www.nyu.edu/projects/bowman/multinli/>`__: pairs of text are labeled with textual entailment
annotations for 15 different languages (including both high-resource languages such as English and low-resource languages such as Swahili).
It was released together with the paper
`XNLI: Evaluating Cross-lingual Sentence Representations <https://arxiv.org/abs/1809.05053>`__
This library hosts the processor to load the XNLI data:
- :class:`~transformers.data.processors.utils.XnliProcessor`
Please note that since the gold labels are available on the test set, evaluation is performed on the test set.
An example using these processors is given in the
`run_xnli.py <https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_xnli.py>`__ script.
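A minimal usage sketch (the data directory path is hypothetical; ``language`` selects which XNLI language to load)::

    from transformers.data.processors.xnli import XnliProcessor

    processor = XnliProcessor(language="en")
    examples = processor.get_train_examples("/path/to/XNLI")  # hypothetical path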
SQuAD
~~~~~~~~~~~~~~~~~~~~~
`The Stanford Question Answering Dataset (SQuAD) <https://rajpurkar.github.io/SQuAD-explorer//>`__ is a benchmark that evaluates
the performance of models on question answering. Two versions are available, v1.1 and v2.0. The first version (v1.1) was released together with the paper
`SQuAD: 100,000+ Questions for Machine Comprehension of Text <https://arxiv.org/abs/1606.05250>`__. The second version (v2.0) was released alongside
the paper `Know What You Don't Know: Unanswerable Questions for SQuAD <https://arxiv.org/abs/1806.03822>`__.
This library hosts a processor for each of the two versions:
Processors
^^^^^^^^^^^^^^^^^^^^^^^^^
Those processors are:
- :class:`~transformers.data.processors.utils.SquadV1Processor`
- :class:`~transformers.data.processors.utils.SquadV2Processor`
They both inherit from the abstract class :class:`~transformers.data.processors.utils.SquadProcessor`.
.. autoclass:: transformers.data.processors.squad.SquadProcessor
:members:
Additionally, the following method can be used to convert SQuAD examples into :class:`~transformers.data.processors.utils.SquadFeatures`
that can be used as model inputs.
.. automethod:: transformers.data.processors.squad.squad_convert_examples_to_features
These processors as well as the aforementioned method can be used with files containing the data as well as with the `tensorflow_datasets` package.
Examples are given below.
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example using the processors as well as the conversion method using data files:
Example::
# Loading a V2 processor
processor = SquadV2Processor()
examples = processor.get_dev_examples(squad_v2_data_dir)
# Loading a V1 processor
processor = SquadV1Processor()
examples = processor.get_dev_examples(squad_v1_data_dir)
features = squad_convert_examples_to_features(
examples=examples,
tokenizer=tokenizer,
max_seq_length=max_seq_length,
doc_stride=args.doc_stride,
max_query_length=max_query_length,
is_training=not evaluate,
)
Using `tensorflow_datasets` is as easy as using a data file:
Example::
# tensorflow_datasets only handles SQuAD V1.
tfds_examples = tfds.load("squad")
examples = SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=evaluate)
features = squad_convert_examples_to_features(
examples=examples,
tokenizer=tokenizer,
max_seq_length=max_seq_length,
doc_stride=args.doc_stride,
max_query_length=max_query_length,
is_training=not evaluate,
)
Another example using these processors is given in the
`run_squad.py <https://github.com/huggingface/transformers/blob/master/examples/question-answering/run_squad.py>`__ script.

docs/source/main_classes/tokenizer.rst Normal file

@ -0,0 +1,40 @@
Tokenizer
----------------------------------------------------
A tokenizer is in charge of preparing the inputs for a model. The library comprises tokenizers for all the models. Most of the tokenizers are available in two flavors: a full python implementation and a "Fast" implementation based on the Rust library `tokenizers`. The "Fast" implementations allow (1) a significant speed-up, in particular when doing batched tokenization, and (2) additional methods to map between the original string (characters and words) and the token space (e.g. getting the index of the token comprising a given character or the span of characters corresponding to a given token). Currently no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa and XLNet models).
The base classes ``PreTrainedTokenizer`` and ``PreTrainedTokenizerFast`` implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and "Fast" tokenizers either from a local file or directory or from a pretrained tokenizer provided by the library (downloaded from HuggingFace's AWS S3 repository).
``PreTrainedTokenizer`` and ``PreTrainedTokenizerFast`` thus implement the main methods for using all the tokenizers:
- tokenizing (splitting strings into sub-word token strings), converting token strings to ids and back, and encoding/decoding (i.e. tokenizing + converting to integers),
- adding new tokens to the vocabulary in a way that is independent of the underlying structure (BPE, SentencePiece...),
- managing special tokens like mask, beginning-of-sentence, etc. (adding them, assigning them to attributes in the tokenizer for easy access and making sure they are not split during tokenization).
``BatchEncoding`` holds the output of the tokenizer's encoding methods (``__call__``, ``encode_plus`` and ``batch_encode_plus``) and is derived from a Python dictionary. When the tokenizer is a pure python tokenizer, this class behaves just like a standard python dictionary and holds the various model inputs computed by these methods (``input_ids``, ``attention_mask``...). When the tokenizer is a "Fast" tokenizer (i.e. backed by the HuggingFace tokenizers library), this class additionally provides several advanced alignment methods which can be used to map between the original string (characters and words) and the token space (e.g. getting the index of the token comprising a given character or the span of characters corresponding to a given token).
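For example, a minimal sketch with a "Fast" tokenizer (``tokens`` and ``char_to_token`` are among the alignment methods mentioned above)::

    from transformers import BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
    encoding = tokenizer("HuggingFace is based in NYC")

    # Dict-like access works for both python and "Fast" tokenizers...
    print(encoding["input_ids"])

    # ...while "Fast" tokenizers additionally expose alignment helpers:
    print(encoding.tokens())          # token strings
    print(encoding.char_to_token(0))  # index of the token covering character 0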
``PreTrainedTokenizer``
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizer
:special-members: __call__
:members:
``PreTrainedTokenizerFast``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizerFast
:special-members: __call__
:members:
``BatchEncoding``
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BatchEncoding
:members:
``SpecialTokensMixin``
~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SpecialTokensMixin
:members:

docs/source/migration.md Normal file

@ -0,0 +1,122 @@
# Migrating from previous packages
## Migrating from pytorch-transformers to 🤗 Transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to 🤗 Transformers.
### Positional order of some models' keyword inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword names for keyword arguments, e.g. `model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for keyword arguments, e.g. `model(input_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments.
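For example (a sketch; `model` stands for any 🤗 Transformers model instance):

```python
# Safe: keyword arguments are matched by name, so their order is irrelevant.
outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)

# Double-check the signature first: positional arguments are matched by
# position, and the expected order changed for some models.
outputs = model(input_ids, attention_mask, token_type_ids)
```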
## Migrating from pytorch-pretrained-bert
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to 🤗 Transformers
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to 🤗 Transformers is that the models' forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to 🤗 Transformers conversion example for a `BertForSequenceClassification` classification model:
```python
# Let's load our model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)
# Now just use this line in 🤗 Transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]
# In 🤗 Transformers you can also have access to the logits:
loss, logits = outputs[:2]
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```
### Serialization
Breaking changes in the `from_pretrained()` method:
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
2. The additional `*inputs` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute first, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. More precisely, the positional arguments `*inputs` provided to `from_pretrained()` are directly forwarded to the model `__init__()` method, while the keyword arguments `**kwargs` (i) which match configuration class attributes are used to update said attributes, and (ii) which don't match any configuration class attributes are forwarded to the model `__init__()` method.
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
Here is an example:
```python
### Let's load a model and tokenizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
### Do some stuff to our model and tokenizer
# Ex: add new tokens to the vocabulary and embeddings of our model
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
# Train our model
train(model)
### Now let's save our model and tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')
### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:
- it only implements weight decay correction,
- schedules are now externals (see below),
- gradient clipping is now also external (see below).
The new optimizer `AdamW` matches the PyTorch `Adam` optimizer API and lets you use standard PyTorch or apex methods for the schedule and clipping.
The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
Here is a conversion example from `BertAdam` with a linear warmup and decay schedule to `AdamW` with the same schedule:
```python
# Parameters:
lr = 1e-3
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps) # 0.1
### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, num_training_steps=num_training_steps)
### and used like this:
for batch in train_data:
loss = model(batch)
loss.backward()
optimizer.step()
### In 🤗 Transformers, the optimizer and schedules are split and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False) # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps) # PyTorch scheduler
### and used like this:
for batch in train_data:
loss = model(batch)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm) # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
optimizer.step()
scheduler.step()
```

docs/source/model_doc/albert.rst Normal file

@ -0,0 +1,131 @@
ALBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_
by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. It presents
two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT:
- Splitting the embedding matrix into two smaller matrices
- Using repeating layers split among groups
The abstract from the paper is the following:
*Increasing model size when pretraining natural language representations often results in improved performance on
downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream
tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE,
RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.*
Tips:
- ALBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- ALBERT uses repeating layers which results in a small memory footprint, however the computational cost remains
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
number of (repeating) layers.
The original code can be found `here <https://github.com/google-research/ALBERT>`_.
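A minimal usage sketch (assuming the ``albert-base-v2`` checkpoint)::

    from transformers import AlbertModel, AlbertTokenizer

    tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
    model = AlbertModel.from_pretrained("albert-base-v2")

    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    outputs = model(**inputs)
    last_hidden_state = outputs[0]  # models return tuples in this version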
AlbertConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertConfig
:members:
AlbertTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
AlbertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertModel
:members:
AlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMaskedLM
:members:
AlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForSequenceClassification
:members:
AlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMultipleChoice
:members:
AlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForTokenClassification
:members:
AlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForQuestionAnswering
:members:
TFAlbertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertModel
:members:
TFAlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMaskedLM
:members:
TFAlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForSequenceClassification
:members:
TFAlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMultipleChoice
:members:
TFAlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForTokenClassification
:members:
TFAlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForQuestionAnswering
:members:

docs/source/model_doc/auto.rst Normal file

@ -0,0 +1,109 @@
AutoModels
-----------
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you
are supplying to the ``from_pretrained`` method.
AutoClasses are here to do this job for you so that you automatically retrieve the relevant model given the name/path
to the pretrained weights/config/vocabulary:
Instantiating one of ``AutoModel``, ``AutoConfig`` and ``AutoTokenizer`` will directly create a class of the relevant
architecture (e.g., ``model = AutoModel.from_pretrained('bert-base-cased')`` will create an instance of
:class:`~transformers.BertModel`).
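For instance, a minimal sketch of the typical pattern (``bert-base-cased`` stands in for any model identifier):

.. code-block:: python

    from transformers import AutoConfig, AutoModel, AutoTokenizer

    # the BERT-specific classes are picked from the name alone
    config = AutoConfig.from_pretrained('bert-base-cased')        # -> BertConfig
    tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')  # -> BertTokenizer
    model = AutoModel.from_pretrained('bert-base-cased')          # -> BertModel

    inputs = tokenizer("Hello world!", return_tensors='pt')
    outputs = model(**inputs)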
``AutoConfig``
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoConfig
:members:
``AutoTokenizer``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoTokenizer
:members:
``AutoModel``
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModel
:members:
``AutoModelForPreTraining``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForPreTraining
:members:
``AutoModelWithLMHead``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelWithLMHead
:members:
``AutoModelForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForSequenceClassification
:members:
``AutoModelForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForQuestionAnswering
:members:
``AutoModelForTokenClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForTokenClassification
:members:
``TFAutoModel``
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModel
:members:
``TFAutoModelForPreTraining``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForPreTraining
:members:
``TFAutoModelWithLMHead``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelWithLMHead
:members:
``TFAutoModelForSequenceClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForSequenceClassification
:members:
``TFAutoModelForQuestionAnswering``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForQuestionAnswering
:members:
``TFAutoModelForTokenClassification``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForTokenClassification
:members:

Bart
----------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer
Overview
~~~~~~~~~~~~~~~~~~~~~
The Bart model was proposed in `BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension <https://arxiv.org/abs/1910.13461>`_ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
According to the abstract,
- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme, where spans of text are replaced with a single mask token.
- BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.
The authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_.
Implementation Notes:
- Bart doesn't use :obj:`token_type_ids` for sequence classification. Use BartTokenizer.encode to get the proper splitting.
- The forward pass of ``BartModel`` will create decoder inputs (using the helper function ``transformers.modeling_bart._prepare_bart_decoder_inputs``) if they are not passed. This is different from some other modeling APIs.
- Model predictions are intended to be identical to the original implementation. This only works, however, if the string you pass to ``fairseq.encode`` starts with a space.
- ``BartForConditionalGeneration.generate`` should be used for conditional generation tasks like summarization; see the example in that method's docstring and the sketch below.
- Models that load the ``"facebook/bart-large-cnn"`` weights will not have a ``mask_token_id`` and so cannot perform mask-filling tasks.
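A minimal summarization sketch for the ``generate`` note above, assuming the ``facebook/bart-large-cnn`` checkpoint (the article text and generation parameters are illustrative):

.. code-block:: python

    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
    model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

    article = "A long news article that we would like to compress into a few sentences ..."
    inputs = tokenizer([article], max_length=1024, truncation=True, return_tensors='pt')

    # beam search usually works well for summarization
    summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=100, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))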
BartConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartConfig
:members:
BartTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartTokenizer
:members:
BartModel
~~~~~~~~~~~~~
.. autoclass:: transformers.BartModel
:members: forward
.. autofunction:: transformers.modeling_bart._prepare_bart_decoder_inputs
BartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForSequenceClassification
:members: forward
BartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForQuestionAnswering
:members: forward
BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForConditionalGeneration
:members: generate, forward

BERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__
by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a bidirectional transformer
pre-trained using a combination of masked language modeling objective and next sentence prediction
on a large corpus comprising the Toronto Book Corpus and Wikipedia.
The abstract from the paper is the following:
*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*
*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*
Tips:
- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- BERT was trained with a masked language modeling (MLM) objective. It is therefore efficient at predicting masked
tokens and at NLU in general, but is not optimal for text generation. Models trained with a causal language
modeling (CLM) objective are better in that regard.
- Alongside MLM, BERT was trained using a next sentence prediction (NSP) objective, using the [CLS] token as an
  approximation of the whole sequence. The user may use this token (the first token in a sequence built with special
  tokens) to get a sequence prediction rather than a token prediction. However, averaging over the sequence may yield
  better results than using the [CLS] token, as shown in the sketch below.
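A minimal sketch of that last tip, assuming the ``bert-base-uncased`` checkpoint (the mean pooling shown is an illustrative choice, not a library API):

.. code-block:: python

    from transformers import BertTokenizer, BertModel

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained('bert-base-uncased')

    inputs = tokenizer("Hello, my dog is cute", return_tensors='pt')
    last_hidden_state, pooled_output = model(**inputs)[:2]

    # averaging over the sequence may work better than the [CLS] pooled output
    mean_pooled = last_hidden_state.mean(dim=1)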
The original code can be found `here <https://github.com/google-research/bert>`_.
BertConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertConfig
:members:
BertTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
BertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizerFast
:members:
BertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertModel
:members:
BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForPreTraining
:members:
BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMaskedLM
:members:
BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForNextSentencePrediction
:members:
BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForSequenceClassification
:members:
BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMultipleChoice
:members:
BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForTokenClassification
:members:
BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForQuestionAnswering
:members:
TFBertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertModel
:members:
TFBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForPreTraining
:members:
TFBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMaskedLM
:members:
TFBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForNextSentencePrediction
:members:
TFBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForSequenceClassification
:members:
TFBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMultipleChoice
:members:
TFBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForTokenClassification
:members:
TFBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForQuestionAnswering
:members:

CamemBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The CamemBERT model was proposed in `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`__
by Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
Clergerie, Djamé Seddah, and Benoît Sagot. It is based on Facebook's RoBERTa model released in 2019 and was
trained on 138GB of French text.
The abstract from the paper is the following:
*Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success,
most available models have either been trained on English data or on the concatenation of data in multiple
languages. This makes practical use of such models --in all languages except English-- very limited. Aiming
to address this issue for French, we release CamemBERT, a French version of the Bi-directional Encoders for
Transformers (BERT). We measure the performance of CamemBERT compared to multilingual models in multiple
downstream tasks, namely part-of-speech tagging, dependency parsing, named-entity recognition, and natural
language inference. CamemBERT improves the state of the art for most of the tasks considered. We release the
pretrained model for CamemBERT hoping to foster research and downstream applications for French NLP.*
Tips:
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
examples as well as the information relative to the inputs and outputs.
The original code can be found `here <https://camembert-model.fr/>`_.
CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertConfig
:members:
CamembertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
CamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertModel
:members:
CamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMaskedLM
:members:
CamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForSequenceClassification
:members:
CamembertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMultipleChoice
:members:
CamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForTokenClassification
:members:
CamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForQuestionAnswering
:members:
TFCamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertModel
:members:
TFCamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForMaskedLM
:members:
TFCamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForSequenceClassification
:members:
TFCamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForTokenClassification
:members:
TFCamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForQuestionAnswering
:members:

CTRL
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_
by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
corpus of ~140 GB of text data with the first token reserved as a control code (such as Links, Books, Wikipedia etc.).
The abstract from the paper is the following:
*Large-scale language models show promising text generation capabilities, but users cannot easily control particular
aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model,
trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were
derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning
while providing more explicit control over text generation. These codes also allow CTRL to predict which parts of
the training data are most likely given a sequence. This provides a potential method for analyzing large amounts
of data via model-based source attribution.*
Tips:
- CTRL makes use of control codes to generate text: it requires generations to be started by certain words, sentences
  or links to generate coherent text. Refer to the `original implementation <https://github.com/salesforce/ctrl>`__
  for more information, and see the sketch below for an example.
- CTRL is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- CTRL was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
  token in a sequence. Leveraging this feature allows CTRL to generate syntactically coherent text, as can be
  observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation.
See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
of this argument.
The original code can be found `here <https://github.com/salesforce/ctrl>`_.
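A minimal generation sketch for the control-code tip above, assuming the ``ctrl`` checkpoint (the ``Links`` control code, the prompt and the generation parameters are illustrative):

.. code-block:: python

    from transformers import CTRLTokenizer, CTRLLMHeadModel

    tokenizer = CTRLTokenizer.from_pretrained('ctrl')
    model = CTRLLMHeadModel.from_pretrained('ctrl')

    # generations must start with a control code, here "Links"
    input_ids = tokenizer.encode('Links My favorite programming language is', return_tensors='pt')
    outputs = model.generate(input_ids, max_length=50, repetition_penalty=1.2)
    print(tokenizer.decode(outputs[0]))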
CTRLConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLConfig
:members:
CTRLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLTokenizer
:members: save_vocabulary
CTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLModel
:members:
CTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLLMHeadModel
:members:
TFCTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLModel
:members:
TFCTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLLMHeadModel
:members:

DialoGPT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
DialoGPT was proposed in
`DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_
by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
It's a GPT2 Model trained on 147M conversation-like exchanges extracted from Reddit.
The abstract from the paper is the following:
*We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer).
Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings.
We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.
The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.*
Tips:
- DialoGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- DialoGPT was trained with a causal language modeling (CLM) objective on conversational data and is therefore powerful at response generation in open-domain dialogue systems.
- DialoGPT enables the user to create a chat bot in just 10 lines of code as shown on `DialoGPT's model card <https://huggingface.co/microsoft/DialoGPT-medium>`_.
Training:
In order to train or fine-tune DialoGPT, one can use causal language modeling training.
To cite the official paper:
*We follow the OpenAI GPT-2 to model a multiturn dialogue session
as a long text and frame the generation task as language modeling. We first
concatenate all dialog turns within a dialogue session into a long text
x_1,..., x_N (N is the sequence length), ended by the end-of-text token.*
For more information, please refer to the original paper.
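A minimal sketch of this framing, assuming the ``microsoft/DialoGPT-medium`` checkpoint (the dialogue turns are placeholders): the turns are concatenated into one long text, each ended by the end-of-text token, and fed to a causal LM.

.. code-block:: python

    from transformers import AutoModelWithLMHead, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-medium')
    model = AutoModelWithLMHead.from_pretrained('microsoft/DialoGPT-medium')

    # frame a multi-turn dialogue session as one long text, each turn ended by the eos token
    turns = ["Does money buy happiness?", "Depends how much money you spend on it."]
    text = tokenizer.eos_token.join(turns) + tokenizer.eos_token
    input_ids = tokenizer.encode(text, return_tensors='pt')

    # causal language modeling loss for training / fine-tuning
    loss = model(input_ids, labels=input_ids)[0]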
DialoGPT's architecture is based on the GPT2 model, so one can refer to GPT2's `docstring <https://huggingface.co/transformers/model_doc/gpt2.html>`_.
The original code can be found `here <https://github.com/microsoft/DialoGPT>`_.

DistilBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The DistilBERT model was proposed in the blog post
`Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__,
and the paper `DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__.
DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer
parameters than ``bert-base-uncased`` and runs 60% faster while preserving over 95% of BERT's performance as measured
on the GLUE language understanding benchmark.
The abstract from the paper is the following:
*As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP),
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
counterparts. While most prior work investigated the use of distillation for building task-specific models, we
leverage knowledge distillation during the pre-training phase and show that it is possible to reduce the size of a
BERT model by 40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage
the inductive biases learned by larger models during pre-training, we introduce a triple loss combining language
modeling, distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train
and we demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative
on-device study.*
Tips:
- DistilBERT doesn't have ``token_type_ids``, so you don't need to indicate which token belongs to which segment; just separate your segments with the separation token ``tokenizer.sep_token`` (or ``[SEP]``), as in the sketch below.
- DistilBERT doesn't have options to select the input positions (``position_ids`` input). This could be added if necessary though; just let us know if you need this option.
The original code can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
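A minimal sketch of the first tip, assuming the ``distilbert-base-uncased`` checkpoint:

.. code-block:: python

    from transformers import DistilBertTokenizer, DistilBertModel

    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    model = DistilBertModel.from_pretrained('distilbert-base-uncased')

    # no token_type_ids: the two segments are only separated by the [SEP] token
    inputs = tokenizer("First segment.", "Second segment.", return_tensors='pt')
    last_hidden_state = model(inputs['input_ids'], attention_mask=inputs['attention_mask'])[0]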
DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertConfig
:members:
DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizer
:members:
DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizerFast
:members:
DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertModel
:members:
DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMaskedLM
:members:
DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForSequenceClassification
:members:
DistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMultipleChoice
:members:
DistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForTokenClassification
:members:
DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForQuestionAnswering
:members:
TFDistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertModel
:members:
TFDistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMaskedLM
:members:
TFDistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForSequenceClassification
:members:
TFDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMultipleChoice
:members:
TFDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForTokenClassification
:members:
TFDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForQuestionAnswering
:members:

ELECTRA
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The ELECTRA model was proposed in the paper
`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__.
ELECTRA is a new pre-training approach which trains two transformer models: the generator and the discriminator. The
generator's role is to replace tokens in a sequence, and is therefore trained as a masked language model. The discriminator,
which is the model we're interested in, tries to identify which tokens were replaced by the generator in the sequence.
The abstract from the paper is the following:
*Masked language modeling (MLM) pre-training methods such as BERT corrupt
the input by replacing some tokens with [MASK] and then train a model to
reconstruct the original tokens. While they produce good results when transferred
to downstream NLP tasks, they generally require large amounts of compute to be
effective. As an alternative, we propose a more sample-efficient pre-training task
called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small
generator network. Then, instead of training a model that predicts the original
identities of the corrupted tokens, we train a discriminative model that predicts
whether each token in the corrupted input was replaced by a generator sample
or not. Thorough experiments demonstrate this new pre-training task is more
efficient than MLM because the task is defined over all input tokens rather than
just the small subset that was masked out. As a result, the contextual representations
learned by our approach substantially outperform the ones learned by BERT
given the same model size, data, and compute. The gains are particularly strong
for small models; for example, we train a model on one GPU for 4 days that
outperforms GPT (trained using 30x more compute) on the GLUE natural language
understanding benchmark. Our approach also works well at scale, where it
performs comparably to RoBERTa and XLNet while using less than 1/4 of their
compute and outperforms them when using the same amount of compute.*
Tips:
- ELECTRA is the pre-training approach; therefore, nearly no changes were made to the underlying model (BERT). The
  only change is the separation of the embedding size and the hidden size: the embedding size is generally smaller,
  while the hidden size is larger. An additional projection layer (linear) is used to project the embeddings from
  their embedding size to the hidden size. When the embedding size is the same as the hidden size, no projection
  layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
  contain both the generator and discriminator. The conversion script requires the user to specify which model to
  export into the correct architecture. Once converted to the HuggingFace format, these checkpoints may be loaded into
  all available ELECTRA models, however. This means that the discriminator may be loaded in the `ElectraForMaskedLM`
  model, and the generator may be loaded in the `ElectraForPreTraining` model (the classification head will be randomly
  initialized as it doesn't exist in the generator). See the sketch below.
The original code can be found `here <https://github.com/google-research/electra>`_.
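As a sketch of the checkpoint tip above, assuming the converted ``google/electra-small-*`` checkpoints from the model hub:

.. code-block:: python

    from transformers import ElectraForMaskedLM, ElectraForPreTraining

    # the generator checkpoint in its natural masked-LM architecture
    generator = ElectraForMaskedLM.from_pretrained('google/electra-small-generator')

    # the discriminator checkpoint in its replaced-token-detection architecture
    discriminator = ElectraForPreTraining.from_pretrained('google/electra-small-discriminator')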
ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraConfig
:members:
ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizer
:members:
ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizerFast
:members:
ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraModel
:members:
ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForPreTraining
:members:
ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForMaskedLM
:members:
ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForSequenceClassification
:members:
ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForTokenClassification
:members:
ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForQuestionAnswering
:members:
TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraModel
:members:
TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForPreTraining
:members:
TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForMaskedLM
:members:
TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForTokenClassification
:members:
TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForQuestionAnswering
:members:

Encoder Decoder Models
------------------------
This class can wrap an encoder model, such as ``BertModel``, and a decoder model with a language modeling head, such as ``BertForMaskedLM``, into an encoder-decoder model.
The ``EncoderDecoderModel`` class allows you to instantiate an encoder-decoder model using the ``from_encoder_decoder_pretrained`` class method, which takes a pretrained encoder and a pretrained decoder model as input.
The ``EncoderDecoderModel`` is saved using the standard ``save_pretrained()`` method and can also again be loaded using the standard ``from_pretrained()`` method.
An application of this architecture could be *summarization* using two pretrained Bert models, as is shown in the paper `Text Summarization with Pretrained Encoders <https://arxiv.org/abs/1908.08345>`_ by Yang Liu and Mirella Lapata.
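A minimal sketch, assuming two ``bert-base-uncased`` checkpoints for the encoder and the decoder:

.. code-block:: python

    from transformers import EncoderDecoderModel

    # initialize a bert2bert model from two pretrained BERT checkpoints
    model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased')

    model.save_pretrained('bert2bert')                        # standard saving
    model = EncoderDecoderModel.from_pretrained('bert2bert')  # standard loading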
``EncoderDecoderConfig``
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderConfig
:members:
``EncoderDecoderModel``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderModel
:members:

FlauBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The FlauBERT model was proposed in the paper
`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le et al.
It's a transformer pre-trained using a masked language modeling (MLM) objective (BERT-like).
The abstract from the paper is the following:
*Language models have become a key step to achieve state-of-the art results in many different Natural Language
Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient
way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their
contextualization at the sentence level. This has been widely demonstrated for English using contextualized
representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et
al., 2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large
and heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre
for Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text
classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most
of the time they outperform other pre-training approaches. Different versions of FlauBERT as well as a unified
evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared
to the research community for further reproducible experiments in French NLP.*
The original code can be found `here <https://github.com/getalp/Flaubert>`_.
FlaubertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertConfig
:members:
FlaubertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertTokenizer
:members:
FlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertModel
:members:
FlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertWithLMHeadModel
:members:
FlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForSequenceClassification
:members:
FlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnsweringSimple
:members:
FlaubertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnswering
:members:
TFFlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertModel
:members:
TFFlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertWithLMHeadModel
:members:
TFFlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForSequenceClassification
:members:
TFFlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForMultipleChoice
:members:
TFFlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForTokenClassification
:members:
TFFlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForQuestionAnsweringSimple
:members:

OpenAI GPT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training <https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional)
transformer pre-trained using language modeling on a large corpus with long range dependencies, the Toronto Book Corpus.
The abstract from the paper is the following:
*Natural language understanding comprises a wide range of diverse tasks such
as textual entailment, question answering, semantic similarity assessment, and
document classification. Although large unlabeled text corpora are abundant,
labeled data for learning these specific tasks is scarce, making it challenging for
discriminatively trained models to perform adequately. We demonstrate that large
gains on these tasks can be realized by generative pre-training of a language model
on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each
specific task. In contrast to previous approaches, we make use of task-aware input
transformations during fine-tuning to achieve effective transfer while requiring
minimal changes to the model architecture. We demonstrate the effectiveness of
our approach on a wide range of benchmarks for natural language understanding.
Our general task-agnostic model outperforms discriminatively trained models that
use architectures specifically crafted for each task, significantly improving upon the
state of the art in 9 out of the 12 tasks studied.*
Tips:
- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
  token in a sequence. Leveraging this feature allows GPT to generate syntactically coherent text, as can be
  observed in the `run_generation.py` example script.
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT is one of them.
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`_.
Note:
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install
``ftfy`` and ``SpaCy``::
pip install spacy ftfy==4.4.3
python -m spacy download en
If you don't install ``ftfy`` and ``SpaCy``, the :class:`transformers.OpenAIGPTTokenizer` will default to tokenizing
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't
worry).
OpenAIGPTConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTConfig
:members:
OpenAIGPTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizer
:members: save_vocabulary
OpenAIGPTTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizerFast
:members:
OpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTModel
:members:
OpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTLMHeadModel
:members:
OpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
:members:
TFOpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTModel
:members:
TFOpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
:members:
TFOpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
:members:

OpenAI GPT2
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The OpenAI GPT-2 model was proposed in
`Language Models are Unsupervised Multitask Learners <https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_
by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
corpus of ~40 GB of text data.
The abstract from the paper is the following:
*GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset
of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous
words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring
demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X
the parameters and trained on more than 10X the amount of data.*
Tips:
- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
  token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text, as can be
  observed in the `run_generation.py` example script and in the sketch below.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation.
See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
of this argument.
`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: *distilgpt2*.
The original code can be found `here <https://openai.com/blog/better-language-models/>`_.
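A minimal generation sketch for the tips above, assuming the small ``gpt2`` checkpoint (the prompt and sampling parameters are illustrative; ``generate`` reuses the cached ``past`` key/value pairs between steps internally):

.. code-block:: python

    from transformers import GPT2Tokenizer, GPT2LMHeadModel

    tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
    model = GPT2LMHeadModel.from_pretrained('gpt2')

    input_ids = tokenizer.encode('The Manhattan bridge is', return_tensors='pt')
    outputs = model.generate(input_ids, max_length=30, do_sample=True, top_k=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))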
GPT2Config
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Config
:members:
GPT2Tokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Tokenizer
:members: save_vocabulary
GPT2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2TokenizerFast
:members:
GPT2Model
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Model
:members:
GPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2LMHeadModel
:members:
GPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2DoubleHeadsModel
:members:
TFGPT2Model
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2Model
:members:
TFGPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2LMHeadModel
:members:
TFGPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2DoubleHeadsModel
:members:

Longformer
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
Overview
~~~~~~~~~
The Longformer model was presented in `Longformer: The Long-Document Transformer <https://arxiv.org/pdf/2004.05150.pdf>`_ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
The abstract from the paper is the following:
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.*
The authors' code can be found `here <https://github.com/allenai/longformer>`_.
Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~
Longformer self attention employs self attention on both a "local" context and a "global" context.
Most tokens only attend "locally" to each other, meaning that each token attends to its :math:`\frac{1}{2} w` previous tokens and :math:`\frac{1}{2} w` succeeding tokens, with :math:`w` being the window length as defined in `config.attention_window`. Note that `config.attention_window` can be of type ``list`` to define a different :math:`w` for each layer.
A selected few tokens attend "globally" to all other tokens, as is conventionally done for all tokens in *e.g.* `BertSelfAttention`.
Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices.
Also note that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all "globally" attending tokens so that global attention is *symmetric*.
The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor `global_attention_mask` at run-time appropriately. `Longformer` employs the following logic for `global_attention_mask`: `0` - the token attends "locally", `1` - the token attends "globally". For more information, please also refer to the :func:`~transformers.LongformerModel.forward` method.
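A minimal sketch of setting `global_attention_mask` at run-time, assuming the ``allenai/longformer-base-4096`` checkpoint (the input text and the choice of global token are illustrative):

.. code-block:: python

    import torch
    from transformers import LongformerModel, LongformerTokenizer

    model = LongformerModel.from_pretrained('allenai/longformer-base-4096')
    tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')

    input_ids = tokenizer.encode('A very long document ...', return_tensors='pt')

    # 0 -> local windowed attention, 1 -> global attention
    global_attention_mask = torch.zeros(input_ids.shape, dtype=torch.long)
    global_attention_mask[:, 0] = 1  # e.g. give only the first token global attention

    outputs = model(input_ids, global_attention_mask=global_attention_mask)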
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually represents the memory and time bottleneck, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times w)`, with :math:`n_s` being the sequence length and :math:`w` being the average window size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of "locally" attending tokens.
For more information, please refer to the official `paper <https://arxiv.org/pdf/2004.05150.pdf>`_ .
Training
~~~~~~~~~~~~~~~~~~~~
``LongformerForMaskedLM`` is trained the exact same way ``RobertaForMaskedLM`` is trained and
should be used as follows:
::
    from transformers import LongformerForMaskedLM, LongformerTokenizer
    model = LongformerForMaskedLM.from_pretrained('allenai/longformer-base-4096')
    tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
    input_ids = tokenizer.encode('This is a sentence from <mask> training data', return_tensors='pt')
    mlm_labels = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
    loss = model(input_ids, labels=mlm_labels)[0]
LongformerConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerConfig
:members:
LongformerTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizer
:members:
LongformerTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizerFast
:members:
LongformerModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerModel
:members:
LongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMaskedLM
:members:
LongformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForSequenceClassification
:members:
LongformerForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMultipleChoice
:members:
LongformerForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForTokenClassification
:members:
LongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForQuestionAnswering
:members:

MarianMT
----------------------------------------------------
**DISCLAIMER:** If you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@sshleifer. Translations should be similar, but not identical to, output in the test set linked to in each model card.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~
- Each model is about 298 MB on disk; there are 1,000+ models.
- The list of supported language pairs can be found `here <https://huggingface.co/Helsinki-NLP>`__.
- The 1,000+ models were originally trained by `Jörg Tiedemann <https://researchportal.helsinki.fi/en/persons/j%C3%B6rg-tiedemann>`__ using the `Marian <https://marian-nmt.github.io/>`_ C++ library, which supports fast training and translation.
- All models are transformer encoder-decoders with 6 layers in each component. Each model's performance is documented in a model card.
- The 80 opus models that require BPE preprocessing are not supported.
- The modeling code is the same as ``BartForConditionalGeneration`` with a few minor modifications:
- static (sinusoid) positional embeddings (``MarianConfig.static_position_embeddings=True``)
- a new final_logits_bias (``MarianConfig.add_bias_logits=True``)
- no layernorm_embedding (``MarianConfig.normalize_embedding=False``)
- the model starts generating with pad_token_id (which has 0 token_embedding) as the prefix. (Bart uses <s/>)
- Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``
Naming
~~~~~~
- All model names use the following format: ``Helsinki-NLP/opus-mt-{src}-{tgt}``
- The language codes used to name models are inconsistent. Two digit codes can usually be found `here <https://developers.google.com/admin-sdk/directory/v1/languages>`_, three digit codes require googling "language code {code}".
- Codes formatted like ``es_AR`` are usually ``code_{region}``. That one is Spanish from Argentina.
Multilingual Models
~~~~~~~~~~~~~~~~~~~~
All model names use the following format: ``Helsinki-NLP/opus-mt-{src}-{tgt}``:
- if ``src`` is in all caps, the model supports multiple input languages; you can figure out which ones by looking at the model card, or at the Group Members `mapping <https://gist.github.com/sshleifer/6d20e7761931b08e73c3219027b97b8a>`_.
- if ``tgt`` is in all caps, the model can output multiple languages, and you should specify a language code by prepending the desired output language code to the ``src_text``.
- You can see a tokenizer's supported language codes in ``tokenizer.supported_language_codes``
Example of translating English to many Romance languages, using language codes:
.. code-block:: python
from transformers import MarianMTModel, MarianTokenizer
src_text = [
'>>fr<< this is a sentence in english that we want to translate to french',
'>>pt<< This should go to portuguese',
'>>es<< And this to Spanish'
]
model_name = 'Helsinki-NLP/opus-mt-en-ROMANCE'
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)
model = MarianMTModel.from_pretrained(model_name)
translated = model.generate(**tokenizer.prepare_translation_batch(src_text))
tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
# ["c'est une phrase en anglais que nous voulons traduire en français",
# 'Isto deve ir para o português.',
# 'Y esto al español']
Sometimes, models were trained on collections of languages that do not resolve to a group. In this case, ``_`` is used as a separator for src or tgt, as in ``'Helsinki-NLP/opus-mt-en_el_es_fi-en_el_es_fi'``. These still require language codes.
There are many supported regional language codes, like ``>>es_ES<<`` (Spain) and ``>>es_AR<<`` (Argentina); these do not seem to produce results different from just using ``>>es<<``.
For example:
- ``Helsinki-NLP/opus-mt-NORTH_EU-NORTH_EU``: translates from all NORTH_EU languages (see `mapping <https://gist.github.com/sshleifer/6d20e7761931b08e73c3219027b97b8a>`_) to all NORTH_EU languages. Use a special language code like ``>>de<<`` to specify the output language.
- ``Helsinki-NLP/opus-mt-ROMANCE-en``: translates from many Romance languages to English; no codes are needed since there is only one target language.
.. code-block:: python
GROUP_MEMBERS = {
'ZH': ['cmn', 'cn', 'yue', 'ze_zh', 'zh_cn', 'zh_CN', 'zh_HK', 'zh_tw', 'zh_TW', 'zh_yue', 'zhs', 'zht', 'zh'],
'ROMANCE': ['fr', 'fr_BE', 'fr_CA', 'fr_FR', 'wa', 'frp', 'oc', 'ca', 'rm', 'lld', 'fur', 'lij', 'lmo', 'es', 'es_AR', 'es_CL', 'es_CO', 'es_CR', 'es_DO', 'es_EC', 'es_ES', 'es_GT', 'es_HN', 'es_MX', 'es_NI', 'es_PA', 'es_PE', 'es_PR', 'es_SV', 'es_UY', 'es_VE', 'pt', 'pt_br', 'pt_BR', 'pt_PT', 'gl', 'lad', 'an', 'mwl', 'it', 'it_IT', 'co', 'nap', 'scn', 'vec', 'sc', 'ro', 'la'],
'NORTH_EU': ['de', 'nl', 'fy', 'af', 'da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
'SCANDINAVIA': ['da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
'SAMI': ['se', 'sma', 'smj', 'smn', 'sms'],
'NORWAY': ['nb_NO', 'nb', 'nn_NO', 'nn', 'nog', 'no_nb', 'no'],
'CELTIC': ['ga', 'cy', 'br', 'gd', 'kw', 'gv']
}
Code to see available pretrained models:
.. code-block:: python
from transformers.hf_api import HfApi
model_list = HfApi().model_list()
org = "Helsinki-NLP"
model_ids = [x.modelId for x in model_list if x.modelId.startswith(org)]
suffix = [x.split('/')[1] for x in model_ids]
multi_models = [f'{org}/{s}' for s in suffix if s != s.lower()]
MarianConfig
~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianConfig
:members:
MarianTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MarianTokenizer
:members: prepare_translation_batch
MarianMTModel
~~~~~~~~~~~~~
PyTorch version of marian-nmt's transformer.h (C++). Designed for the OPUS-NMT translation checkpoints.
The model API is identical to that of ``BartForConditionalGeneration``.
Available models are listed at the `Model List <https://huggingface.co/models?search=Helsinki-NLP>`__.
This class inherits all functionality from ``BartForConditionalGeneration``; see that page for method signatures.
.. autoclass:: transformers.MarianMTModel
:members:

MobileBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The MobileBERT model was proposed in `MobileBERT: a Compact Task-Agnostic BERT
for Resource-Limited Devices <https://arxiv.org/abs/2004.02984>`__
by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. It's a bidirectional transformer
based on the BERT model, which is compressed and accelerated using several approaches.
The abstract from the paper is the following:
*Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds
of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot
be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating
the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied
to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while
equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward
networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated
BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to MobileBERT. Empirical studies show that
MobileBERT is 4.3x smaller and 5.5x faster than BERT_BASE while achieving competitive results on well-known
benchmarks. On the natural language inference tasks of GLUE, MobileBERT achieves a GLUE score of 77.7
(0.6 lower than BERT_BASE), and 62 ms latency on a Pixel 4 phone. On the SQuAD v1.1/v2.0 question answering task,
MobileBERT achieves a dev F1 score of 90.0/79.2 (1.5/2.1 higher than BERT_BASE).*
Tips:
- MobileBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on
the right rather than the left.
- MobileBERT is similar to BERT and therefore relies on the masked language modeling (MLM) objective.
It is therefore efficient at predicting masked tokens and at NLU in general, but is not optimal for
text generation. Models trained with a causal language modeling (CLM) objective are better in that regard.
The original code can be found `here <https://github.com/google-research/mobilebert>`_.
MobileBertConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertConfig
:members:
MobileBertTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
MobileBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertTokenizerFast
:members:
MobileBertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertModel
:members:
MobileBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForPreTraining
:members:
MobileBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForMaskedLM
:members:
MobileBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForNextSentencePrediction
:members:
MobileBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForSequenceClassification
:members:
MobileBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForMultipleChoice
:members:
MobileBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForTokenClassification
:members:
MobileBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.MobileBertForQuestionAnswering
:members:
TFMobileBertModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertModel
:members:
TFMobileBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForPreTraining
:members:
TFMobileBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForMaskedLM
:members:
TFMobileBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForNextSentencePrediction
:members:
TFMobileBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForSequenceClassification
:members:
TFMobileBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForMultipleChoice
:members:
TFMobileBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForTokenClassification
:members:
TFMobileBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFMobileBertForQuestionAnswering
:members:

Reformer
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
Overview
~~~~~~~~~~
The Reformer model was presented in `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
The abstract from the paper is the following:
*Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(Llog(L)), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.*
The authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`_.
Axial Positional Encodings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Axial Positional Encodings were first implemented in Google's `trax library <https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`_ and developed by the authors of this model's paper. In models that process very long input sequences, the conventional position id encodings store an embeddings vector of size :math:`d` (the ``config.hidden_size``) for every position :math:`1, \ldots, n_s`, with :math:`n_s` being ``config.max_embedding_size``. *E.g.*, having a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000` would result in a position encoding matrix:
.. math::
X_{i,j}, \text{ with } i \in \left[1,\ldots, d\right] \text{ and } j \in \left[1,\ldots, n_s\right]
which alone has over 500M parameters to store. Axial positional encodings factorize :math:`X_{i,j}` into two matrices:
.. math::
X^{1}_{i,j}, \text{ with } i \in \left[1,\ldots, d^1\right] \text{ and } j \in \left[1,\ldots, n_s^1\right]
and
.. math::
X^{2}_{i,j}, \text{ with } i \in \left[1,\ldots, d^2\right] \text{ and } j \in \left[1,\ldots, n_s^2\right]
with:
.. math::
d = d^1 + d^2 \text{ and } n_s = n_s^1 \times n_s^2 .
Therefore the following holds:
.. math::
X_{i,j} = \begin{cases}
X^{1}_{i, k}, & \text{if }\ i < d^1 \text{ with } k = j \mod n_s^1 \\
X^{2}_{i - d^1, l}, & \text{if } i \ge d^1 \text{ with } l = \lfloor\frac{j}{n_s^1}\rfloor
\end{cases}
Intuitively, this means that a position embedding vector :math:`x_j \in \mathbb{R}^{d}` is now the composition of two factorized embedding vectors: :math:`x^1_{k}` and :math:`x^2_{l}`, where the ``config.max_embedding_size`` dimension :math:`j` is factorized into :math:`k \text{ and } l`.
This design ensures that each position embedding vector :math:`x_j` is unique.
Using the above example again, axial position encoding with :math:`d^1 = 2^5, d^2 = 2^5, n_s^1 = 2^9, n_s^2 = 2^{10}` can drastically reduce the number of parameters to :math:`2^{14} + 2^{15} \approx 49000` parameters.
In practice, the parameter ``config.axial_pos_embds_dim`` is set to a list :math:`(d^1, d^2)` whose sum has to be equal to ``config.hidden_size``, and ``config.axial_pos_shape`` is set to a list :math:`(n_s^1, n_s^2)` whose product has to be equal to ``config.max_embedding_size``, which during training has to be equal to the sequence length of the ``input_ids``.
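As a rough illustration, these constraints can be checked in code. This is a minimal sketch using values that match the configuration's defaults at the time of writing, where :math:`d = 64 + 192 = 256` and :math:`n_s = 64 \times 64 = 4096`; note that the attribute for :math:`n_s` is named ``max_position_embeddings`` in the actual configuration:
::

    from transformers import ReformerConfig

    config = ReformerConfig(
        hidden_size=256,                # d
        axial_pos_embds_dim=[64, 192],  # (d^1, d^2), must sum to hidden_size
        axial_pos_shape=[64, 64],       # (n_s^1, n_s^2), product must equal the training sequence length
        max_position_embeddings=4096,   # n_s
    )
    assert sum(config.axial_pos_embds_dim) == config.hidden_size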
LSH Self Attention
~~~~~~~~~~~~~~~~~~~~
In locality-sensitive hashing (LSH) self attention, the key and query projection weights are tied. Therefore, the key and query embedding vectors are also tied.
LSH self attention uses the locality sensitive
hashing mechanism proposed in `Practical and Optimal LSH for Angular Distance <https://arxiv.org/abs/1509.02897>`_ to assign each of the tied key query embedding vectors to one of ``config.num_buckets`` possible buckets. The premise is that the more "similar" key query embedding vectors (in terms of *cosine similarity*) are to each other, the more likely they are assigned to the same bucket.
The accuracy of the LSH mechanism can be improved by increasing ``config.num_hashes`` or directly the argument ``num_hashes`` of the forward function so that the output of the LSH self attention better approximates the output of the "normal" full self attention.
The buckets are then sorted and chunked into query-key embedding vector chunks, each of length ``config.lsh_chunk_length``. For each chunk, each query embedding vector attends to its key vector (which is tied to itself) and to the key embedding vectors of the ``config.lsh_num_chunks_before`` previous neighboring chunks and the ``config.lsh_num_chunks_after`` following neighboring chunks.
For more information, see the `original Paper <https://arxiv.org/abs/2001.04451>`_ or this great `blog post <https://www.pragmatic.ml/reformer-deep-dive/>`_.
Note that ``config.num_buckets`` can also be factorized into a list :math:`(n_{\text{buckets}}^1, n_{\text{buckets}}^2)`. This way, instead of assigning the query-key embedding vectors to one of :math:`(1,\ldots, n_{\text{buckets}})`, they are assigned to one of :math:`(1-1,\ldots, n_{\text{buckets}}^1-1, \ldots, 1-n_{\text{buckets}}^2, \ldots, n_{\text{buckets}}^1-n_{\text{buckets}}^2)`. This is crucial for very long sequences to save memory.
When training a model from scratch, it is recommended to leave ``config.num_buckets=None``, so that depending on the sequence length a good value for ``num_buckets`` is calculated on the fly. This value will then automatically be saved in the config and should be reused for inference.
Using LSH self attention, the memory and time complexity of the query-key matmul operation can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times \log(n_s))`, which usually represents the memory and time bottleneck in a transformer model, with :math:`n_s` being the sequence length.
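As a rough sketch of the ``num_hashes`` trade-off described above (the checkpoint name below is just one public example; more hashing rounds cost more compute):
::

    from transformers import ReformerModel, ReformerTokenizer

    tokenizer = ReformerTokenizer.from_pretrained('google/reformer-crime-and-punishment')
    model = ReformerModel.from_pretrained('google/reformer-crime-and-punishment')

    input_ids = tokenizer.encode('A sentence to encode', return_tensors='pt')
    # overriding config.num_hashes at call time: more rounds better approximate full attention
    outputs = model(input_ids, num_hashes=8)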
Local Self Attention
~~~~~~~~~~~~~~~~~~~~
Local self attention is essentially a "normal" self attention layer with
key, query and value projections, but chunked so that, in each chunk of length ``config.local_chunk_length``, each query embedding vector only attends to the key embedding vectors in its chunk and to the key embedding vectors of the ``config.local_num_chunks_before`` previous neighboring chunks and the ``config.local_num_chunks_after`` following neighboring chunks.
Using local self attention, the memory and time complexity of the query-key matmul operation, which usually represents the memory and time bottleneck in a transformer model, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to :math:`\mathcal{O}(n_s \times c)`, with :math:`n_s` being the sequence length and :math:`c` the fixed chunk length.
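In the configuration, the attention type is chosen per layer. Here is a minimal sketch mixing LSH and local layers, with illustrative values only; note that the actual configuration attributes are named ``lsh_attn_chunk_length`` and ``local_attn_chunk_length``:
::

    from transformers import ReformerConfig, ReformerModel

    config = ReformerConfig(
        attn_layers=['lsh', 'local', 'lsh', 'local'],  # alternate the two attention types
        lsh_attn_chunk_length=64,      # ``lsh_chunk_length`` in the text above
        lsh_num_chunks_before=1,
        lsh_num_chunks_after=0,
        local_attn_chunk_length=64,    # ``local_chunk_length`` in the text above
        local_num_chunks_before=1,
        local_num_chunks_after=0,
    )
    model = ReformerModel(config)  # randomly initialized, for training from scratch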
Training
~~~~~~~~~~~~~~~~~~~~
During training, we must ensure that the sequence length is set to a value that is divisible by the least common multiple of ``config.lsh_chunk_length`` and ``config.local_chunk_length``, and that the parameters of the axial positional encodings are correctly set as described above. Reformer is very memory efficient, so the model can easily be trained on sequences as long as 64000 tokens.
For training, the ``ReformerModelWithLMHead`` should be used as follows (the checkpoint below is just one public example):
::

    from transformers import ReformerModelWithLMHead, ReformerTokenizer

    tokenizer = ReformerTokenizer.from_pretrained('google/reformer-crime-and-punishment')
    model = ReformerModelWithLMHead.from_pretrained('google/reformer-crime-and-punishment')
    input_ids = tokenizer.encode('This is a sentence from the training data', return_tensors='pt')
    loss = model(input_ids, labels=input_ids)[0]
ReformerConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerConfig
:members:
ReformerTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerTokenizer
:members:
ReformerModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerModel
:members:
ReformerModelWithLMHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ReformerModelWithLMHead
:members:


@ -0,0 +1,39 @@
RetriBERT
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The RetriBERT model was proposed in the blog post
`Explain Anything Like I'm Five: A Model for Open Domain Long Form Question Answering <https://yjernite.github.io/lfqa.html>`__.
RetriBERT is a small model that uses either a single BERT encoder or a pair of BERT encoders with a lower-dimension projection for dense semantic indexing of text.
Code to train and use the model can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
RetriBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertConfig
:members:
RetriBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertTokenizer
:members:
RetriBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertTokenizerFast
:members:
RetriBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RetriBertModel
:members:


@ -0,0 +1,140 @@
RoBERTa
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The RoBERTa model was proposed in `RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_
by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer,
Veselin Stoyanov. It is based on Google's BERT model released in 2018.
It builds on BERT and modifies key hyperparameters, removing the next-sentence pretraining
objective and training with much larger mini-batches and learning rates.
The abstract from the paper is the following:
*Language model pretraining has led to significant performance gains but careful comparison between different
approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes,
and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication
study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and
training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of
every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These
results highlight the importance of previously overlooked design choices, and raise questions about the source
of recently reported improvements. We release our models and code.*
Tips:
- This implementation is the same as :class:`~transformers.BertModel` with a tiny embeddings tweak as well as a
setup for Roberta pretrained models.
- RoBERTa has the same architecture as BERT, but uses a byte-level BPE as a tokenizer (same as GPT-2) and uses a
different pre-training scheme.
- RoBERTa doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `</s>`), as shown in the example below.
- `Camembert <./camembert.html>`__ is a wrapper around RoBERTa. Refer to this page for usage examples.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
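For instance, a minimal sketch of encoding a segment pair (the ``roberta-base`` checkpoint is just one public example):
::

    from transformers import RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
    encoded = tokenizer("A first segment", "a second segment")
    # no token_type_ids are produced; the segments are joined with </s></s>
    print(tokenizer.decode(encoded["input_ids"]))  # roughly: <s>A first segment</s></s>a second segment</s>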
RobertaConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaConfig
:members:
RobertaTokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
RobertaTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaTokenizerFast
:members: build_inputs_with_special_tokens
RobertaModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaModel
:members:
RobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForMaskedLM
:members:
RobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForSequenceClassification
:members:
RobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForMultipleChoice
:members:
RobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForTokenClassification
:members:
RobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaForQuestionAnswering
:members:
TFRobertaModel
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaModel
:members:
TFRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForMaskedLM
:members:
TFRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForSequenceClassification
:members:
TFRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForMultipleChoice
:members:
TFRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForTokenClassification
:members:
TFRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFRobertaForQuestionAnswering
:members:


@ -0,0 +1,105 @@
T5
----------------------------------------------------
**DISCLAIMER:** This model is still a work in progress; if you see something strange,
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_.
Overview
~~~~~~~~~~~~~~~~~~~~~
The T5 model was presented in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/pdf/1910.10683.pdf>`_ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li and Peter J. Liu.
Here is the abstract:
*Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice.
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format.
Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.
By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.*
Tips:
- T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised
and supervised tasks and for which each task is converted into a text-to-text format.
T5 works well on a variety of tasks out-of-the-box by prepending a different prefix to the input corresponding to each task, e.g.: for translation: *translate English to German: ..., summarize: ...*.
For more information about which prefix to use, it is easiest to look into Appendix D of the `paper <https://arxiv.org/pdf/1910.10683.pdf>`_ .
- For sequence-to-sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()`` (see the example below). The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.
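For instance, a minimal sketch of out-of-the-box translation with ``generate()`` (the ``t5-small`` checkpoint is just one public example):
::

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained('t5-small')
    model = T5ForConditionalGeneration.from_pretrained('t5-small')

    input_ids = tokenizer.encode('translate English to German: The house is wonderful.', return_tensors='pt')
    outputs = model.generate(input_ids)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "Das Haus ist wunderbar."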
Training
~~~~~~~~~~~~~~~~~~~~~
T5 is an encoder-decoder model and converts all NLP problems into a text-to-text format. It is trained using teacher forcing.
This means that for training we always need an input sequence and a target sequence.
The input sequence is fed to the model using ``input_ids``. The target sequence is shifted to the right, *i.e.* prepended by a start-sequence token, and fed to the decoder using ``decoder_input_ids``. In teacher-forcing style, the target sequence with the EOS token appended corresponds to the ``labels``. The PAD token is hereby used as the start-sequence token.
T5 can be trained / fine-tuned both in a supervised and unsupervised fashion.
- Unsupervised denoising training
In this setup, spans of the input sequence are masked by so-called sentinel tokens (*a.k.a.* unique mask tokens),
and the output sequence is formed as a concatenation of the same sentinel tokens and the *real* masked tokens.
Each sentinel token represents a unique mask token for this sentence and should start with ``<extra_id_0>``, ``<extra_id_1>``, ... up to ``<extra_id_99>``. As a default, 100 sentinel tokens are available in ``T5Tokenizer``.
*E.g.* the sentence "The cute dog walks in the park" with the masks put on "cute dog" and "the" should be processed as follows (the ``t5-small`` checkpoint is just one public example):
::

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained('t5-small')
    model = T5ForConditionalGeneration.from_pretrained('t5-small')
    input_ids = tokenizer.encode('The <extra_id_0> walks in <extra_id_1> park', return_tensors='pt')
    labels = tokenizer.encode('<extra_id_0> cute dog <extra_id_1> the <extra_id_2> </s>', return_tensors='pt')
    # the forward function automatically creates the correct decoder_input_ids
    loss = model(input_ids=input_ids, labels=labels)[0]
- Supervised training
In this setup, the input and output sequences form a standard sequence-to-sequence mapping.
For translation, *e.g.*, the input sequence "The house is wonderful." and output sequence "Das Haus ist wunderbar." should
be processed as follows:
::

    # reusing the ``model`` and ``tokenizer`` from the example above
    input_ids = tokenizer.encode('translate English to German: The house is wonderful. </s>', return_tensors='pt')
    labels = tokenizer.encode('Das Haus ist wunderbar. </s>', return_tensors='pt')
    # the forward function automatically creates the correct decoder_input_ids
    loss = model(input_ids=input_ids, labels=labels)[0]
T5Config
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Config
:members:
T5Tokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Tokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
T5Model
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5Model
:members:
T5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.T5ForConditionalGeneration
:members:
TFT5Model
~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFT5Model
:members:
TFT5ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFT5ForConditionalGeneration
:members:


@ -0,0 +1,82 @@
Transformer XL
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The Transformer-XL model was proposed in
`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__
by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
It's a causal (uni-directional) transformer with relative positioning (sinusoidal) embeddings which can reuse
previously computed hidden states to attend to a longer context (memory).
This model also uses adaptive softmax inputs and outputs (tied).
The abstract from the paper is the following:
*Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the
setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency
beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and
a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves
the context fragmentation problem. As a result, Transformer-XL learns dependency that is 80% longer than RNNs and
450% longer than vanilla Transformers, achieves better performance on both short and long sequences, and is up
to 1,800+ times faster than vanilla Transformers during evaluation. Notably, we improve the state-of-the-art results
of bpc/perplexity to 0.99 on enwiki8, 1.08 on text8, 18.3 on WikiText-103, 21.8 on One Billion Word, and 54.5 on
Penn Treebank (without finetuning). When trained only on WikiText-103, Transformer-XL manages to generate reasonably
coherent, novel text articles with thousands of tokens.*
Tips:
- Transformer-XL uses relative sinusoidal positional embeddings. Padding can be done on the left or on the right.
The original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
- Transformer-XL is one of the few models that has no sequence length limit.
The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`_.
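A minimal sketch of reusing the memory described above (the ``transfo-xl-wt103`` checkpoint is just one public example):
::

    import torch
    from transformers import TransfoXLLMHeadModel, TransfoXLTokenizer

    tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
    model = TransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')

    input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])
    prediction_scores, mems = model(input_ids)[:2]
    # feed the cached hidden states back in so the next segment can attend to this one
    next_ids = torch.tensor([tokenizer.encode("He likes playing fetch")])
    outputs = model(next_ids, mems=mems)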
TransfoXLConfig
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLConfig
:members:
TransfoXLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLTokenizer
:members: save_vocabulary
TransfoXLTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLTokenizerFast
:members:
TransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLModel
:members:
TransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLLMHeadModel
:members:
TFTransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTransfoXLModel
:members:
TFTransfoXLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTransfoXLLMHeadModel
:members:


@ -0,0 +1,124 @@
XLM
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The XLM model was proposed in `Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_
by Guillaume Lample*, Alexis Conneau*. It's a transformer pre-trained using one of the following objectives:
- a causal language modeling (CLM) objective (next token prediction),
- a masked language modeling (MLM) objective (BERT-like), or
- a translation language modeling (TLM) objective (an extension of BERT's MLM to multiple language inputs).
The abstract from the paper is the following:
*Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding.
In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining.
We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual
data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain
state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI,
our approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation,
we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On
supervised machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming
the previous best approach by more than 4 BLEU. Our code and pretrained models will be made publicly available.*
Tips:
- XLM has many different checkpoints, which were trained using different objectives: CLM, MLM or TLM. Make sure to
select the correct objective for your task (e.g. MLM checkpoints are not suitable for generation).
- XLM has multilingual checkpoints which leverage a specific `lang` parameter (see the example below). Check out the
  `multi-lingual <../multilingual.html>`__ page for more information.
The original code can be found `here <https://github.com/facebookresearch/XLM/>`_.
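A minimal sketch of passing language ids (the ``xlm-mlm-enfr-1024`` checkpoint is just one public example; see the multi-lingual page for the full walkthrough):
::

    import torch
    from transformers import XLMTokenizer, XLMWithLMHeadModel

    tokenizer = XLMTokenizer.from_pretrained('xlm-mlm-enfr-1024')
    model = XLMWithLMHeadModel.from_pretrained('xlm-mlm-enfr-1024')

    input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])
    # one language id per input position
    langs = torch.full_like(input_ids, tokenizer.lang2id['en'])
    outputs = model(input_ids, langs=langs)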
XLMConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMConfig
:members:
XLMTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
XLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMModel
:members:
XLMWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMWithLMHeadModel
:members:
XLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForSequenceClassification
:members:
XLMForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForQuestionAnsweringSimple
:members:
XLMForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMForQuestionAnswering
:members:
TFXLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMModel
:members:
TFXLMWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMWithLMHeadModel
:members:
TFXLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForSequenceClassification
:members:
TFXLMForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForMultipleChoice
:members:
TFXLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForTokenClassification
:members:
TFXLMForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMForQuestionAnsweringSimple
:members:


@ -0,0 +1,133 @@
XLM-RoBERTa
------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The XLM-RoBERTa model was proposed in `Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__
by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán,
Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019.
It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data.
The abstract from the paper is the following:
*This paper shows that pretraining multilingual language models at scale leads to significant performance gains for
a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred
languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly
outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy
on XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on
low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model.
We also present a detailed empirical evaluation of the key factors that are required to achieve these gains,
including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and
low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling
without sacrificing per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE
and XNLI benchmarks. We will make XLM-R code, data, and models publicly available.*
Tips:
- XLM-R is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
not require `lang` tensors to understand which language is used, and should be able to determine the correct
language from the input ids.
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
examples as well as the information relative to the inputs and outputs.
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_.
XLMRobertaConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaConfig
:members:
XLMRobertaTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
XLMRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaModel
:members:
XLMRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForMaskedLM
:members:
XLMRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForSequenceClassification
:members:
XLMRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForMultipleChoice
:members:
XLMRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForTokenClassification
:members:
XLMRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLMRobertaForQuestionAnswering
:members:
TFXLMRobertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaModel
:members:
TFXLMRobertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForMaskedLM
:members:
TFXLMRobertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForSequenceClassification
:members:
TFXLMRobertaForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForMultipleChoice
:members:
TFXLMRobertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForTokenClassification
:members:
TFXLMRobertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLMRobertaForQuestionAnswering
:members:


@ -0,0 +1,141 @@
XLNet
----------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~
The XLNet model was proposed in `XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_
by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
XLNet is an extension of the Transformer-XL model pre-trained using an autoregressive method
to learn bidirectional contexts by maximizing the expected likelihood over all permutations
of the input sequence factorization order.
The abstract from the paper is the following:
*With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves
better performance than pretraining approaches based on autoregressive language modeling. However, relying on
corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a
pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive
pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over
all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive
formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model,
into pretraining. Empirically, under comparable experiment settings, XLNet outperforms BERT on 20 tasks, often by
a large margin, including question answering, natural language inference, sentiment analysis, and document ranking.*
Tips:
- The specific attention pattern can be controlled at training and test time using the `perm_mask` input.
- Due to the difficulty of training a fully auto-regressive model over various factorization orders,
  XLNet is pretrained using only a sub-set of the output tokens as targets, which are selected
  with the `target_mapping` input.
- To use XLNet for sequential decoding (i.e. not in a fully bi-directional setting), use the `perm_mask` and
  `target_mapping` inputs to control the attention span and outputs (see examples in `examples/text-generation/run_generation.py` and the sketch below).
- XLNet is one of the few models that has no sequence length limit.
The original code can be found `here <https://github.com/zihangdai/xlnet/>`_.
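A minimal sketch of controlling the attention pattern with `perm_mask` and `target_mapping`, adapted from the model docstrings (the ``xlnet-base-cased`` checkpoint is just one public example):
::

    import torch
    from transformers import XLNetLMHeadModel, XLNetTokenizer

    tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
    model = XLNetLMHeadModel.from_pretrained('xlnet-base-cased')

    input_ids = torch.tensor([tokenizer.encode("Hello, my dog is very <mask>", add_special_tokens=False)])
    seq_len = input_ids.shape[1]
    perm_mask = torch.zeros((1, seq_len, seq_len))
    perm_mask[:, :, -1] = 1.0  # no token may attend to the last token
    target_mapping = torch.zeros((1, 1, seq_len))
    target_mapping[0, 0, -1] = 1.0  # predict only the last token
    next_token_logits = model(input_ids, perm_mask=perm_mask, target_mapping=target_mapping)[0]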
XLNetConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetConfig
:members:
XLNetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
XLNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetModel
:members:
XLNetLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetLMHeadModel
:members:
XLNetForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForSequenceClassification
:members:
XLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForMultipleChoice
:members:
XLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForTokenClassification
:members:
XLNetForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForQuestionAnsweringSimple
:members:
XLNetForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XLNetForQuestionAnswering
:members:
TFXLNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetModel
:members:
TFXLNetLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetLMHeadModel
:members:
TFXLNetForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForSequenceClassification
:members:
TFXLNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForMultipleChoice
:members:
TFXLNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForTokenClassification
:members:
TFXLNetForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFXLNetForQuestionAnsweringSimple
:members:


@ -0,0 +1,209 @@
Model sharing and uploading
===========================
In this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.
.. note::
You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.
Optionally, you can join an existing organization or create a new one.
Prepare your model for uploading
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen in the :doc:`training tutorial <training>` how to fine-tune a model on a given task. You have probably
done something similar on your task, either using the model directly in your own training loop or using the
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` class. Let's see how you can share the result on
the `model hub <https://huggingface.co/models>`__.
Basic steps
^^^^^^^^^^^
..
When #5258 is merged, we can remove the need to create the directory.
First, pick a directory with the name you want your model to have on the model hub (its full name will then be
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`) and create it with either
::
mkdir path/to/awesome-name-you-picked
or in python
::
import os
os.makedirs("path/to/awesome-name-you-picked")
Then you can save your model and tokenizer with:
::
model.save_pretrained("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Or, if you're using the Trainer API
::
trainer.save_model("path/to/awesome-name-you-picked")
tokenizer.save_pretrained("path/to/awesome-name-you-picked")
Make your model work on all frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
..
TODO Sylvain: make this automatic during the upload
You probably have your favorite framework, but so will other users! That's why it's best to upload your model with both
PyTorch `and` TensorFlow checkpoints to make it easier to use (if you skip this step, users will still be able to load
your model in another framework, but it will be slower, as it will have to be converted on the fly). Don't worry, it's super easy to do (and in a future version,
it will all be automatic). You will need to install both PyTorch and TensorFlow for this step, but you don't need to
worry about the GPU, so it should be very easy. Check the
`TensorFlow installation page <https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available>`__
and/or the `PyTorch installation page <https://pytorch.org/get-started/locally/#start-locally>`__ to see how.
First, check that your model class exists in the other framework, that is, try to import the same model by either adding
or removing TF. For instance, if you trained a :class:`~transformers.DistilBertForSequenceClassification`, try to
type
::
from transformers import TFDistilBertForSequenceClassification
and if you trained a :class:`~transformers.TFDistilBertForSequenceClassification`, try to
type
::
from transformers import DistilBertForSequenceClassification
This will give back an error if your model does not exist in the other framework (something that should be pretty rare
since we're aiming for full parity between the two frameworks). In this case, skip this and go to the next step.
Now, if you trained your model in PyTorch and have to create a TensorFlow version, adapt the following code to your
model class:
::
tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
tf_model.save_pretrained("path/to/awesome-name-you-picked")
and if you trained your model in TensorFlow and have to create a PyTorch version, adapt the following code to your
model class:
::
pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
pt_model.save_pretrained("path/to/awesome-name-you-picked")
That's all there is to it!
Check the directory before uploading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Make sure there are no garbage files in the directory you'll upload. It should only have:
- a `config.json` file, which saves the :doc:`configuration <main_classes/configuration>` of your model ;
- a `pytorch_model.bin` file, which is the PyTorch checkpoint (unless you can't have it for some reason) ;
- a `tf_model.h5` file, which is the TensorFlow checkpoint (unless you can't have it for some reason) ;
- a `special_tokens_map.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `tokenizer_config.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save;
- a `vocab.txt`, which is the vocabulary of your tokenizer, part of your :doc:`tokenizer <main_classes/tokenizer>`
save;
- maybe an `added_tokens.json`, which is part of your :doc:`tokenizer <main_classes/tokenizer>` save.
Other files can safely be deleted.
Upload your model with the CLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now go to a terminal and run the following command. It should be run in the virtual environment where you installed 🤗
Transformers, since the :obj:`transformers-cli` command comes from the library.
::
transformers-cli login
Then log in using the same credentials as on huggingface.co. To upload your model, just type
::
transformers-cli upload path/to/awesome-name-you-picked/
This will upload the folder containing the weights, tokenizer and configuration we prepared in the previous section.
If you want to upload a single file (a new version of your model, or the other framework checkpoint you want to add),
just type:
::
transformers-cli upload path/to/awesome-name-you-picked/that-file
or
::
transformers-cli upload path/to/awesome-name-you-picked/that-file --filename awesome-name-you-picked/new_name
if you want to change its filename.
This uploads the model to your personal account. If you want your model to be namespaced by your organization name
rather than your username, add the following flag to any command:
::
--organization organization_name
so for instance:
::
transformers-cli upload path/to/awesome-name-you-picked/ --organization organization_name
Your model will then be accessible through its identifier, which is, as we saw above,
`username/awesome-name-you-picked` or `organization/awesome-name-you-picked`.
Add a model card
^^^^^^^^^^^^^^^^
To make sure everyone knows what your model can do, and what its limitations, potential biases and ethical
considerations are, please add a README.md model card to the 🤗 Transformers repo under `model_cards/`. It should be named
`README.md` and follow `this template <https://github.com/huggingface/model_card>`__.
If your model is fine-tuned from another model coming from the model hub (as all 🤗 Transformers pretrained models are),
don't forget to link to its model card so that people can fully trace how your model was built.
If you have never made a pull request to the 🤗 Transformers repo, look at the
:doc:`contributing guide <contributing>` to see the steps to follow.
Using your model
^^^^^^^^^^^^^^^^
Your model now has a page on huggingface.co/models 🔥
Anyone can load it from code:
::
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("namespace/awesome-name-you-picked")
    model = AutoModel.from_pretrained("namespace/awesome-name-you-picked")
Additional commands
^^^^^^^^^^^^^^^^^^^
You can list all the files you uploaded on the hub like this:
::
transformers-cli s3 ls
You can also delete unneeded files with
::
transformers-cli s3 rm awesome-name-you-picked/filename


@ -0,0 +1,618 @@
Summary of the models
================================================
This is a summary of the models available in 🤗 Transformers. It assumes you're familiar with the original
`transformer model <https://arxiv.org/abs/1706.03762>`_. For a gentle introduction, check the `annotated transformer
<http://nlp.seas.harvard.edu/2018/04/03/attention.html>`_. Here we focus on the high-level differences between the
models. You can check them in more detail in their respective documentation. Also check out the
:doc:`pretrained model page </pretrained_models>` to see the checkpoints available for each type of model and all `the
community models <https://huggingface.co/models>`_.
Each one of the models in the library falls into one of the following categories:
* :ref:`autoregressive-models`
* :ref:`autoencoding-models`
* :ref:`seq-to-seq-models`
* :ref:`multimodal-models`
Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
sentence so that the attention heads can only see what was before in the sentence, and not what's after. Although those
models can be fine-tuned and achieve great results on many tasks, the most natural application is text generation.
A typical example of such models is GPT.
Autoencoding models are pretrained by corrupting the input tokens in some way and trying to reconstruct the original
sentence. They correspond to the encoder of the original transformer model in the sense that they get access to the
full inputs without any mask. Those models usually build a bidirectional representation of the whole sentence. They can
be fine-tuned and achieve great results on many tasks such as text generation, but their most natural application is
sentence classification or token classification. A typical example of such models is BERT.
Note that the only difference between autoregressive models and autoencoding models is in the way the model is
pretrained. Therefore, the same architecture can be used for both autoregressive and autoencoding models. When a given
model has been used for both kinds of pretraining, we have put it in the category corresponding to the article where
it was first introduced.
Sequence-to-sequence models use both the encoder and the decoder of the original transformer, either for translation
tasks or by transforming other tasks to sequence-to-sequence problems. They can be fine-tuned to many tasks but their
most natural applications are translation, summarization and question answering. The original transformer model is an
example of such a model (only for translation), T5 is an example that can be fine-tuned on other tasks.
Multimodal models mix text inputs with other kinds (like image) and are more specific to a given task.
.. _autoregressive-models:
Autoregressive models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the decoder part of the original transformer and use an attention mask so
that at each position, the model can only look at the tokens before in the attention heads.
Original GPT
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=openai-gpt">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-openai--gpt-blueviolet">
</a>
<a href="/model_doc/gpt">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-openai--gpt-blueviolet">
</a>
`Improving Language Understanding by Generative Pre-Training <https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf>`_,
Alec Radford et al.
The first autoregressive model based on the transformer architecture, pretrained on the Book Corpus dataset.
The library provides versions of the model for language modeling and multitask language modeling/multiple choice
classification.
GPT-2
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=gpt2">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-gpt2-blueviolet">
</a>
<a href="/model_doc/gpt2">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-gpt2-blueviolet">
</a>
`Language Models are Unsupervised Multitask Learners <https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_,
Alec Radford et al.
A bigger and better version of GPT, pretrained on WebText (web pages from outgoing links on Reddit with at least 3
karma).
The library provides versions of the model for language modeling and multitask language modeling/multiple choice
classification.
CTRL
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=ctrl">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-ctrl-blueviolet">
</a>
<a href="/model_doc/ctrl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-ctrl-blueviolet">
</a>
`CTRL: A Conditional Transformer Language Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`_,
Nitish Shirish Keskar et al.
Same as the GPT model but adds the idea of control codes. Text is generated from a prompt (which can be empty) and one
(or several) of those control codes, which are then used to influence the text generation: generate text in the style
of a Wikipedia article, a book or a movie review.
The library provides a version of the model for language modeling only.
Transformer-XL
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=transfo-xl">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-transfo--xl-blueviolet">
</a>
<a href="/model_doc/transformerxl">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-transfo--xl-blueviolet">
</a>
`Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`_,
Zihang Dai et al.
Same as a regular GPT model, but introduces a recurrence mechanism for two consecutive segments (similar to regular
RNNs with two consecutive inputs). In this context, a segment is a number of consecutive tokens (for instance 512) that
may span across multiple documents, and segments are fed in order to the model.
Basically, the hidden states of the previous segment are concatenated to the current input to compute the attention
scores. This allows the model to pay attention to information that was in the previous segment as well as the current
one. By stacking multiple attention layers, the receptive field can be increased to multiple previous segments.
This changes the positional embeddings to positional relative embeddings (as the regular positional embeddings would
give the same results in the current input and the current hidden state at a given position) and needs to make some
adjustments in the way attention scores are computed.
The library provides a version of the model for language modeling only.
.. _reformer:
Reformer
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=reformer">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-reformer-blueviolet">
</a>
<a href="/model_doc/reformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-reformer-blueviolet">
</a>
`Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_,
Nikita Kitaev et al.
An autoregressive transformer model with lots of tricks to reduce memory footprint and compute time. Those tricks
include:
* Use :ref:`Axial position encoding <axial-pos-encoding>` (see below for more details). It's a mechanism to avoid
  having a huge positional encoding matrix (when the sequence length is very big) by factorizing it into smaller
  matrices.
* Replace traditional attention by :ref:`LSH (locality-sensitive hashing) attention <lsh-attention>` (see below for
  more details). It's a technique to avoid computing the full query-key product in the attention layers.
* Avoid storing the intermediate results of each layer by using reversible transformer layers to obtain them during
  the backward pass (subtracting the residuals from the input of the next layer gives them back) or recomputing them
  for results inside a given layer (less efficient than storing them but saves memory).
* Compute the feedforward operations by chunks and not on the whole batch.
With those tricks, the model can be fed much larger sentences than traditional transformer autoregressive models.
**Note:** This model could very well be used in an autoencoding setting; there is no checkpoint for such a
pretraining yet, though.
The library provides a version of the model for language modeling only.
XLNet
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=xlnet">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlnet-blueviolet">
</a>
<a href="/model_doc/xlnet">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlnet-blueviolet">
</a>
`XLNet: Generalized Autoregressive Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`_,
Zhilin Yang et al.
XLNet is not a traditional autoregressive model but uses a training strategy that builds on that. It permutes the
tokens in the sentence, then allows the model to use the last n tokens to predict the token n+1. Since this is all done
with a mask, the sentence is actually fed to the model in the right order, but instead of masking the first n tokens
for n+1, XLNet uses a mask that hides the previous tokens in some given permutation of 1,...,sequence length.
XLNet also uses the same recurrence mechanism as Transformer-XL to build long-term dependencies.
The library provides a version of the model for language modeling, token classification, sentence classification,
multiple choice classification and question answering.
.. _autoencoding-models:
Autoencoding models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
look at all the tokens in the attention heads. For pretraining, inputs are a corrupted version of the sentence, usually
obtained by masking tokens, and targets are the original sentences.
BERT
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=bert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bert-blueviolet">
</a>
<a href="/model_doc/bert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bert-blueviolet">
</a>
`BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`_,
Jacob Devlin et al.
Corrupts the inputs by using random masking, more precisely, during pretraining, a given percentage of tokens (usually
15%) are masked by
* a special mask token with probability 0.8
* a random token different from the one masked with probability 0.1
* the same token with probability 0.1
The model must predict the original sentence, but has a second objective: inputs are two sentences A and B (with a
separation token in between). With probability 50%, the sentences are consecutive in the corpus, in the remaining 50%
they are not related. The model has to predict if the sentences are consecutive or not.
The library provides a version of the model for language modeling (traditional or masked), next sentence prediction,
token classification, sentence classification, multiple choice classification and question answering.
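A minimal sketch of the 80/10/10 corruption rule above (a hypothetical helper, simplified from the library's language
modeling example script):
::

    import torch

    def mask_tokens(inputs, mask_token_id, vocab_size, mlm_probability=0.15):
        labels = inputs.clone()
        # select ~15% of the tokens as prediction targets
        masked_indices = torch.bernoulli(torch.full(labels.shape, mlm_probability)).bool()
        labels[~masked_indices] = -100  # non-masked tokens are ignored by the loss
        # 80% of the targets: replace with the mask token
        replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
        inputs[replaced] = mask_token_id
        # 10%: replace with a random token (half of the remaining 20%)
        randomized = torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~replaced
        inputs[randomized] = torch.randint(vocab_size, labels.shape)[randomized]
        # the remaining 10% keep the original token
        return inputs, labels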
ALBERT
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=albert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-albert-blueviolet">
</a>
<a href="/model_doc/albert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-albert-blueviolet">
</a>
`ALBERT: A Lite BERT for Self-supervised Learning of Language Representations <https://arxiv.org/abs/1909.11942>`_,
Zhenzhong Lan et al.
Same as BERT but with a few tweaks:
* Embedding size E is different from hidden size H, justified because the embeddings are context independent (one
  embedding vector represents one token) whereas hidden states are context dependent (one hidden state represents a
  sequence of tokens), so it's more logical to have H >> E. Also, the embedding matrix is large since it's V x E (V
  being the vocab size). If E < H, it has fewer parameters.
* Layers are split in groups that share parameters (to save memory).
* Next sentence prediction is replaced by a sentence ordering prediction: in the inputs, we have two sentences A and B
  (that are consecutive) and we either feed A followed by B or B followed by A. The model must predict if they have
  been swapped or not.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
RoBERTa
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=roberta">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-roberta-blueviolet">
</a>
<a href="/model_doc/roberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-roberta-blueviolet">
</a>
`RoBERTa: A Robustly Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`_,
Yinhan Liu et al.
Same as BERT with better pretraining tricks:
* dynamic masking: tokens are masked differently at each epoch, whereas BERT does it once and for all
* no NSP (next sentence prediction) loss; instead of putting just two sentences together, put a chunk of
  contiguous text together to reach 512 tokens (so the sentences are in an order that may span several documents)
* train with larger batches
* use BPE with bytes as a subunit and not characters (because of unicode characters)
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
DistilBERT
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=distilbert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-distilbert-blueviolet">
</a>
<a href="/model_doc/distilbert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-distilbert-blueviolet">
</a>
`DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`_,
Victor Sanh et al.
Same as BERT but smaller. Trained by distillation of the pretrained BERT model, meaning it's been trained to predict
the same probabilities as the larger model. The actual objective is a combination of:
* finding the same probabilities as the teacher model
* predicting the masked tokens correctly (but no next-sentence objective)
* a cosine similarity between the hidden states of the student and the teacher model
The library provides a version of the model for masked language modeling, token classification, sentence classification
and question answering.
XLM
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=xlm">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm-blueviolet">
</a>
<a href="/model_doc/xlm">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm-blueviolet">
</a>
`Cross-lingual Language Model Pretraining <https://arxiv.org/abs/1901.07291>`_, Guillaume Lample and Alexis Conneau
A transformer model trained on several languages. There are three different type of training for this model and the
library provides checkpoints for all of them:
* Causal language modeling (CLM) which is the traditional autoregressive training (so this model could be in the
  previous section as well). One of the languages is selected for each training sample, and the model input is a
  sentence of 256 tokens that may span several documents in one of those languages.
* Masked language modeling (MLM) which is like RoBERTa. One of the languages is selected for each training sample,
  and the model input is a sentence of 256 tokens that may span several documents in one of those languages, with
  dynamic masking of the tokens.
* A combination of MLM and translation language modeling (TLM). This consists of concatenating a sentence in two
  different languages, with random masking. To predict one of the masked tokens, the model can use both the
  surrounding context in language 1 and the context given by language 2.
Checkpoints refer to which method was used for pretraining by having ``clm``, ``mlm`` or ``mlm-tlm`` in their names. On top
of positional embeddings, the model has language embeddings. When training using MLM/CLM, this gives the model an
indication of the language used, and when training using MLM+TLM, an indication of which part of the input is in which
language.
The library provides a version of the model for language modeling, token classification, sentence classification and
question answering.
XLM-RoBERTa
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=xlm-roberta">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-xlm--roberta-blueviolet">
</a>
<a href="/model_doc/xlmroberta">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-xlm--roberta-blueviolet">
</a>
`Unsupervised Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`_, Alexis Conneau et
al.
Uses RoBERTa tricks on the XLM approach, but does not use the translation language modeling objective, only using
masked language modeling on sentences coming from one language. However, the model is trained on many more languages
(100) and doesn't use the language embeddings, so it's capable of detecting the input language by itself.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
FlauBERT
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=flaubert">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-flaubert-blueviolet">
</a>
<a href="/model_doc/flaubert">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-flaubert-blueviolet">
</a>
`FlauBERT: Unsupervised Language Model Pre-training for French <https://arxiv.org/abs/1912.05372>`_, Hang Le et al.
Like RoBERTa, without the sentence ordering prediction (so just trained on the MLM objective).
The library provides a version of the model for language modeling and sentence classification.
ELECTRA
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=electra">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-electra-blueviolet">
</a>
<a href="/model_doc/electra">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-electra-blueviolet">
</a>
`ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators <https://arxiv.org/abs/2003.10555>`_,
Kevin Clark et al.
ELECTRA is a transformer model pretrained with the use of another (small) masked language model. The inputs are
corrupted by that language model: it takes an input text that is randomly masked and replaces the masked tokens by its
own predictions. ELECTRA then has to predict which tokens are originals and which ones have been replaced. Like in GAN
training, the small language model is trained for a few steps (but with the original texts as objective, not to fool
the ELECTRA model like in a traditional GAN setting), then the ELECTRA model is trained for a few steps.
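As a tiny illustration of the discriminator's target (the token ids are made up for the example):

.. code-block::

    import torch

    # Ids of the original text and of the generator's corrupted version.
    original_ids = torch.tensor([[7592, 1010, 2026, 3899, 2003, 10140]])
    corrupted_ids = torch.tensor([[7592, 1010, 2026, 4937, 2003, 10140]])

    # ELECTRA must output 1 where the generator replaced a token, 0 elsewhere.
    labels = (original_ids != corrupted_ids).long()
    print(labels)  # tensor([[0, 0, 0, 1, 0, 0]])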
The library provides a version of the model for masked language modeling, token classification and sentence
classification.
.. _longformer:
Longformer
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=longformer">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-longformer-blueviolet">
</a>
<a href="/model_doc/longformer">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-longformer-blueviolet">
</a>
`Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`_, Iz Beltagy et al.
A transformer model replacing the attention matrices by sparse matrices to go faster. Often, the local context (e.g.,
what are the two tokens to the left and right?) is enough to take action for a given token. Some preselected input
tokens are still given global attention, but the attention matrix has far fewer parameters, resulting in a speed-up. See the
:ref:`local attention section <local-attention>` for more information.
It is otherwise pretrained the same way as RoBERTa.
**Note:** This model could very well be used in an autoregressive setting; there is no checkpoint for such a
pretraining yet, though.
The library provides a version of the model for masked language modeling, token classification, sentence
classification, multiple choice classification and question answering.
.. _seq-to-seq-models:
Sequence-to-sequence models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models keep both the encoder and the decoder of the original transformer.
BART
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=bart">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-bart-blueviolet">
</a>
<a href="/model_doc/bart">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-bart-blueviolet">
</a>
`BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
<https://arxiv.org/abs/1910.13461>`_, Mike Lewis et al.
Sequence-to-sequence model with an encoder and a decoder. The encoder is fed a corrupted version of the tokens, the
decoder is fed the original tokens (but has a mask to hide the future words, like a regular transformer decoder). For
the encoder, on the pretraining tasks, a composition of the following transformations is applied:
* mask random tokens (like in BERT)
* delete random tokens
* mask a span of k tokens with a single mask token (a span of 0 tokens is an insertion of a mask token); see the
  sketch after this list
* permute sentences
* rotate the document to make it start with a specific token
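Here is a minimal sketch of the span-masking (text infilling) transformation; in the paper the span length is sampled from a Poisson distribution, while we hardcode it here for clarity:

.. code-block::

    # Replace a span of tokens with a single <mask> token; a span of length 0
    # amounts to inserting a <mask> token.
    tokens = ["My", "dog", "is", "very", "cute", "."]

    start, length = 1, 2  # mask the span "dog is"
    corrupted = tokens[:start] + ["<mask>"] + tokens[start + length:]
    print(corrupted)  # ['My', '<mask>', 'very', 'cute', '.']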
The library provides a version of this model for conditional generation and sequence classification.
MarianMT
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=marian">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-marian-blueviolet">
</a>
<a href="/model_doc/marian">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-marian-blueviolet">
</a>
`Marian: Fast Neural Machine Translation in C++ <https://arxiv.org/abs/1804.00344>`_, Marcin Junczys-Dowmunt et al.
A framework for translation models, using the same models as BART.
The library provides a version of this model for conditional generation.
T5
----------------------------------------------
.. raw:: html
<a href="https://huggingface.co/models?filter=t5">
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-t5-blueviolet">
</a>
<a href="/model_doc/t5">
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-t5-blueviolet">
</a>
`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`_,
Colin Raffel et al.
Uses the traditional transformer model (except a slight change with the positional embeddings, which are learned at
each layer). To be able to operate on all NLP tasks, it transforms them into text-to-text problems by using certain
prefixes: “Summarize: …”, “question: …”, “translate English to German: …” and so forth.
The pretraining includes both supervised and self-supervised training. Supervised training is conducted on downstream
tasks provided by the GLUE and SuperGLUE benchmarks (changing them to text-to-text tasks as explained above).
Self-supervised training consists of pretraining on corrupted tokens: 15% of the tokens are randomly removed and
replaced by individual sentinel tokens (if several consecutive tokens are marked for removal, the whole group is
replaced by one single sentinel token). The input of the encoder is the corrupted sentence, the input of the decoder is
the original sentence, and the target is then the dropped-out tokens delimited by their sentinel tokens.
For instance, if we have the sentence “My dog is very cute .”, and we decide to remove the tokens “dog”, “is” and
“cute”, the input becomes “My <x> very <y> .” and the target is “<x> dog is <y> cute <z>”
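A small sketch reproducing this example; the sentinel naming is simplified here, whereas the real preprocessing uses dedicated sentinel tokens.

.. code-block::

    tokens = ["My", "dog", "is", "very", "cute", "."]
    dropped = {1, 2, 4}  # indices of "dog", "is" and "cute"

    inputs, targets, sentinel = [], [], 0
    for i, tok in enumerate(tokens):
        if i in dropped:
            if i - 1 not in dropped:  # a new dropped span starts here
                marker = f"<{chr(ord('x') + sentinel)}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)  # dropped tokens only appear in the target
        else:
            inputs.append(tok)
    targets.append(f"<{chr(ord('x') + sentinel)}>")  # final sentinel

    print(" ".join(inputs))   # My <x> very <y> .
    print(" ".join(targets))  # <x> dog is <y> cute <z>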
The library provides a version of this model for conditional generation.
.. _multimodal-models:
Multimodal models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There is one multimodal model in the library which has not been pretrained in the self-supervised fashion like the
others.
MMBT
----------------------------------------------
`Supervised Multimodal Bitransformers for Classifying Images and Text <https://arxiv.org/abs/1909.02950>`_, Douwe Kiela
et al.
A transformer model used in multimodal settings, combining a text and an image to make predictions. The transformer
model takes as inputs the embeddings of the tokenized text and the final activations of a pretrained resnet on the
images (after the pooling layer), passed through a linear layer (to go from the number of features at the end of the
resnet to the hidden state dimension of the transformer).
The different inputs are concatenated, and on top of the positional embeddings, a segment embedding is added to let the
model know which part of the input vector corresponds to the text or the image.
The pretrained model only works for classification.
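A hedged sketch of how the two modalities are combined; all sizes are illustrative, not the actual MMBT configuration.

.. code-block::

    import torch
    import torch.nn as nn

    hidden_size, resnet_features = 768, 2048
    image_projection = nn.Linear(resnet_features, hidden_size)

    text_embeds = torch.randn(1, 20, hidden_size)     # embedded text tokens
    image_feats = torch.randn(1, 3, resnet_features)  # pooled resnet activations
    image_embeds = image_projection(image_feats)      # map to the hidden size

    # Concatenate both modalities; segment embeddings (not shown) tell the
    # model which positions come from the text and which from the image.
    inputs = torch.cat([text_embeds, image_embeds], dim=1)
    print(inputs.shape)  # torch.Size([1, 23, 768])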
..
More information in this :doc:`model documentation </model_doc/mmbt>`.
TODO: write this page
More technical aspects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Full vs sparse attention
----------------------------------------------
Most transformer models use full attention in the sense that the attention matrix is square. This can be a big
computational bottleneck when you have long texts. Longformer and Reformer are models that try to be more efficient and
use a sparse version of the attention matrix to speed up training.
.. _lsh-attention:
**LSH attention**
:ref:`Reformer <reformer>` uses LSH attention. In :math:`\mathrm{softmax}(QK^t)`, only the biggest elements (in the
softmax dimension) of the matrix :math:`QK^t` are going to give useful contributions. So for each query q in Q, we can
consider only the keys k in K that are close to q. A hash function is used to determine if q and k are close. The
attention mask is modified to mask the current token (except at the first position), because it would give a query and
key that are equal (so very similar to each other). Since the hash can be a bit random, several hash functions are used
in practice (determined by an ``n_rounds`` parameter) and then averaged together.
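Below is a very simplified sketch of the angular LSH bucketing with a single hash round; the names and sizes are illustrative, and the real implementation hashes shared query/key vectors over several rounds.

.. code-block::

    import torch

    torch.manual_seed(0)
    seq_len, d_model, n_buckets = 16, 64, 4

    qk = torch.randn(seq_len, d_model)  # shared query/key vectors

    # One random rotation; several rounds would be drawn and their attention
    # outputs averaged.
    rotation = torch.randn(d_model, n_buckets // 2)
    rotated = qk @ rotation
    buckets = torch.argmax(torch.cat([rotated, -rotated], dim=-1), dim=-1)

    # Tokens that share a bucket id are considered close and attend together.
    print(buckets)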
.. _local-attention:
**Local attention**
:ref:`Longformer <longformer>` uses local attention: often, the local context (e.g., what are the two tokens left and
right?) is enough to take action for a given token. Also, by stacking attention layers that have a small window, the
last layer will have a receptive field of more than just the tokens in the window, allowing the model to build a
representation of the whole sentence.
Some preselected input tokens are also given global attention: for those few tokens, the attention matrix can access
all tokens and this process is symmetric: all other tokens have access to those specific tokens (on top of the ones in
their local window). This is shown in Figure 2d of the paper, see below for a sample attention mask:
.. image:: imgs/local_attention_mask.png
:scale: 50 %
:align: center
Using those attention matrices with fewer parameters then allows the model to handle inputs with a bigger sequence
length.
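A minimal sketch of such an attention mask; the window size and global positions are illustrative.

.. code-block::

    import torch

    seq_len, window, global_positions = 12, 2, [0]

    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True  # each token sees a small local window
    for g in global_positions:
        mask[g, :] = True  # a global token attends to everything...
        mask[:, g] = True  # ...and everything attends to it (symmetric)

    print(mask.int())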
Other tricks
----------------------------------------------
.. _axial-pos-encoding:
**Axial positional encodings**
:ref:`Reformer <reformer>` uses axial positional encodings: in traditional transformer models, the positional encoding
E is a matrix of size :math:`l` by :math:`d`, :math:`l` being the sequence length and :math:`d` the dimension of the
hidden state. If you have very long texts, this matrix can be huge and take way too much space on the GPU.
To alleviate that, axial positional encodings consist of factorizing that big matrix E into two smaller matrices E1 and
E2, with dimensions :math:`l_{1} \times d_{1}` and :math:`l_{2} \times d_{2}`, such that :math:`l_{1} \times l_{2} = l`
and :math:`d_{1} + d_{2} = d` (with the product for the lengths, this ends up being way smaller). The embedding for
time step :math:`j` in E is obtained by concatenating the embeddings for timestep :math:`j \% l_{1}` in E1 and
:math:`j // l_{1}` in E2.
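A minimal sketch with illustrative sizes (so :math:`l = 4096` and :math:`d = 128`):

.. code-block::

    import torch

    l1, l2, d1, d2 = 64, 64, 32, 96  # l = l1 * l2, d = d1 + d2

    # We store l1*d1 + l2*d2 values instead of l*d.
    E1 = torch.randn(l1, d1)
    E2 = torch.randn(l2, d2)

    def axial_position_embedding(j):
        # Concatenate the embedding for j % l1 in E1 and j // l1 in E2.
        return torch.cat([E1[j % l1], E2[j // l1]])

    print(axial_position_embedding(1234).shape)  # torch.Size([128])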

Multi-lingual models
================================================
Most of the models available in this library are mono-lingual models (English, Chinese and German). A few
multi-lingual models are available and have different mechanisms than mono-lingual models.
This page details the usage of these models.
The two models that currently support multiple languages are BERT and XLM.
XLM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM has a total of 10 different checkpoints, only one of which is mono-lingual. The 9 remaining model checkpoints can
be split into two categories: the checkpoints that make use of language embeddings, and those that don't.
XLM & Language Embeddings
------------------------------------------------
This section concerns the following checkpoints:
- ``xlm-mlm-ende-1024`` (Masked language modeling, English-German)
- ``xlm-mlm-enfr-1024`` (Masked language modeling, English-French)
- ``xlm-mlm-enro-1024`` (Masked language modeling, English-Romanian)
- ``xlm-mlm-xnli15-1024`` (Masked language modeling, XNLI languages)
- ``xlm-mlm-tlm-xnli15-1024`` (Masked language modeling + Translation, XNLI languages)
- ``xlm-clm-enfr-1024`` (Causal language modeling, English-French)
- ``xlm-clm-ende-1024`` (Causal language modeling, English-German)
These checkpoints require language embeddings that will specify the language used at inference time. These language
embeddings are represented as a tensor that is of the same shape as the input ids passed to the model. The values in
these tensors depend on the language used and are identifiable using the ``lang2id`` and ``id2lang`` attributes
from the tokenizer.
Here is an example using the ``xlm-clm-enfr-1024`` checkpoint (Causal language modeling, English-French):
.. code-block::
>>> import torch
>>> from transformers import XLMTokenizer, XLMWithLMHeadModel
>>> tokenizer = XLMTokenizer.from_pretrained("xlm-clm-enfr-1024")
>>> model = XLMWithLMHeadModel.from_pretrained("xlm-clm-enfr-1024")
The different languages this model/tokenizer handles, as well as the ids of these languages are visible using the
``lang2id`` attribute:
.. code-block::
>>> print(tokenizer.lang2id)
{'en': 0, 'fr': 1}
These ids should be used when passing a language parameter during a model pass. Let's define our inputs:
.. code-block::
>>> input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")]) # batch size of 1
We should now define the language embedding by using the previously defined language id. We want to create a tensor
filled with the appropriate language ids, of the same size as input_ids. For English, the id is 0:
.. code-block::
>>> language_id = tokenizer.lang2id['en'] # 0
>>> langs = torch.tensor([language_id] * input_ids.shape[1]) # torch.tensor([0, 0, 0, ..., 0])
>>> # We reshape it to be of size (batch_size, sequence_length)
>>> langs = langs.view(1, -1) # is now of shape [1, sequence_length] (we have a batch size of 1)
You can then feed it all as input to your model:
.. code-block::
>>> outputs = model(input_ids, langs=langs)
The example `run_generation.py <https://github.com/huggingface/transformers/blob/master/examples/text-generation/run_generation.py>`__
can generate text using the CLM checkpoints from XLM, using the language embeddings.
XLM without Language Embeddings
------------------------------------------------
This section concerns the following checkpoints:
- ``xlm-mlm-17-1280`` (Masked language modeling, 17 languages)
- ``xlm-mlm-100-1280`` (Masked language modeling, 100 languages)
These checkpoints do not require language embeddings at inference time. Unlike the previously-mentioned XLM
checkpoints, these models are used to produce generic sentence representations.
BERT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BERT has two checkpoints that can be used for multi-lingual tasks:
- ``bert-base-multilingual-uncased`` (Masked language modeling + Next sentence prediction, 102 languages)
- ``bert-base-multilingual-cased`` (Masked language modeling + Next sentence prediction, 104 languages)
These checkpoints do not require language embeddings at inference time. They should identify the language
used in the context and infer accordingly.
XLM-RoBERTa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong
gains over previously released multi-lingual models like mBERT or XLM on downstream tasks like classification,
sequence labeling and question answering.
Two XLM-RoBERTa checkpoints can be used for multi-lingual tasks:
- ``xlm-roberta-base`` (Masked language modeling, 100 languages)
- ``xlm-roberta-large`` (Masked language modeling, 100 languages)
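Since no language embeddings are needed, usage is straightforward; here is a minimal sketch (the French sample sentence is arbitrary):

.. code-block::

    >>> from transformers import XLMRobertaTokenizer, XLMRobertaModel

    >>> tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
    >>> model = XLMRobertaModel.from_pretrained("xlm-roberta-base")

    >>> # The same checkpoint handles any of the 100 training languages.
    >>> inputs = tokenizer("Wikipédia est un projet d'encyclopédie.", return_tensors="pt")
    >>> outputs = model(**inputs)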

Philosophy
==========
🤗 Transformers is an opinionated library built for:
- NLP researchers and educators seeking to use/study/extend large-scale transformers models
- hands-on practitioners who want to fine-tune those models and/or serve them in production
- engineers who just want to download a pretrained model and use it to solve a given NLP task.
The library was designed with two strong goals in mind:
- Be as easy and fast to use as possible:
- We strongly limited the number of user-facing abstractions to learn; in fact, there are almost no abstractions,
just three standard classes required to use each model: :doc:`configuration <main_classes/configuration>`,
:doc:`models <main_classes/model>` and :doc:`tokenizer <main_classes/tokenizer>`.
- All of these classes can be initialized in a simple and unified way from pretrained instances by using a common
:obj:`from_pretrained()` instantiation method which will take care of downloading (if needed), caching and
loading the related class instance and associated data (configurations' hyper-parameters, tokenizers' vocabulary,
and models' weights) from a pretrained checkpoint provided on
`Hugging Face Hub <https://huggingface.co/models>`__ or your own saved checkpoint.
- On top of those three base classes, the library provides two APIs: :func:`~transformers.pipeline` for quickly
using a model (plus its associated tokenizer and configuration) on a given task and
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` to quickly train or fine-tune a given model.
- As a consequence, this library is NOT a modular toolbox of building blocks for neural nets. If you want to
extend/build-upon the library, just use regular Python/PyTorch/TensorFlow/Keras modules and inherit from the base
classes of the library to reuse functionalities like model loading/saving.
- Provide state-of-the-art models with performances as close as possible to the original models:
- We provide at least one example for each architecture which reproduces a result provided by the official authors
of said architecture.
- The code is usually as close to the original code base as possible, which means some PyTorch code may be not as
*pytorchic* as it could be as a result of being converted from TensorFlow code, and vice versa.
A few other goals:
- Expose the models' internals as consistently as possible:
- We give access, using a single API, to the full hidden-states and attention weights.
- Tokenizer and base model's API are standardized to easily switch between models.
- Incorporate a subjective selection of promising tools for fine-tuning/investigating these models:
- A simple/consistent way to add new tokens to the vocabulary and embeddings for fine-tuning.
- Simple ways to mask and prune transformer heads.
- Switch easily between PyTorch and TensorFlow 2.0, allowing training using one framework and inference using another.
Main concepts
~~~~~~~~~~~~~
The library is built around three types of classes for each model:
- **Model classes** such as :class:`~transformers.BertModel`, which are 30+ PyTorch models
(`torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__) or Keras models
(`tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__) that work with the pretrained
weights provided in the library.
- **Configuration classes** such as :class:`~transformers.BertConfig`, which store all the parameters required to build
a model. You don't always need to instantiate these yourself. In particular, if you are using a pretrained model
without any modification, creating the model will automatically take care of instantiating the configuration (which
is part of the model).
- **Tokenizer classes** such as :class:`~transformers.BertTokenizer`, which store the vocabulary for each model and
provide methods for encoding/decoding strings into lists of token embedding indices to be fed to a model.
All these classes can be instantiated from pretrained instances and saved locally using two methods (see the sketch
after this list):
- :obj:`from_pretrained()` lets you instantiate a model/configuration/tokenizer from a pretrained version either
  provided by the library itself (the supported models are provided in the list :doc:`here <pretrained_models>`)
  or stored locally (or on a server) by the user,
- :obj:`save_pretrained()` lets you save a model/configuration/tokenizer locally so that it can be reloaded using
  :obj:`from_pretrained()`.
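For instance, here is a minimal sketch of that round trip (``./my-bert`` is an arbitrary local directory):

.. code-block::

    from transformers import BertModel, BertTokenizer

    # Download (if needed), cache and load a pretrained checkpoint.
    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    model = BertModel.from_pretrained("bert-base-cased")

    # Save everything locally...
    tokenizer.save_pretrained("./my-bert")
    model.save_pretrained("./my-bert")

    # ...and reload it later from the saved files.
    tokenizer = BertTokenizer.from_pretrained("./my-bert")
    model = BertModel.from_pretrained("./my-bert")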

Preprocessing data
==================
In this tutorial, we'll explore how to preprocess your data using 🤗 Transformers. The main tool for this is what we
call a :doc:`tokenizer <main_classes/tokenizer>`. You can build one using the tokenizer class associated to the model
you would like to use, or directly with the :class:`~transformers.AutoTokenizer` class.
As we saw in the :doc:`quicktour </quicktour>`, the tokenizer will first split a given text into words (or parts of words,
punctuation symbols, etc.) usually called `tokens`. Then it will convert those `tokens` into numbers, to be able to
build a tensor out of them and feed them to the model. It will also add any additional inputs the model might expect to
work properly.
.. note::
If you plan on using a pretrained model, it's important to use the associated pretrained tokenizer: it will split
the text you give it into tokens the same way as for the pretraining corpus, and it will use the same correspondence
of token to index (that we usually call a `vocab`) as during pretraining.
To automatically download the vocab used during pretraining or fine-tuning a given model, you can use the
:func:`~transformers.AutoTokenizer.from_pretrained` method:
::
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
Base use
~~~~~~~~
A :class:`~transformers.PreTrainedTokenizer` has many methods, but the only one you need to remember for preprocessing
is its ``__call__``: you just need to feed your sentence to your tokenizer object.
::
encoded_input = tokenizer("Hello, I'm a single sentence!")
print(encoded_input)
This will return a dictionary mapping strings to lists of ints, like this one:
::
{'input_ids': [101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
The `input_ids <glossary.html#input-ids>`__ are the indices corresponding to each token in our sentence. We will see
below what the `attention_mask <glossary.html#attention-mask>`__ is used for and in
:ref:`the next section <sentence-pairs>` the goal of `token_type_ids <glossary.html#token-type-ids>`__.
The tokenizer can decode a list of token ids back into a proper sentence:
::
tokenizer.decode(encoded_input["input_ids"])
which should return
::
"[CLS] Hello, I'm a single sentence! [SEP]"
As you can see, the tokenizer automatically added some special tokens that the model expects. Not all models need
special tokens; for instance, if we had used ``gpt2-medium`` instead of ``bert-base-cased`` to create our tokenizer, we
would have seen the same sentence as the original one here. You can disable this behavior (which is only advised if you
have added those special tokens yourself) by passing ``add_special_tokens=False``.
If you have several sentences you want to process, you can do this efficiently by sending them as a list to the
tokenizer:
::
batch_sentences = ["Hello I'm a single sentence",
"And another sentence",
"And the very very last one"]
encoded_inputs = tokenizer(batch_sentences)
print(encoded_inputs)
We get back a dictionary once again, this time with values being lists of lists of ints:
::
{'input_ids': [[101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
[101, 1262, 1330, 5650, 102],
[101, 1262, 1103, 1304, 1304, 1314, 1141, 102]],
'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]],
'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1]]}
If the purpose of sending several sentences at a time to the tokenizer is to build a batch to feed the model, you will
probably want:
- To pad each sentence to the maximum length there is in your batch.
- To truncate each sentence to the maximum length the model can accept (if applicable).
- To return tensors.
You can do all of this by using the following options when feeding your list of sentences to the tokenizer:
::
## PYTORCH CODE
batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt")
print(batch)
## TENSORFLOW CODE
batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
print(batch)
which should now return a dictionary mapping strings to tensors, like this:
::
{'input_ids': tensor([[ 101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
[ 101, 1262, 1330, 5650, 102, 0, 0, 0, 0],
[ 101, 1262, 1103, 1304, 1304, 1314, 1141, 102, 0]]),
'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0]]),
'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 0]])}
We can now see what the `attention_mask <glossary.html#attention-mask>`__ is all about: it points out which tokens the
model should pay attention to and which ones it should not (because they represent padding in this case).
Note that if your model does not have a maximum length associated with it, the command above will throw a warning. You
can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer from throwing those kinds of warnings.
.. _sentence-pairs:
Preprocessing pairs of sentences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sometimes you need to feed a pair of sentences to your model. For instance, if you want to classify whether two
sentences in a pair are similar, or for question-answering models, which take a context and a question. For BERT
models, the input is then represented like this:
::
[CLS] Sequence A [SEP] Sequence B [SEP]
You can encode a pair of sentences in the format expected by your model by supplying the two sentences as two arguments
(not a list since a list of two sentences will be interpreted as a batch of two single sentences, as we saw before).
::
encoded_input = tokenizer("How old are you?", "I'm 6 years old")
print(encoded_input)
This will once again return a dict mapping strings to lists of ints:
::
{'input_ids': [101, 1731, 1385, 1132, 1128, 136, 102, 146, 112, 182, 127, 1201, 1385, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
This shows us what the `token_type_ids <glossary.html#token-type-ids>`__ are for: they indicate to the model which part
of the inputs corresponds to the first sentence and which part corresponds to the second sentence. Note that
`token_type_ids` are not required or handled by all models. By default, a tokenizer will only return the inputs that
its associated model expects. You can force the return (or the non-return) of any of those special arguments by
using ``return_input_ids`` or ``return_token_type_ids``.
If we decode the token ids we obtained, we will see that the special tokens have been properly added.
::
tokenizer.decode(encoded_input["input_ids"])
will return:
::
"[CLS] How old are you? [SEP] I'm 6 years old [SEP]"
If you have a list of pairs of sequences you want to process, you should feed them as two lists to your tokenizer: the
list of first sentences and the list of second sentences:
::
batch_sentences = ["Hello I'm a single sentence",
"And another sentence",
"And the very very last one"]
batch_of_second_sentences = ["I'm a sentence that goes with the first sentence",
"And I should be encoded with the second sentence",
"And I go with the very last one"]
encoded_inputs = tokenizer(batch_sentences, batch_of_second_sentences)
print(encoded_inputs)
will return a dict with the values being lists of lists of ints:
::
{'input_ids': [[101, 8667, 146, 112, 182, 170, 1423, 5650, 102, 146, 112, 182, 170, 5650, 1115, 2947, 1114, 1103, 1148, 5650, 102],
[101, 1262, 1330, 5650, 102, 1262, 146, 1431, 1129, 12544, 1114, 1103, 1248, 5650, 102],
[101, 1262, 1103, 1304, 1304, 1314, 1141, 102, 1262, 146, 1301, 1114, 1103, 1304, 1314, 1141, 102]],
'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}
To double-check what is fed to the model, we can decode each list in `input_ids` one by one:
::
for ids in encoded_inputs["input_ids"]:
print(tokenizer.decode(ids))
which will return:
::
[CLS] Hello I'm a single sentence [SEP] I'm a sentence that goes with the first sentence [SEP]
[CLS] And another sentence [SEP] And I should be encoded with the second sentence [SEP]
[CLS] And the very very last one [SEP] And I go with the very last one [SEP]
Once again, you can automatically pad your inputs to the maximum sentence length in the batch, truncate to the maximum
length the model can accept and return tensors directly with the following:
::
## PYTORCH CODE
batch = tokenizer(batch_sentences, batch_of_second_sentences, padding=True, truncation=True, return_tensors="pt")
## TENSORFLOW CODE
batch = tokenizer(batch_sentences, batch_of_second_sentences, padding=True, truncation=True, return_tensors="tf")
Everything you always wanted to know about padding and truncation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We have seen the commands that will work for most cases (pad your batch to the length of the longest sentence and
truncate to the maximum length the model can accept). However, the API supports more strategies if you need them. The
three arguments you need to know for this are :obj:`padding`, :obj:`truncation` and :obj:`max_length`.
- :obj:`padding` controls the padding. It can be a boolean or a string which should be:
- :obj:`True` or :obj:`'longest'` to pad to the longest sequence in the batch (doing no padding if you only provide
a single sequence).
- :obj:`'max_length'` to pad to a length specified by the :obj:`max_length` argument or the maximum length accepted
by the model if no :obj:`max_length` is provided (``max_length=None``). If you only provide a single sequence,
padding will still be applied to it.
- :obj:`False` or :obj:`'do_not_pad'` to not pad the sequences. As we have seen before, this is the default
behavior.
- :obj:`truncation` controls the truncation. It can be a boolean or a string which should be:
- :obj:`True` or :obj:`'only_first'` truncate to a maximum length specified by the :obj:`max_length` argument or
the maximum length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will
only truncate the first sentence of a pair if a pair of sequences (or a batch of pairs of sequences) is provided.
- :obj:`'only_second'` truncate to a maximum length specified by the :obj:`max_length` argument or the maximum
length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will only truncate
the second sentence of a pair if a pair of sequences (or a batch of pairs of sequences) is provided.
- :obj:`'longest_first'` truncate to a maximum length specified by the :obj:`max_length` argument or the maximum
length accepted by the model if no :obj:`max_length` is provided (``max_length=None``). This will truncate token
by token, removing a token from the longest sequence in the pair until the proper length is reached.
- :obj:`False` or :obj:`'do_not_truncate'` to not truncate the sequences. As we have seen before, this is the
default behavior.
- :obj:`max_length` to control the length of the padding/truncation. It can be an integer or :obj:`None`, in which case
it will default to the maximum length the model can accept. If the model has no specific maximum input length,
truncation/padding to :obj:`max_length` is deactivated.
Here is a table summarizing the recommended way to set up padding and truncation. If you use pairs of input sequences in
any of the following examples, you can replace :obj:`truncation=True` by a :obj:`STRATEGY` selected in
:obj:`['only_first', 'only_second', 'longest_first']`, i.e. :obj:`truncation='only_second'` or
:obj:`truncation='longest_first'` to control how both sequences in the pair are truncated, as detailed before.
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| Truncation | Padding | Instruction |
+======================================+===================================+=============================================================================================+
| no truncation | no padding | :obj:`tokenizer(batch_sentences)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding='longest')` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | :obj:`tokenizer(batch_sentences, padding='max_length')` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | :obj:`tokenizer(batch_sentences, padding='max_length', max_length=42)` |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| truncation to max model input length | no padding | :obj:`tokenizer(batch_sentences, truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True, truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding=True, truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=True)` or |
| | | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | Not possible |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
| truncation to specific length | no padding | :obj:`tokenizer(batch_sentences, truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, truncation=STRATEGY, max_length=42)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max sequence in batch | :obj:`tokenizer(batch_sentences, padding=True, truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, padding=True, truncation=STRATEGY, max_length=42)` |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to max model input length | Not possible |
| +-----------------------------------+---------------------------------------------------------------------------------------------+
| | padding to specific length | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=True, max_length=42)` or |
| | | :obj:`tokenizer(batch_sentences, padding='max_length', truncation=STRATEGY, max_length=42)` |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------------------------------------+
Pre-tokenized inputs
~~~~~~~~~~~~~~~~~~~~
The tokenizer also accepts pre-tokenized inputs. This is particularly useful when you want to compute labels and extract
predictions in `named entity recognition (NER) <https://en.wikipedia.org/wiki/Named-entity_recognition>`__ or
`part-of-speech tagging (POS tagging) <https://en.wikipedia.org/wiki/Part-of-speech_tagging>`__.
If you want to use pre-tokenized inputs, just set :obj:`is_pretokenized=True` when passing your inputs to the
tokenizer. For instance:
::
encoded_input = tokenizer(["Hello", "I'm", "a", "single", "sentence"], is_pretokenized=True)
print(encoded_input)
will return:
::
{'input_ids': [101, 8667, 146, 112, 182, 170, 1423, 5650, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
Note that the tokenizer still adds the ids of special tokens (if applicable) unless you pass
``add_special_tokens=False``.
This works exactly as before for batches of sentences or batches of pairs of sentences. You can encode a batch of sentences
like this:
::
batch_sentences = [["Hello", "I'm", "a", "single", "sentence"],
["And", "another", "sentence"],
["And", "the", "very", "very", "last", "one"]]
encoded_inputs = tokenizer(batch_sentences, is_pretokenized=True)
or a batch of sentence pairs like this:
::
batch_of_second_sentences = [["I'm", "a", "sentence", "that", "goes", "with", "the", "first", "sentence"],
["And", "I", "should", "be", "encoded", "with", "the", "second", "sentence"],
["And", "I", "go", "with", "the", "very", "last", "one"]]
encoded_inputs = tokenizer(batch_sentences, batch_of_second_sentences, is_pretokenized=True)
And you can add padding, truncation as well as directly return tensors like before:
::
## PYTORCH CODE
batch = tokenizer(batch_sentences,
batch_of_second_sentences,
is_pretokenized=True,
padding=True,
truncation=True,
return_tensors="pt")
## TENSORFLOW CODE
batch = tokenizer(batch_sentences,
batch_of_second_sentences,
is_pretokenized=True,
padding=True,
truncation=True,
return_tensors="tf")

Pretrained models
================================================
Here is the full list of the currently provided pretrained models together with a short presentation of each model.
For a list that includes community-uploaded models, refer to `https://huggingface.co/models <https://huggingface.co/models>`__.
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Architecture | Shortcut name | Details of the model |
+===================+============================================================+=======================================================================================================================================+
| BERT | ``bert-base-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on lower-cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on lower-cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on cased English text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
| | | |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
| | | |
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Chinese Simplified and Traditional text. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased German text by Deepset.ai |
| | | |
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on lower-cased English text using Whole-Word-Masking |
| | | |
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | Trained on cased English text using Whole-Word-Masking |
| | | |
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
| | | |
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
| | | |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
| | | |
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-dbmdz-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased German text by DBMDZ |
| | | |
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``bert-base-german-dbmdz-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on uncased German text by DBMDZ |
| | | |
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece. |
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized with MeCab and WordPiece. |
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-uncased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on uncased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``wietsedv/bert-base-dutch-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Dutch text. |
| | | |
| | | (see `details on wietsedv repository <https://github.com/wietsedv/bertje/>`__). |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | OpenAI GPT English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT-2 | ``gpt2`` | | 12-layer, 768-hidden, 12-heads, 117M parameters. |
| | | | OpenAI GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-medium`` | | 24-layer, 1024-hidden, 16-heads, 345M parameters. |
| | | | OpenAI's Medium-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters. |
| | | | OpenAI's Large-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-xl`` | | 48-layer, 1600-hidden, 25-heads, 1558M parameters. |
| | | | OpenAI's XL-sized GPT-2 English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Transformer-XL | ``transfo-xl-wt103`` | | 18-layer, 1024-hidden, 16-heads, 257M parameters. |
| | | | English model trained on wikitext-103 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLNet | ``xlnet-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | XLNet English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlnet-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | XLNet Large English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLM | ``xlm-mlm-en-2048`` | | 12-layer, 2048-hidden, 16-heads |
| | | | XLM English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained on the concatenation of English and German wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained on the concatenation of English and French wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enro-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-Romanian Multi-language model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-tlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM + TLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained with CLM (Causal Language Modeling) on the concatenation of English and French wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained with CLM (Causal Language Modeling) on the concatenation of English and German wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-17-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 17 languages. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-100-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 100 languages. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | RoBERTa using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | RoBERTa using the BERT-large architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-mnli`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned on `MNLI <http://www.nyu.edu/projects/bowman/multinli/>`__. |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilroberta-base`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilRoBERTa model distilled from the RoBERTa model `roberta-base` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-base-openai-detector`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | ``roberta-base`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-openai-detector`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint, with an additional question answering layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-german-cased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The German DistilBERT model distilled from the German DBMDZ BERT model `bert-base-german-dbmdz-cased` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-multilingual-cased`` | | 6-layer, 768-hidden, 12-heads, 134M parameters |
| | | | The multilingual DistilBERT model distilled from the Multilingual BERT model `bert-base-multilingual-cased` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
| | | | Salesforce's Large-sized CTRL English model |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| CamemBERT | ``camembert-base`` | | 12-layer, 768-hidden, 12-heads, 110M parameters |
| | | | CamemBERT using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|                   | ``albert-xxlarge-v1``                                      | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters                                              |
| | | | ALBERT xxlarge model |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
| | | | ALBERT base model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
| | | | ALBERT large model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|                   | ``albert-xxlarge-v2``                                      | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters                                              |
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
| | | |
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-base`` | | ~220M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 12-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``t5-large`` | | ~770M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|                   | ``t5-3b``                                                  | | ~2.8B parameters with 24-layers, 1024-hidden-state, 16384 feed-forward hidden-state, 32-heads,                          |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|                   | ``t5-11b``                                                 | | ~11B parameters with 24-layers, 1024-hidden-state, 65536 feed-forward hidden-state, 128-heads,                          |
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLM-RoBERTa       | ``xlm-roberta-base``                                       | | ~125M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 12-heads,                            |
|                   |                                                            | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages                                              |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|                   | ``xlm-roberta-large``                                      | | ~355M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads,                           |
| | | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| FlauBERT | ``flaubert/flaubert_small_cased`` | | 6-layer, 512-hidden, 8-heads, 54M parameters |
| | | | FlauBERT small architecture |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_base_uncased`` | | 12-layer, 768-hidden, 12-heads, 137M parameters |
| | | | FlauBERT base architecture with uncased vocabulary |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_base_cased`` | | 12-layer, 768-hidden, 12-heads, 138M parameters |
| | | | FlauBERT base architecture with cased vocabulary |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``flaubert/flaubert_large_cased`` | | 24-layer, 1024-hidden, 16-heads, 373M parameters |
| | | | FlauBERT large architecture |
| | | |
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Bart | ``facebook/bart-large`` | | 24-layer, 1024-hidden, 16-heads, 406M parameters |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/bart-base`` | | 12-layer, 768-hidden, 16-heads, 139M parameters |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|                   | ``facebook/bart-large-mnli``                               | | Adds a 2-layer classification head with 1 million parameters                                                            |
|                   |                                                            | | bart-large architecture with a classification head, fine-tuned on MNLI                                                  |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|                   | ``facebook/bart-large-cnn``                                | | 24-layer, 1024-hidden, 16-heads, 406M parameters (same as ``bart-large``)                                               |
|                   |                                                            | | bart-large architecture fine-tuned on the CNN/Daily Mail summarization task                                             |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``facebook/mbart-large-en-ro`` | | 12-layer, 1024-hidden, 16-heads, 880M parameters |
|                   |                                                            | | bart-large architecture pretrained on cc25 multilingual data, fine-tuned on WMT English-Romanian translation.           |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DialoGPT | ``DialoGPT-small`` | | 12-layer, 768-hidden, 12-heads, 124M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``DialoGPT-medium`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``DialoGPT-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters |
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Reformer | ``reformer-enwik8`` | | 12-layer, 1024-hidden, 8-heads, 149M parameters |
| | | | Trained on English Wikipedia data - enwik8. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``reformer-crime-and-punishment`` | | 6-layer, 256-hidden, 2-heads, 3M parameters |
| | | | Trained on English text: Crime and Punishment novel by Fyodor Dostoyevsky. |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| MarianMT          | ``Helsinki-NLP/opus-mt-{src}-{tgt}``                       | | 12-layer, 512-hidden, 8-heads, ~74M-parameter machine translation models. Parameter counts vary depending on vocab size. |
| | | | (see `model list <https://huggingface.co/Helsinki-NLP>`_) |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Longformer | ``allenai/longformer-base-4096`` | | 12-layer, 768-hidden, 12-heads, ~149M parameters |
| | | | Starting from RoBERTa-base checkpoint, trained on documents of max length 4,096 |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
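Any shortcut name in the table above can be passed directly to the :obj:`from_pretrained` methods. As a minimal sketch (``distilbert-base-uncased`` is used purely as an example here; any other name listed above works the same way, and :obj:`TFAutoModel` is the TensorFlow equivalent):
.. code-block::
>>> from transformers import AutoTokenizer, AutoModel
>>> # Downloads and caches the checkpoint matching the shortcut name
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = AutoModel.from_pretrained("distilbert-base-uncased")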

docs/source/quicktour.rst Normal file

@@ -0,0 +1,393 @@
Quick tour
==========
Let's have a quick look at the 🤗 Transformers library features. The library downloads pretrained models for
Natural Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and Natural Language Generation (NLG),
such as completing a prompt with new text or translating into another language.
First we will see how to easily leverage the pipeline API to quickly use those pretrained models for inference. Then, we
will dig a little bit more and see how the library gives you access to those models and helps you preprocess your data.
.. note::
All code examples presented in the documentation have a switch on the top left for PyTorch versus TensorFlow. If
an example has no such switch, the code is expected to work for both backends without any change needed.
Getting started on a task with a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`. 🤗 Transformers
provides the following tasks out of the box:
- Sentiment analysis: is a text positive or negative?
- Text generation (in English): provide a prompt and the model will generate what follows.
- Named entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place,
etc.)
- Question answering: provide the model with some context and a question, extract the answer from the context.
- Filling masked text: given a text with masked words (e.g., replaced by ``[MASK]``), fill the blanks.
- Summarization: generate a summary of a long text.
- Translation: translate a text in another language.
- Feature extraction: return a tensor representation of the text.
Let's see how this works for sentiment analysis (the other tasks are all covered in the
:doc:`task summary </task_summary>`):
.. code-block::
>>> from transformers import pipeline
>>> classifier = pipeline('sentiment-analysis')
When typing this command for the first time, a pretrained model and its tokenizer are downloaded and cached. We will
look at both later on, but as an introduction the tokenizer's job is to preprocess the text for the model, which is
then responsible for making predictions. The pipeline groups all of that together, and post-processes the predictions to
make them readable. For instance:
.. code-block::
>>> classifier('We are very happy to show you the 🤗 Transformers library.')
[{'label': 'POSITIVE', 'score': 0.9997795224189758}]
That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model as a
`batch`, returning a list of dictionaries like this one:
.. code-block::
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.",
... "We hope you don't hate it."])
>>> for result in results:
... print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
You can see the second sentence has been classified as negative (the model can only predict positive or negative) but its score is
fairly neutral.
By default, the model downloaded for this pipeline is called "distilbert-base-uncased-finetuned-sst-2-english". We can
look at its `model page <https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english>`__ to get more
information about it. It uses the :doc:`DistilBERT architecture </model_doc/distilbert>` and has been fine-tuned on a
dataset called SST-2 for the sentiment analysis task.
Let's say we want to use another model; for instance, one that has been trained on French data. We can search through
the `model hub <https://huggingface.co/models>`__, which gathers models pretrained on a lot of data by research labs, as
well as community models (usually fine-tuned versions of those big models on a specific dataset). Applying the tags
"French" and "text-classification" gives back a suggestion "nlptown/bert-base-multilingual-uncased-sentiment". Let's
see how we can use it.
You can directly pass the name of the model to use to :func:`~transformers.pipeline`:
.. code-block::
>>> classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")
This classifier can now deal with texts in English and French, but also in Dutch, German, Italian and Spanish! You can also
replace that name with a local folder where you have saved a pretrained model (see below). You can also pass a model
object and its associated tokenizer.
We will need two classes for this. The first is :class:`~transformers.AutoTokenizer`, which we will use to download the
tokenizer associated with the model we picked and instantiate it. The second is
:class:`~transformers.AutoModelForSequenceClassification` (or
:class:`~transformers.TFAutoModelForSequenceClassification` if you are using TensorFlow), which we will use to download
the model itself. Note that if we were using the library on another task, the class of the model would change. The
:doc:`task summary </task_summary>` tutorial summarizes which class is used for which task.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> ## TENSORFLOW CODE
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
Now, to download the model and tokenizer we found previously, we just have to use the
:func:`~transformers.AutoModelForSequenceClassification.from_pretrained` method (feel free to replace ``model_name`` by
any other model from the model hub):
.. code-block::
>>> ## PYTORCH CODE
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
>>> ## TENSORFLOW CODE
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> # This model only exists in PyTorch, so we use the `from_pt` flag to import that model in TensorFlow.
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
If you don't find a model that has been pretrained on some data similar to yours, you will need to fine-tune a
pretrained model on your data. We provide :doc:`example scripts </examples>` to do so. Once you're done, don't forget
to share your fine-tuned model on the hub with the community, using :doc:`this tutorial </model_sharing>`.
.. _pretrained-model:
Under the hood: pretrained models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's now see what happens under the hood when using those pipelines. As we saw, the model and tokenizer are created
using the :obj:`from_pretrained` method:
::
>>> ## PYTORCH CODE
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> ## TENSORFLOW CODE
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
Using the tokenizer
^^^^^^^^^^^^^^^^^^^
We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text into
words (or parts of words, punctuation symbols, etc.), usually called `tokens`. There are multiple rules that can govern
that process, which is why we need to instantiate the tokenizer using the name of the model, to make sure we use the
same rules as when the model was pretrained.
The second step is to convert those `tokens` into numbers, to be able to build a tensor out of them and feed them to
the model. To do this, the tokenizer has a `vocab`, which is the part we download when we instantiate it with the
:obj:`from_pretrained` method, since we need to use the same `vocab` as when the model was pretrained.
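The two steps can also be run separately. Here is a minimal sketch, assuming the ``tokenizer`` instantiated above (:obj:`tokenize` and :obj:`convert_tokens_to_ids` are the lower-level methods behind the main entry point):
.. code-block::
>>> # Step 1: split the text into tokens known to the vocab
>>> tokens = tokenizer.tokenize("We are very happy.")
>>> # Step 2: map each token to its id in the vocab
>>> ids = tokenizer.convert_tokens_to_ids(tokens)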
In practice, to apply these steps on a given text, we can just feed it to our tokenizer:
.. code-block::
>>> inputs = tokenizer("We are very happy to show you the 🤗 Transformers library.")
This returns a dictionary mapping strings to lists of ints. It contains the `ids of the tokens <glossary.html#input-ids>`__,
as mentioned before, but also additional arguments that will be useful to the model. Here for instance, we also have an
`attention mask <glossary.html#attention-mask>`__ that the model will use to have a better understanding of the sequence:
.. code-block::
>>> print(inputs)
{'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
You can pass a list of sentences directly to your tokenizer. If your goal is to send them through your model as a
batch, you probably want to pad them all to the same length, truncate them to the maximum length the model can accept
and get tensors back. You can specify all of that to the tokenizer:
.. code-block::
>>> ## PYTORCH CODE
>>> pt_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... return_tensors="pt"
... )
>>> ## TENSORFLOW CODE
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... return_tensors="tf"
... )
The padding is automatically applied on the side the model expects it (in this case, on the right), with the
padding token the model was pretrained with. The attention mask is also adapted to take the padding into account:
.. code-block::
>>> ## PYTORCH CODE
>>> for key, value in pt_batch.items():
... print(f"{key}: {value.numpy().tolist()}")
input_ids: [[101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0, 0]]
attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
>>> ## TENSORFLOW CODE
>>> for key, value in tf_batch.items():
... print(f"{key}: {value.numpy().tolist()}")
input_ids: [[101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0, 0]]
attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
You can learn more about tokenizers :doc:`here <preprocessing>`.
Using the model
^^^^^^^^^^^^^^^
Once your input has been preprocessed by the tokenizer, you can directly send it to the model. As we mentioned, it will
contain all the relevant information the model needs. If you're using a TensorFlow model, you can pass the
dictionary directly; for a PyTorch model, you need to unpack the dictionary by adding :obj:`**`.
.. code-block::
>>> ## PYTORCH CODE
>>> pt_outputs = pt_model(**pt_batch)
>>> ## TENSORFLOW CODE
>>> tf_outputs = tf_model(tf_batch)
In 🤗 Transformers, all outputs are tuples (potentially with only one element). Here, we get a tuple with just the
final activations of the model.
.. code-block::
>>> ## PYTORCH CODE
>>> print(pt_outputs)
(tensor([[-4.0833, 4.3364],
[ 0.0818, -0.0418]], grad_fn=<AddmmBackward>),)
>>> ## TENSORFLOW CODE
>>> print(tf_outputs)
(<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-4.0832963 , 4.3364134 ],
[ 0.08181238, -0.04178794]], dtype=float32)>,)
.. note::
All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final
activation function (like SoftMax) since this final activation function is often fused with the loss.
Let's apply the SoftMax activation to get predictions.
.. code-block::
>>> ## PYTORCH CODE
>>> import torch.nn.functional as F
>>> pt_predictions = F.softmax(pt_outputs[0], dim=-1)
>>> ## TENSORFLOW CODE
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs[0], axis=-1)
We can see we get the same numbers as before:
.. code-block::
>>> ## TENSORFLOW CODE
>>> print(tf_predictions)
tf.Tensor(
[[2.2042994e-04 9.9977952e-01]
[5.3086078e-01 4.6913919e-01]], shape=(2, 2), dtype=float32)
>>> ## PYTORCH CODE
>>> print(pt_predictions)
tensor([[2.2043e-04, 9.9978e-01],
[5.3086e-01, 4.6914e-01]], grad_fn=<SoftmaxBackward>)
If you have labels, you can provide them to the model, and it will return a tuple with the loss and the final activations.
.. code-block::
>>> ## PYTORCH CODE
>>> import torch
>>> pt_outputs = pt_model(**pt_batch, labels = torch.tensor([1, 0]))
>>> ## TENSORFLOW CODE
>>> import tensorflow as tf
>>> tf_outputs = tf_model(tf_batch, labels = tf.constant([1, 0]))
Models are standard `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ or
`tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ so you can use them in your usual
training loop. 🤗 Transformers also provides a :class:`~transformers.Trainer` (or :class:`~transformers.TFTrainer` if
you are using TensorFlow) class to help with your training (taking care of things such as distributed training, mixed
precision, etc.). See the :doc:`training tutorial <training>` for more details.
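Here is a minimal sketch of the :class:`~transformers.Trainer` API, assuming PyTorch; ``train_dataset`` is a placeholder for your own dataset of encoded examples, and the argument values are purely illustrative (the TensorFlow equivalent uses :class:`~transformers.TFTrainer` with :class:`~transformers.TFTrainingArguments`):
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import Trainer, TrainingArguments
>>> # Illustrative values; see the training tutorial for the full argument list
>>> training_args = TrainingArguments(output_dir="./results", num_train_epochs=1)
>>> # train_dataset: a placeholder for your own torch Dataset of encoded examples
>>> trainer = Trainer(model=pt_model, args=training_args, train_dataset=train_dataset)
>>> trainer.train()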
Once your model is fine-tuned, you can save it with its tokenizer the following way:
::
tokenizer.save_pretrained(save_directory)
model.save_pretrained(save_directory)
You can then load this model back using the :func:`~transformers.AutoModel.from_pretrained` method by passing the
directory name instead of the model name. One cool feature of 🤗 Transformers is that you can easily switch between
PyTorch and TensorFlow: any model saved as before can be loaded back in either PyTorch or TensorFlow. If you are
loading a saved PyTorch model as a TensorFlow model, use :func:`~transformers.TFAutoModel.from_pretrained` like this:
::
tokenizer = AutoTokenizer.from_pretrained(save_directory)
model = TFAutoModel.from_pretrained(save_directory, from_pt=True)
and if you are loading a saved TensorFlow model as a PyTorch model, you should use the following code:
::
tokenizer = AutoTokenizer.from_pretrained(save_directory)
model = AutoModel.from_pretrained(save_directory, from_tf=True)
Lastly, you can also ask the model to return all hidden states and all attention weights if you need them:
::
>>> ## PYTORCH CODE
>>> pt_outputs = pt_model(**pt_batch, output_hidden_states=True, output_attentions=True)
>>> all_hidden_states, all_attentions = pt_outputs[-2:]
>>> ## TENSORFLOW CODE
>>> tf_outputs = tf_model(tf_batch, output_hidden_states=True, output_attentions=True)
>>> all_hidden_states, all_attentions = tf_outputs[-2:]
Accessing the code
^^^^^^^^^^^^^^^^^^
The :obj:`AutoModel` and :obj:`AutoTokenizer` classes are just shortcuts that will automatically work with any
pretrained model. Behind the scenes, the library has one model class per combination of architecture and task, so the
code is easy to access and tweak if you need to.
In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's
using the :doc:`DistilBERT </model_doc/distilbert>` architecture. The model automatically created is then a
:class:`~transformers.DistilBertForSequenceClassification`. You can look at its documentation for all details relevant
to that specific model, or browse the source code. This is how you would directly instantiate the model and tokenizer
without the auto magic:
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
>>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = DistilBertForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
>>> ## TENSORFLOW CODE
>>> from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
>>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
>>> model = TFDistilBertForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
Customizing the model
^^^^^^^^^^^^^^^^^^^^^
If you want to change how the model itself is built, you can define a custom configuration. Each architecture comes
with its own relevant configuration class (in the case of DistilBERT, :class:`~transformers.DistilBertConfig`), which
lets you set any of the model's hyperparameters, such as the hidden dimension or the dropout rate. If you make core modifications, like changing the
hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would then
instantiate the model directly from this configuration.
Here we use the predefined vocabulary of DistilBERT (hence load the tokenizer with the
:func:`~transformers.DistilBertTokenizer.from_pretrained` method) and initialize the model from scratch (hence
instantiate the model from the configuration instead of using the
:func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method).
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
>>> config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
>>> tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
>>> model = DistilBertForSequenceClassification(config)
>>> ## TENSORFLOW CODE
>>> from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
>>> config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
>>> tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
>>> model = TFDistilBertForSequenceClassification(config)
For something that only changes the head of the model (for instance, the number of labels), you can still use a
pretrained model for the body. For instance, let's define a classifier for 10 different labels using a pretrained body.
We could create a configuration with all the default values and just change the number of labels, but it is easier to
directly pass any argument a configuration would take to the :func:`from_pretrained` method and it will update the
default configuration with it:
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
>>> model_name = "distilbert-base-uncased"
>>> model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
>>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
>>> ## TENSORFLOW CODE
>>> from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
>>> model_name = "distilbert-base-uncased"
>>> model = TFDistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
>>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)

docs/source/task_summary.rst Normal file

@@ -0,0 +1,845 @@
Summary of the tasks
====================
This page shows the most frequent use-cases when using the library. The models available allow for many different
configurations and offer great versatility in use-cases. The simplest ones are presented here, showcasing usage
for tasks such as question answering, sequence classification, named entity recognition and others.
These examples leverage auto-models, which are classes that will instantiate a model according to a given checkpoint,
automatically selecting the correct model architecture. Please check the :class:`~transformers.AutoModel` documentation
for more information.
Feel free to modify the code to be more specific and adapt it to your specific use-case.
In order for a model to perform well on a task, it must be loaded from a checkpoint corresponding to that task. These
checkpoints are usually pre-trained on a large corpus of data and fine-tuned on a specific task. This means the
following:
- Not all models were fine-tuned on all tasks. If you want to fine-tune a model on a specific task, you can leverage
one of the `run_$TASK.py` scripts in the
`examples <https://github.com/huggingface/transformers/tree/master/examples>`_ directory.
- Fine-tuned models were fine-tuned on a specific dataset. This dataset may or may not overlap with your use-case
and domain. As mentioned previously, you may leverage the
`examples <https://github.com/huggingface/transformers/tree/master/examples>`_ scripts to fine-tune your model, or you
may create your own training script.
In order to run inference on a task, several mechanisms are made available by the library:
- Pipelines: very easy-to-use abstractions, which require as little as two lines of code.
- Using a model directly with a tokenizer (PyTorch/TensorFlow): the full inference using the model. Less abstraction,
but much more powerful.
Both approaches are showcased here.
.. note::
All tasks presented here leverage pre-trained checkpoints that were fine-tuned on specific tasks. Loading a
checkpoint that was not fine-tuned on a specific task would load only the base transformer layers and not the
additional head that is used for the task, initializing the weights of that head randomly.
This would produce random output.
Sequence Classification
--------------------------
Sequence classification is the task of classifying sequences according to a given number of classes. An example
of sequence classification is the GLUE dataset, which is entirely based on that task. If you would like to fine-tune
a model on a GLUE sequence classification task, you may leverage the
`run_glue.py <https://github.com/huggingface/transformers/tree/master/examples/text-classification/run_glue.py>`_ or
`run_tf_glue.py <https://github.com/huggingface/transformers/tree/master/examples/text-classification/run_tf_glue.py>`_ scripts.
Here is an example of using the pipelines to do sentiment analysis: identifying if a sequence is positive or negative.
It leverages a model fine-tuned on SST-2, which is a GLUE task.
This returns a label ("POSITIVE" or "NEGATIVE") alongside a score, as follows:
.. code-block::
>>> from transformers import pipeline
>>> nlp = pipeline("sentiment-analysis")
>>> result = nlp("I hate you")[0]
>>> print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: NEGATIVE, with score: 0.9991
>>> result = nlp("I love you")[0]
>>> print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
Here is an example of doing a sequence classification using a model to determine if two sequences are paraphrases
of each other. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and is loaded
with the weights stored in the checkpoint.
- Build a sequence from the two sentences, with the correct model-specific separators, token type ids
and attention masks (:func:`~transformers.PreTrainedTokenizer.encode` and
:func:`~transformers.PreTrainedTokenizer.__call__` take care of this)
- Pass this sequence through the model so that it is classified into one of the two available classes: 0
(not a paraphrase) and 1 (is a paraphrase)
- Compute the softmax of the result to get probabilities over the classes
- Print the results
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
>>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
>>> classes = ["not paraphrase", "is paraphrase"]
>>> sequence_0 = "The company HuggingFace is based in New York City"
>>> sequence_1 = "Apples are especially bad for your health"
>>> sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
>>> paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
>>> not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")
>>> paraphrase_classification_logits = model(**paraphrase)[0]
>>> not_paraphrase_classification_logits = model(**not_paraphrase)[0]
>>> paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
>>> not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]
>>> # Should be paraphrase
>>> for i in range(len(classes)):
... print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")
not paraphrase: 10%
is paraphrase: 90%
>>> # Should not be paraphrase
>>> for i in range(len(classes)):
... print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")
not paraphrase: 94%
is paraphrase: 6%
>>> ## TENSORFLOW CODE
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
>>> model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
>>> classes = ["not paraphrase", "is paraphrase"]
>>> sequence_0 = "The company HuggingFace is based in New York City"
>>> sequence_1 = "Apples are especially bad for your health"
>>> sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
>>> paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="tf")
>>> not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="tf")
>>> paraphrase_classification_logits = model(paraphrase)[0]
>>> not_paraphrase_classification_logits = model(not_paraphrase)[0]
>>> paraphrase_results = tf.nn.softmax(paraphrase_classification_logits, axis=1).numpy()[0]
>>> not_paraphrase_results = tf.nn.softmax(not_paraphrase_classification_logits, axis=1).numpy()[0]
>>> # Should be paraphrase
>>> for i in range(len(classes)):
... print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")
not paraphrase: 10%
is paraphrase: 90%
>>> # Should not be paraphrase
>>> for i in range(len(classes)):
... print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")
not paraphrase: 94%
is paraphrase: 6%
Extractive Question Answering
----------------------------------------------------
Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
a model on a SQuAD task, you may leverage the `examples/question-answering/run_squad.py` script.
Here is an example of using the pipelines to do question answering: extracting an answer from a text given a question.
It leverages a model fine-tuned on SQuAD.
.. code-block::
>>> from transformers import pipeline
>>> nlp = pipeline("question-answering")
>>> context = r"""
... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a
... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune
... a model on a SQuAD task, you may leverage the examples/question-answering/run_squad.py script.
... """
This returns an answer extracted from the text, a confidence score, and "start" and "end" values, which
are the positions of the extracted answer in the text.
.. code-block::
>>> result = nlp(question="What is extractive question answering?", context=context)
>>> print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
Answer: 'the task of extracting an answer from a text given a question.', score: 0.6226, start: 34, end: 96
>>> result = nlp(question="What is a good example of a question answering dataset?", context=context)
>>> print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")
Answer: 'SQuAD dataset,', score: 0.5053, start: 147, end: 161
Here is an example of question answering using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and is loaded
with the weights stored in the checkpoint.
- Define a text and a few questions.
- Iterate over the questions and build a sequence from the text and the current question, with the correct
model-specific separators, token type ids and attention masks
- Pass this sequence through the model. This outputs a score for each token of the entire sequence (question and
text), for both the start and end positions.
- Compute the softmax of the result to get probabilities over the tokens
- Fetch the tokens from the identified start and stop values, and convert those tokens to a string.
- Print the results
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoTokenizer, AutoModelForQuestionAnswering
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
>>> model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
>>> text = r"""
... 🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
... architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
... Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
... TensorFlow 2.0 and PyTorch.
... """
>>> questions = [
... "How many pretrained models are available in 🤗 Transformers?",
... "What does 🤗 Transformers provide?",
... "🤗 Transformers provides interoperability between which frameworks?",
... ]
>>> for question in questions:
... inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="pt")
... input_ids = inputs["input_ids"].tolist()[0]
...
... text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
... answer_start_scores, answer_end_scores = model(**inputs)
...
... answer_start = torch.argmax(
... answer_start_scores
... ) # Get the most likely beginning of answer with the argmax of the score
... answer_end = torch.argmax(answer_end_scores) + 1 # Get the most likely end of answer with the argmax of the score
...
... answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
...
... print(f"Question: {question}")
... print(f"Answer: {answer}")
Question: How many pretrained models are available in 🤗 Transformers?
Answer: over 32 +
Question: What does 🤗 Transformers provide?
Answer: general - purpose architectures
Question: 🤗 Transformers provides interoperability between which frameworks?
Answer: tensorflow 2 . 0 and pytorch
>>> ## TENSORFLOW CODE
>>> from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
>>> text = r"""
... 🤗 Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose
... architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural
... Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
... TensorFlow 2.0 and PyTorch.
... """
>>> questions = [
... "How many pretrained models are available in 🤗 Transformers?",
... "What does 🤗 Transformers provide?",
... "🤗 Transformers provides interoperability between which frameworks?",
... ]
>>> for question in questions:
... inputs = tokenizer(question, text, add_special_tokens=True, return_tensors="tf")
... input_ids = inputs["input_ids"].numpy()[0]
...
... text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
... answer_start_scores, answer_end_scores = model(inputs)
...
... answer_start = tf.argmax(
... answer_start_scores, axis=1
... ).numpy()[0] # Get the most likely beginning of answer with the argmax of the score
... answer_end = (
... tf.argmax(answer_end_scores, axis=1) + 1
... ).numpy()[0] # Get the most likely end of answer with the argmax of the score
... answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))
...
... print(f"Question: {question}")
... print(f"Answer: {answer}")
Question: How many pretrained models are available in 🤗 Transformers?
Answer: over 32 +
Question: What does 🤗 Transformers provide?
Answer: general - purpose architectures
Question: 🤗 Transformers provides interoperability between which frameworks?
Answer: tensorflow 2 . 0 and pytorch
Language Modeling
----------------------------------------------------
Language modeling is the task of fitting a model to a corpus, which can be domain specific. All popular
transformer-based models are trained using a variant of language modeling, e.g. BERT with masked language modeling and GPT-2 with
causal language modeling.
Language modeling can be useful outside of pre-training as well, for example to shift the model distribution to be
domain-specific: using a language model trained over a very large corpus, and then fine-tuning it on a news dataset
or on scientific papers e.g. `LysandreJik/arxiv-nlp <https://huggingface.co/lysandre/arxiv-nlp>`__.
Masked Language Modeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Masked language modeling is the task of masking tokens in a sequence with a masking token, and prompting the model to
fill that mask with an appropriate token. This allows the model to attend to both the right context (tokens on the
right of the mask) and the left context (tokens on the left of the mask). Such training creates a strong basis
for downstream tasks requiring bi-directional context, such as SQuAD (question answering,
see `Lewis, Liu, Goyal et al. <https://arxiv.org/abs/1910.13461>`__, part 4.2).
Here is an example of using pipelines to replace a mask from a sequence:
.. code-block::
>>> from transformers import pipeline
>>> nlp = pipeline("fill-mask")
This outputs the sequences with the mask filled, the confidence score, as well as the token id in the tokenizer
vocabulary:
.. code-block::
>>> from pprint import pprint
>>> pprint(nlp(f"HuggingFace is creating a {nlp.tokenizer.mask_token} that the community uses to solve NLP tasks."))
[{'score': 0.1792745739221573,
'sequence': '<s>HuggingFace is creating a tool that the community uses to '
'solve NLP tasks.</s>',
'token': 3944,
'token_str': 'Ġtool'},
{'score': 0.11349421739578247,
'sequence': '<s>HuggingFace is creating a framework that the community uses '
'to solve NLP tasks.</s>',
'token': 7208,
'token_str': 'Ġframework'},
{'score': 0.05243554711341858,
'sequence': '<s>HuggingFace is creating a library that the community uses to '
'solve NLP tasks.</s>',
'token': 5560,
'token_str': 'Ġlibrary'},
{'score': 0.03493533283472061,
'sequence': '<s>HuggingFace is creating a database that the community uses '
'to solve NLP tasks.</s>',
'token': 8503,
'token_str': 'Ġdatabase'},
{'score': 0.02860250137746334,
'sequence': '<s>HuggingFace is creating a prototype that the community uses '
'to solve NLP tasks.</s>',
'token': 17715,
'token_str': 'Ġprototype'}]
Here is an example doing masked language modeling using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a DistilBERT model and
is loaded with the weights stored in the checkpoint.
- Define a sequence with a masked token, placing the :obj:`tokenizer.mask_token` instead of a word.
- Encode that sequence into IDs and find the position of the masked token in that list of IDs.
- Retrieve the predictions at the index of the mask token: this tensor has the same size as the vocabulary, and the
values are the scores attributed to each token. The model gives a higher score to tokens it deems probable in that
context.
- Retrieve the top 5 tokens using the PyTorch :obj:`topk` or TensorFlow :obj:`top_k` methods.
- Replace the mask token with each of the top tokens and print the results
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
>>> model = AutoModelWithLMHead.from_pretrained("distilbert-base-cased")
>>> sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
>>> input = tokenizer.encode(sequence, return_tensors="pt")
>>> mask_token_index = torch.where(input == tokenizer.mask_token_id)[1]
>>> token_logits = model(input)[0]
>>> mask_token_logits = token_logits[0, mask_token_index, :]
>>> top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
>>> model = TFAutoModelWithLMHead.from_pretrained("distilbert-base-cased")
>>> sequence = f"Distilled models are smaller than the models they mimic. Using them instead of the large versions would help {tokenizer.mask_token} our carbon footprint."
>>> input = tokenizer.encode(sequence, return_tensors="tf")
>>> mask_token_index = tf.where(input == tokenizer.mask_token_id)[0, 1]
>>> token_logits = model(input)[0]
>>> mask_token_logits = token_logits[0, mask_token_index, :]
>>> top_5_tokens = tf.math.top_k(mask_token_logits, 5).indices.numpy()
This prints five sequences, with the top 5 tokens predicted by the model:
.. code-block::
>>> for token in top_5_tokens:
... print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.
Causal Language Modeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Causal language modeling is the task of predicting the token following a sequence of tokens. In this situation, the
model only attends to the left context (tokens on the left of the position being predicted). Such training is
particularly interesting for generation tasks.
Usually, the next token is predicted by sampling from the logits of the last hidden state the model produces from the input sequence.
Here is an example using the tokenizer and model and leveraging the :func:`~transformers.PreTrainedModel.top_k_top_p_filtering` method to sample the next token following an input sequence of tokens.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer, top_k_top_p_filtering
>>> import torch
>>> from torch.nn import functional as F
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = AutoModelWithLMHead.from_pretrained("gpt2")
>>> sequence = f"Hugging Face is based in DUMBO, New York City, and "
>>> input_ids = tokenizer.encode(sequence, return_tensors="pt")
>>> # get logits of last hidden state
>>> next_token_logits = model(input_ids)[0][:, -1, :]
>>> # filter
>>> filtered_next_token_logits = top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
>>> # sample
>>> probs = F.softmax(filtered_next_token_logits, dim=-1)
>>> next_token = torch.multinomial(probs, num_samples=1)
>>> generated = torch.cat([input_ids, next_token], dim=-1)
>>> resulting_string = tokenizer.decode(generated.tolist()[0])
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer, tf_top_k_top_p_filtering
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> model = TFAutoModelWithLMHead.from_pretrained("gpt2")
>>> sequence = f"Hugging Face is based in DUMBO, New York City, and "
>>> input_ids = tokenizer.encode(sequence, return_tensors="tf")
>>> # get logits of last hidden state
>>> next_token_logits = model(input_ids)[0][:, -1, :]
>>> # filter
>>> filtered_next_token_logits = tf_top_k_top_p_filtering(next_token_logits, top_k=50, top_p=1.0)
>>> # sample
>>> next_token = tf.random.categorical(filtered_next_token_logits, dtype=tf.int32, num_samples=1)
>>> generated = tf.concat([input_ids, next_token], axis=1)
>>> resulting_string = tokenizer.decode(generated.numpy().tolist()[0])
This outputs a (hopefully) coherent next token following the original sequence, which in our case is the word *has*:
.. code-block::
print(resulting_string)
Hugging Face is based in DUMBO, New York City, and has
In the next section, we show how this functionality is leveraged in :func:`~transformers.PreTrainedModel.generate` to generate multiple tokens up to a user-defined length.
Text Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In text generation (*a.k.a.* *open-ended text generation*) the goal is to create a coherent portion of text that is a continuation of the given context. As an example, it is shown below how *GPT-2* can be used in pipelines to generate text. By default, all models apply *Top-K* sampling when used in pipelines, as configured in their respective configurations (see the `gpt-2 config <https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-config.json>`_ for example).
.. code-block::
>>> from transformers import pipeline
>>> text_generator = pipeline("text-generation")
>>> print(text_generator("As far as I am concerned, I will", max_length=50, do_sample=False))
[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]
Here, the model generates a random text with a total maximal length of *50* tokens from the context *"As far as I am concerned, I will"*.
The default arguments of ``PreTrainedModel.generate()`` can be directly overridden in the pipeline, as is shown above for the argument ``max_length``.
Here is an example of text generation using XLNet and its tokenizer.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
>>> model = AutoModelWithLMHead.from_pretrained("xlnet-base-cased")
>>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
>>> # Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
>>> PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
... (except for Alexei and Maria) are discovered.
... The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
... remainder of the story. 1883 Western Siberia,
... a young Grigori Rasputin is asked by his father and a group of men to perform magic.
... Rasputin has a vision and denounces one of the men as a horse thief. Although his
... father initially slaps him for making such an accusation, Rasputin watches as the
... man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
... the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
... with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""
>>> prompt = "Today the weather is really nice and I am planning on "
>>> inputs = tokenizer.encode(PADDING_TEXT + prompt, add_special_tokens=False, return_tensors="pt")
>>> prompt_length = len(tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
>>> outputs = model.generate(inputs, max_length=250, do_sample=True, top_p=0.95, top_k=60)
>>> generated = prompt + tokenizer.decode(outputs[0])[prompt_length:]
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
>>> model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased")
>>> tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
>>> # Padding text helps XLNet with short prompts - proposed by Aman Rusia in https://github.com/rusiaaman/XLNet-gen#methodology
>>> PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
... (except for Alexei and Maria) are discovered.
... The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
... remainder of the story. 1883 Western Siberia,
... a young Grigori Rasputin is asked by his father and a group of men to perform magic.
... Rasputin has a vision and denounces one of the men as a horse thief. Although his
... father initially slaps him for making such an accusation, Rasputin watches as the
... man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
... the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
... with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""
>>> prompt = "Today the weather is really nice and I am planning on "
>>> inputs = tokenizer.encode(PADDING_TEXT + prompt, add_special_tokens=False, return_tensors="tf")
>>> prompt_length = len(tokenizer.decode(inputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
>>> outputs = model.generate(inputs, max_length=250, do_sample=True, top_p=0.95, top_k=60)
>>> generated = prompt + tokenizer.decode(outputs[0])[prompt_length:]
.. code-block::
print(generated)
Text generation is currently possible with *GPT-2*, *OpenAI-GPT*, *CTRL*, *XLNet*, *Transfo-XL* and *Reformer* in PyTorch, and for most models in TensorFlow as well. As can be seen in the example above, *XLNet* and *Transfo-XL* often need to be padded to work well.
GPT-2 is usually a good choice for *open-ended text generation* because it was trained on millions of webpages with a causal language modeling objective.
For more information on how to apply different decoding strategies for text generation, please also refer to our generation blog post `here <https://huggingface.co/blog/how-to-generate>`_.
Named Entity Recognition
----------------------------------------------------
Named Entity Recognition (NER) is the task of classifying tokens according to a class, for example identifying a
token as a person, an organisation or a location.
An example of a named entity recognition dataset is the CoNLL-2003 dataset, which is entirely based on that task.
If you would like to fine-tune a model on an NER task, you may leverage the `ner/run_ner.py` (PyTorch),
`ner/run_pl_ner.py` (leveraging pytorch-lightning) or the `ner/run_tf_ner.py` (TensorFlow) scripts.
Here is an example of using the pipelines to do named entity recognition, trying to identify tokens as belonging to
one of 9 classes:
- O, Outside of a named entity
- B-MISC, Beginning of a miscellaneous entity right after another miscellaneous entity
- I-MISC, Miscellaneous entity
- B-PER, Beginning of a person's name right after another person's name
- I-PER, Person's name
- B-ORG, Beginning of an organisation right after another organisation
- I-ORG, Organisation
- B-LOC, Beginning of a location right after another location
- I-LOC, Location
It leverages a fine-tuned model on CoNLL-2003, fine-tuned by `@stefan-it <https://github.com/stefan-it>`__ from
`dbmdz <https://github.com/dbmdz>`__.
.. code-block::
>>> from transformers import pipeline
>>> nlp = pipeline("ner")
>>> sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very"
... "close to the Manhattan Bridge which is visible from the window."
This outputs a list of all words that have been identified as an entity from the 9 classes defined above. Here are the
expected results:
.. code-block::
print(nlp(sequence))
[
{'word': 'Hu', 'score': 0.9995632767677307, 'entity': 'I-ORG'},
{'word': '##gging', 'score': 0.9915938973426819, 'entity': 'I-ORG'},
{'word': 'Face', 'score': 0.9982671737670898, 'entity': 'I-ORG'},
{'word': 'Inc', 'score': 0.9994403719902039, 'entity': 'I-ORG'},
{'word': 'New', 'score': 0.9994346499443054, 'entity': 'I-LOC'},
{'word': 'York', 'score': 0.9993270635604858, 'entity': 'I-LOC'},
{'word': 'City', 'score': 0.9993864893913269, 'entity': 'I-LOC'},
{'word': 'D', 'score': 0.9825621843338013, 'entity': 'I-LOC'},
{'word': '##UM', 'score': 0.936983048915863, 'entity': 'I-LOC'},
{'word': '##BO', 'score': 0.8987102508544922, 'entity': 'I-LOC'},
{'word': 'Manhattan', 'score': 0.9758241176605225, 'entity': 'I-LOC'},
{'word': 'Bridge', 'score': 0.990249514579773, 'entity': 'I-LOC'}
]
Note how the words "Hugging Face" have been identified as an organisation, and "New York City", "DUMBO" and
"Manhattan Bridge" have been identified as locations.
Here is an example of doing named entity recognition using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and
is loaded with the weights stored in the checkpoint.
- Define the label list with which the model was trained.
- Define a sequence with known entities, such as "Hugging Face" as an organisation and "New York City" as a location.
- Split words into tokens so that they can be mapped to the predictions. We use a small hack by first completely
encoding and decoding the sequence, so that we're left with a string that contains the special tokens.
- Encode that sequence into IDs (special tokens are added automatically).
- Retrieve the predictions by passing the input to the model and getting the first output. This results in a
distribution over the 9 possible classes for each token. We take the argmax to retrieve the most likely class
for each token.
- Zip together each token with its prediction and print it.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoModelForTokenClassification, AutoTokenizer
>>> import torch
>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> label_list = [
... "O", # Outside of a named entity
... "B-MISC", # Beginning of a miscellaneous entity right after another miscellaneous entity
... "I-MISC", # Miscellaneous entity
... "B-PER", # Beginning of a person's name right after another person's name
... "I-PER", # Person's name
... "B-ORG", # Beginning of an organisation right after another organisation
... "I-ORG", # Organisation
... "B-LOC", # Beginning of a location right after another location
... "I-LOC" # Location
... ]
>>> sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
... "close to the Manhattan Bridge."
>>> # Bit of a hack to get the tokens with the special tokens
>>> tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
>>> inputs = tokenizer.encode(sequence, return_tensors="pt")
>>> outputs = model(inputs)[0]
>>> predictions = torch.argmax(outputs, dim=2)
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelForTokenClassification, AutoTokenizer
>>> import tensorflow as tf
>>> model = TFAutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> label_list = [
... "O", # Outside of a named entity
... "B-MISC", # Beginning of a miscellaneous entity right after another miscellaneous entity
... "I-MISC", # Miscellaneous entity
... "B-PER", # Beginning of a person's name right after another person's name
... "I-PER", # Person's name
... "B-ORG", # Beginning of an organisation right after another organisation
... "I-ORG", # Organisation
... "B-LOC", # Beginning of a location right after another location
... "I-LOC" # Location
... ]
>>> sequence = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very" \
... "close to the Manhattan Bridge."
>>> # Bit of a hack to get the tokens with the special tokens
>>> tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
>>> inputs = tokenizer.encode(sequence, return_tensors="tf")
>>> outputs = model(inputs)[0]
>>> predictions = tf.argmax(outputs, axis=2)
This outputs a list of each token mapped to its prediction. Differently from the pipeline, here every token has
a prediction, as we didn't remove the "O" class, which means that no particular entity was found on that token. The
following array should be the output:
.. code-block::
>>> print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].numpy())])
[('[CLS]', 'O'), ('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O'), ('is', 'O'), ('a', 'O'), ('company', 'O'), ('based', 'O'), ('in', 'O'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('.', 'O'), ('Its', 'O'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), (',', 'O'), ('therefore', 'O'), ('very', 'O'), ('##c', 'O'), ('##lose', 'O'), ('to', 'O'), ('the', 'O'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC'), ('.', 'O'), ('[SEP]', 'O')]
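To get output closer to the pipeline's, we can simply filter out the tokens classified as "O". A minimal sketch, reusing the ``tokens``, ``predictions`` and ``label_list`` defined above:
.. code-block::
>>> entities = [(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].numpy()) if label_list[prediction] != "O"]
>>> print(entities)
[('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC')]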
Summarization
----------------------------------------------------
Summarization is the task of summarizing a text / an article into a shorter text.
An example of a summarization dataset is the CNN / Daily Mail dataset, which consists of long news articles and was created for the task of summarization.
If you would like to fine-tune a model on a summarization task, you may leverage the ``examples/summarization/bart/run_train.sh`` (leveraging pytorch-lightning) script.
Here is an example of using the pipelines to do summarization.
It leverages a Bart model that was fine-tuned on the CNN / Daily Mail data set.
.. code-block::
>>> from transformers import pipeline
>>> summarizer = pipeline("summarization")
>>> ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
... A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
... Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
... In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
... Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
... 2010 marriage license application, according to court documents.
... Prosecutors said the marriages were part of an immigration scam.
... On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
... After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
... Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
... All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
... Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
... Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
... The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
... Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
... Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
... If convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.
... """
Because the summarization pipeline depends on the ``PreTrainedModel.generate()`` method, we can override the default
arguments of ``PreTrainedModel.generate()`` directly in the pipeline, as is shown below for ``max_length`` and ``min_length``.
This outputs the following summary:
.. code-block::
>>> print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))
[{'summary_text': 'Liana Barrientos, 39, is charged with two counts of "offering a false instrument for filing in the first degree" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. She is believed to still be married to four men.'}]
Here is an example of doing summarization using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. Summarization is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
- Define the article that should be summarized.
- Add the T5 specific prefix "summarize: ".
- Leverage the ``PreTrainedModel.generate()`` method.
Here, Google's T5 model is used: even though it was only pre-trained on a multi-task mixture dataset (including CNN / Daily Mail), it nevertheless yields very good results.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
>>> model = AutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.
>>> inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="pt", max_length=512)
>>> outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
>>> model = TFAutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> # T5 uses a max_length of 512 so we cut the article to 512 tokens.
>>> inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="tf", max_length=512)
>>> outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
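The ``generate()`` method returns token IDs rather than text, so the summary still has to be decoded. A minimal sketch, which works for both the PyTorch and TensorFlow snippets above:
.. code-block::
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))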
Translation
----------------------------------------------------
Translation is the task of translating a text from one language to another.
An example of a translation dataset is the WMT English to German dataset, which has English sentences as the input data
and German sentences as the target data.
Here is an example of using the pipelines to do translation.
It leverages a T5 model that was only pre-trained on a multi-task mixture dataset (including WMT), but yields impressive
translation results nevertheless.
.. code-block::
>>> from transformers import pipeline
>>> translator = pipeline("translation_en_to_de")
>>> print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))
[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
Because the translation pipeline depends on the ``PreTrainedModel.generate()`` method, we can override the default
arguments of ``PreTrainedModel.generate()`` directly in the pipeline, as is shown for ``max_length`` above.
This outputs the following translation into German:
::
Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.
Here is an example of doing translation using a model and a tokenizer. The process is the following:
- Instantiate a tokenizer and a model from the checkpoint name. Translation is usually done using an encoder-decoder model, such as ``Bart`` or ``T5``.
- Define the sentence that should be translated.
- Add the T5 specific prefix "translate English to German: ".
- Leverage the ``PreTrainedModel.generate()`` method.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import AutoModelWithLMHead, AutoTokenizer
>>> model = AutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="pt")
>>> outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
>>> print(outputs)
tensor([[ 0, 11560, 3896, 8881, 229, 236, 3, 14366, 15377, 181,
11216, 16, 368, 1060, 64, 1919, 5]])
>>> ## TENSORFLOW CODE
>>> from transformers import TFAutoModelWithLMHead, AutoTokenizer
>>> model = TFAutoModelWithLMHead.from_pretrained("t5-base")
>>> tokenizer = AutoTokenizer.from_pretrained("t5-base")
>>> inputs = tokenizer.encode("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="tf")
>>> outputs = model.generate(inputs, max_length=40, num_beams=4, early_stopping=True)
>>> print(outputs)
tf.Tensor(
[[ 0 11560 3896 8881 229 236 3 14366 15377 181 11216 16
368 1060 64 1919 5]], shape=(1, 17), dtype=int32)
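These outputs are raw token IDs; decoding them recovers the translated sentence, which should match the pipeline output above. A minimal sketch:
.. code-block::
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.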
docs/source/torchscript.rst
@ -0,0 +1,135 @@
TorchScript
================================================
.. note::
This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
with variable-input-size models. It is a focus of interest to us and we will deepen our analysis in upcoming
releases, with more code examples, a more flexible implementation, and benchmarks comparing Python-based code
with compiled TorchScript.
According to PyTorch's documentation: "TorchScript is a way to create serializable and optimizable models from PyTorch code".
PyTorch's two modules `JIT and TRACE <https://pytorch.org/docs/stable/jit.html>`_ allow the developer to export
their model to be re-used in other programs, such as efficiency-oriented C++ programs.
We have provided an interface that allows the export of 🤗 Transformers models to TorchScript so that they can
be reused in a different environment than a PyTorch-based Python program. Here we explain how to use our models so that
they can be exported, and what to be mindful of when using these models with TorchScript.
Exporting a model requires two things:
* dummy inputs to execute a model forward pass;
* the model instantiated with the ``torchscript`` flag.
These necessities imply several things developers should be careful about. These are detailed below.
Implications
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TorchScript flag and tied weights
------------------------------------------------
This flag is necessary because most of the language models in this repository have tied weights between their
``Embedding`` layer and their ``Decoding`` layer. TorchScript does not allow the export of models that have tied
weights, so it is necessary to untie the weights beforehand.
This implies that models instantiated with the ``torchscript`` flag have their ``Embedding`` layer and ``Decoding`` layer
separate, which means that they should not be trained down the line. Training would de-synchronize the two layers,
leading to unexpected results.
This is not the case for models that do not have a Language Model head, as those do not have tied weights. These models
can be safely exported without the ``torchscript`` flag.
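As an illustration of the untying, here is a minimal sketch using the generic embedding accessors; the identity check below is our assumption about how the tying is implemented, not part of the official example:
.. code-block:: python
from transformers import BertForMaskedLM
# With the default configuration, the input and output embeddings share the same tensor
tied = BertForMaskedLM.from_pretrained("bert-base-uncased")
assert tied.get_input_embeddings().weight is tied.get_output_embeddings().weight
# With the torchscript flag, the output embeddings are a clone instead, making the model traceable
untied = BertForMaskedLM.from_pretrained("bert-base-uncased", torchscript=True)
assert untied.get_input_embeddings().weight is not untied.get_output_embeddings().weight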
Dummy inputs and standard lengths
------------------------------------------------
The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
Pytorch keeps track of the different operations executed on each tensor. These recorded operations are then used
to create the "trace" of the model.
The trace is created relative to the inputs' dimensions. It is therefore constrained by the dimensions of the dummy
input, and will not work for any other sequence length or batch size. When trying with a different size, an error such
as:
``The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2``
will be raised. It is therefore recommended to trace the model with a dummy input size at least as large as the largest
input that will be fed to the model during inference. Padding can be performed to fill the missing values. However, as
the model will have been traced with a large input size, the dimensions of the different matrices will be large as
well, resulting in more calculations.
It is recommended to be careful of the total number of operations done on each input and to follow performance closely
when exporting varying sequence-length models.
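One way to respect this constraint is to trace with inputs padded to the largest length expected at inference time, and to pad shorter real inputs up to that length. A minimal sketch (the sizes chosen here are illustrative assumptions):
.. code-block:: python
import torch
from transformers import BertModel
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
model.eval()
# Trace with a batch of 1 and the largest sequence length we plan to serve
max_length = 128
dummy_input = torch.zeros(1, max_length, dtype=torch.long)
traced = torch.jit.trace(model, (dummy_input,))
# At inference time, shorter inputs are padded on the right up to the traced length
real_input = torch.randint(0, 1000, (1, 7), dtype=torch.long)
padded = torch.nn.functional.pad(real_input, (0, max_length - real_input.shape[1]))
outputs = traced(padded)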
Using TorchScript in Python
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Below are examples of using Python to save and load models, as well as how to use a traced model for inference.
Saving a model
------------------------------------------------
This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``.
.. code-block:: python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
enc = BertTokenizer.from_pretrained("bert-base-uncased")
# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = enc.tokenize(text)
# Masking one of the input tokens
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Creating a dummy input
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]
# Initializing the model with the torchscript flag
# Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, torchscript=True)
# Instantiating the model
model = BertModel(config)
# The model needs to be in evaluation mode
model.eval()
# If you are instantiating the model with `from_pretrained` you can also easily set the TorchScript flag
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
Loading a model
------------------------------------------------
This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
We are re-using the previously initialised ``dummy_input``.
.. code-block:: python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()
all_encoder_layers, pooled_output = loaded_model(*dummy_input)
Using a traced model for inference
------------------------------------------------
Using the traced model for inference is as simple as using its ``__call__`` dunder method:
.. code-block:: python
traced_model(tokens_tensor, segments_tensors)
docs/source/training.rst
@ -0,0 +1,323 @@
Training and fine-tuning
========================
Model classes in 🤗 Transformers are designed to be compatible with native
PyTorch and TensorFlow 2 and can be used seamlessly with either. In this
quickstart, we will show how to fine-tune (or train from scratch) a model
using the standard training tools available in either framework. We will also
show how to use our included :func:`~transformers.Trainer` class which
handles much of the complexity of training for you.
This guide assumes that you are already familiar with loading and using our
models for inference; otherwise, see the :doc:`task summary <task_summary>`. We also assume
that you are familiar with training deep neural networks in either PyTorch or
TF2, and focus specifically on the nuances and tools for training models in
🤗 Transformers.
Sections:
* :ref:`pytorch`
* :ref:`tensorflow`
* :ref:`trainer`
* :ref:`additional-resources`
.. _pytorch:
Fine-tuning in native PyTorch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Model classes in 🤗 Transformers that don't begin with ``TF`` are
`PyTorch Modules <https://pytorch.org/docs/master/generated/torch.nn.Module.html>`_,
meaning that you can use them just as you would any model in PyTorch for
both inference and optimization.
Let's consider the common task of fine-tuning a masked language model like
BERT on a sequence classification dataset. When we instantiate a model with
:func:`~transformers.PreTrainedModel.from_pretrained`, the model
configuration and pre-trained weights
of the specified model are used to initialize the model. The
library also includes a number of task-specific final layers or 'heads' whose
weights are instantiated randomly when not present in the specified
pre-trained model. For example, instantiating a model with
``BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)``
will create a BERT model instance with encoder weights copied from the
``bert-base-uncased`` model and a randomly initialized sequence
classification head on top of the encoder with an output size of 2. Models
are initialized in ``eval`` mode by default. We can call ``model.train()`` to
put it in train mode.
.. code-block:: python
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
model.train()
This is useful because it allows us to make use of the pre-trained BERT
encoder and easily train it on whatever sequence classification dataset we
choose. We can use any PyTorch optimizer, but our library also provides the
:func:`~transformers.AdamW` optimizer which implements gradient bias
correction as well as weight decay.
.. code-block:: python
from transformers import AdamW
optimizer = AdamW(model.parameters(), lr=1e-5)
The optimizer allows us to apply different hyperparameters for specific
parameter groups. For example, we can apply weight decay to all parameters
other than bias and layer normalization terms:
.. code-block:: python
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
{'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
{'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
]
optimizer = AdamW(optimizer_grouped_parameters, lr=1e-5)
Now we can set up a simple dummy training batch using
:func:`~transformers.PreTrainedTokenizer.__call__`. This returns a
:func:`~transformers.BatchEncoding` instance which
prepares everything we might need to pass to the model.
.. code-block:: python
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
text_batch = ["I love Pixar.", "I don't care for Pixar."]
encoding = tokenizer(text_batch, return_tensors='pt', padding=True, truncation=True)
input_ids = encoding['input_ids']
attention_mask = encoding['attention_mask']
When we call a classification model with the ``labels`` argument, the first
returned element is the Cross Entropy loss between the predictions and the
passed labels. Having already set up our optimizer, we can then do a
backwards pass and update the weights:
.. code-block:: python
import torch
labels = torch.tensor([1, 0])  # shape (batch_size,), one label per example
outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs[0]
loss.backward()
optimizer.step()
Alternatively, you can just get the logits and calculate the loss yourself.
The following is equivalent to the previous example:
.. code-block:: python
from torch.nn import functional as F
labels = torch.tensor([1, 0])
outputs = model(input_ids, attention_mask=attention_mask)
loss = F.cross_entropy(outputs[0], labels)
loss.backward()
optimizer.step()
Of course, you can train on GPU by calling ``to('cuda')`` on the model and
inputs as usual.
We also provide a few learning rate scheduling tools. With the following, we
can set up a scheduler which warms up for ``num_warmup_steps`` and then
linearly decays to 0 by the end of training.
.. code-block:: python
from transformers import get_linear_schedule_with_warmup
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_train_steps)
Then all we have to do is call ``scheduler.step()`` after ``optimizer.step()``.
.. code-block:: python
...
loss.backward()
optimizer.step()
scheduler.step()
We highly recommend using :func:`~transformers.Trainer`, discussed below,
which conveniently handles the moving parts of training 🤗 Transformers models
with features like mixed precision and easy tensorboard logging.
Freezing the encoder
--------------------
In some cases, you might be interested in keeping the weights of the
pre-trained encoder frozen and optimizing only the weights of the head
layers. To do so, simply set the ``requires_grad`` attribute to ``False`` on
the encoder parameters, which can be accessed with the ``base_model``
submodule on any task-specific model in the library:
.. code-block:: python
for param in model.base_model.parameters():
param.requires_grad = False
.. _tensorflow:
Fine-tuning in native TensorFlow 2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Models can also be trained natively in TensorFlow 2. Just as with PyTorch,
TensorFlow models can be instantiated with
:func:`~transformers.PreTrainedModel.from_pretrained` to load the weights of
the encoder from a pretrained model.
.. code-block:: python
from transformers import TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')
Let's use ``tensorflow_datasets`` to load in the `MRPC dataset
<https://www.tensorflow.org/datasets/catalog/glue#gluemrpc>`_ from GLUE. We
can then use our built-in
:func:`~transformers.data.processors.glue.glue_convert_examples_to_features`
to tokenize MRPC and convert it to a TensorFlow ``Dataset`` object. Note that
tokenizers are framework-agnostic, so there is no need to prepend ``TF`` to
the pretrained tokenizer name.
.. code-block:: python
from transformers import BertTokenizer, glue_convert_examples_to_features
import tensorflow_datasets as tfds
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
data = tfds.load('glue/mrpc')
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, max_length=128, task='mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
The model can then be compiled and trained as any Keras model:
.. code-block:: python
import tensorflow as tf
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss)
model.fit(train_dataset, epochs=2, steps_per_epoch=115)
With the tight interoperability between TensorFlow and PyTorch models, you
can even save the model and then reload it as a PyTorch model (or vice-versa):
.. code-block:: python
from transformers import BertForSequenceClassification
model.save_pretrained('./my_mrpc_model/')
pytorch_model = BertForSequenceClassification.from_pretrained('./my_mrpc_model/', from_tf=True)
.. _trainer:
Trainer
^^^^^^^
We also provide a simple but feature-complete training and evaluation
interface through :func:`~transformers.Trainer` and
:func:`~transformers.TFTrainer`. You can train, fine-tune,
and evaluate any 🤗 Transformers model with a wide range of training options and
with built-in features like logging, gradient accumulation, and mixed
precision.
.. code-block:: python
## PYTORCH CODE
from transformers import BertForSequenceClassification, Trainer, TrainingArguments
model = BertForSequenceClassification.from_pretrained("bert-large-uncased")
training_args = TrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
)
trainer = Trainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=test_dataset # evaluation dataset
)
## TENSORFLOW CODE
from transformers import TFBertForSequenceClassification, TFTrainer, TFTrainingArguments
model = TFBertForSequenceClassification.from_pretrained("bert-large-uncased")
training_args = TFTrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
)
trainer = TFTrainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=tfds_train_dataset, # tensorflow_datasets training dataset
eval_dataset=tfds_test_dataset # tensorflow_datasets evaluation dataset
)
Now simply call ``trainer.train()`` to train and ``trainer.evaluate()`` to
evaluate. You can use your own module as well, but the first
argument returned from ``forward`` must be the loss which you wish to
optimize.
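For instance, here is a minimal sketch of such a custom module; the architecture is purely illustrative:
.. code-block:: python
import torch
class MyClassifier(torch.nn.Module):
    def __init__(self, encoder, num_labels):
        super().__init__()
        self.encoder = encoder  # any 🤗 Transformers base model
        self.classifier = torch.nn.Linear(encoder.config.hidden_size, num_labels)
    def forward(self, input_ids, attention_mask=None, labels=None):
        hidden_states = self.encoder(input_ids, attention_mask=attention_mask)[0]
        logits = self.classifier(hidden_states[:, 0])  # classify from the first token
        loss = torch.nn.functional.cross_entropy(logits, labels)
        return (loss, logits)  # Trainer optimizes the first returned element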
:func:`~transformers.Trainer` uses a built-in default function to collate
batches and prepare them to be fed into the model. If needed, you can also
use the ``data_collator`` argument to pass your own collator function which
takes in the data in the format provided by your dataset and returns a
batch ready to be fed into the model. Note that
:func:`~transformers.TFTrainer` expects the passed datasets to be dataset
objects from ``tensorflow_datasets``.
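As an illustration, here is a minimal sketch of a custom collator for a dataset whose items are ``(input_ids, label)`` pairs; the dataset format is an assumption made for the example:
.. code-block:: python
import torch
def my_data_collator(features):
    # Pad the already-encoded examples into a single rectangular batch
    input_ids = torch.nn.utils.rnn.pad_sequence([f[0] for f in features], batch_first=True)
    labels = torch.tensor([f[1] for f in features])
    return {"input_ids": input_ids, "labels": labels}
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, data_collator=my_data_collator)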
To calculate metrics in addition to the loss, you can also define
your own ``compute_metrics`` function and pass it to the trainer.
.. code-block:: python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
def compute_metrics(pred):
labels = pred.label_ids
preds = pred.predictions.argmax(-1)
precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
acc = accuracy_score(labels, preds)
return {
'accuracy': acc,
'f1': f1,
'precision': precision,
'recall': recall
}
Finally, you can view the results, including any calculated metrics, by
launching tensorboard in your specified ``logging_dir`` directory.
.. _additional-resources:
Additional resources
^^^^^^^^^^^^^^^^^^^^
* `A lightweight colab demo
<https://colab.research.google.com/drive/1-JIJlao4dI-Ilww_NnTc0rxtp-ymgDgM?usp=sharing>`_
which uses ``Trainer`` for IMDb sentiment classification.
* `🤗 Transformers Examples <https://github.com/huggingface/transformers/tree/master/examples>`_
including scripts for training and fine-tuning on GLUE, SQuAD, and
several other tasks.
* `How to train a language model
<https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb>`_,
a detailed colab notebook which uses ``Trainer`` to train a masked
language model from scratch on Esperanto.
* `🤗 Transformers Notebooks <./notebooks.html>`_ which contain dozens
of example notebooks from the community for training and using
🤗 Transformers on a variety of tasks.
examples/README.md
@ -0,0 +1,80 @@
## Examples
Version 2.9 of 🤗 Transformers introduces a new [`Trainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py) class for PyTorch, and its equivalent [`TFTrainer`](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer_tf.py) for TF 2.
Running the examples requires PyTorch 1.3.1+ or TensorFlow 2.1+.
Here is the list of all our examples:
- **grouped by task** (all official examples work for multiple models)
- with information on whether they are **built on top of `Trainer`/`TFTrainer`** (if not, they still work, they might just lack some features),
- whether they also include examples for **`pytorch-lightning`**, which is a great fully-featured, general-purpose training library for PyTorch,
- links to **Colab notebooks** to walk through the scripts and run them easily,
- links to **Cloud deployments** to be able to deploy large-scale trainings in the Cloud with little to no setup.
This is still a work in progress; in particular, documentation is still sparse, so please **contribute improvements/pull requests.**
# The Big Table of Tasks
| Task | Example datasets | Trainer support | TFTrainer support | pytorch-lightning | Colab
|---|---|:---:|:---:|:---:|:---:|
| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/master/examples/language-modeling) | Raw text | ✅ | - | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb)
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/master/examples/text-classification) | GLUE, XNLI | ✅ | ✅ | ✅ | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/trainer/01_text_classification.ipynb)
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/master/examples/token-classification) | CoNLL NER | ✅ | ✅ | ✅ | -
| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/master/examples/multiple-choice) | SWAG, RACE, ARC | ✅ | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ViktorAlm/notebooks/blob/master/MPC_GPU_Demo_for_TF_and_PT.ipynb)
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/master/examples/question-answering) | SQuAD | - | ✅ | - | -
| [**`text-generation`**](https://github.com/huggingface/transformers/tree/master/examples/text-generation) | - | n/a | n/a | n/a | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb)
| [**`distillation`**](https://github.com/huggingface/transformers/tree/master/examples/distillation) | All | - | - | - | -
| [**`summarization`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | CNN/Daily Mail | - | - | ✅ | -
| [**`translation`**](https://github.com/huggingface/transformers/tree/master/examples/seq2seq) | WMT | - | - | ✅ | -
| [**`bertology`**](https://github.com/huggingface/transformers/tree/master/examples/bertology) | - | - | - | - | -
| [**`adversarial`**](https://github.com/huggingface/transformers/tree/master/examples/adversarial) | HANS | ✅ | - | - | -
<br>
## Important note
To make sure you can successfully run the latest versions of the example scripts, you have to install the library from source and install some example-specific requirements.
Execute the following steps in a new virtual environment:
```bash
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
pip install -r ./examples/requirements.txt
```
## One-click Deploy to Cloud (wip)
#### Azure
[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Fazure-quickstart-templates%2Fmaster%2F101-storage-account-create%2Fazuredeploy.json)
## Running on TPUs
When using Tensorflow, TPUs are supported out of the box as a `tf.distribute.Strategy`.
When using PyTorch, we support TPUs thanks to `pytorch/xla`. For more context and information on how to set up your TPU environment refer to Google's documentation and to the
very detailed [pytorch/xla README](https://github.com/pytorch/xla/blob/master/README.md).
In this repo, we provide a very simple launcher script named [xla_spawn.py](https://github.com/huggingface/transformers/tree/master/examples/xla_spawn.py) that lets you run our example scripts on multiple TPU cores without any boilerplate.
Just pass a `--num_cores` flag to this script, then your regular training script with its arguments (this is similar to the `torch.distributed.launch` helper for torch.distributed).
For example for `run_glue`:
```bash
python examples/xla_spawn.py --num_cores 8 \
examples/text-classification/run_glue.py \
--model_name_or_path bert-base-cased \
--task_name mnli \
--data_dir ./data/glue_data/MNLI \
--output_dir ./models/tpu \
--overwrite_output_dir \
--do_train \
--do_eval \
--num_train_epochs 1 \
--save_steps 20000
```
Feedback, more use cases, and benchmarks involving TPUs are welcome; please share with the community.
@ -0,0 +1,38 @@
## Adversarial evaluation of model performances
Here is an example of evaluating a model using adversarial evaluation of natural language inference with the Heuristic Analysis for NLI Systems (HANS) dataset [McCoy et al., 2019](https://arxiv.org/abs/1902.01007). The example was graciously provided by [Nafise Sadat Moosavi](https://github.com/ns-moosavi).
The HANS dataset can be downloaded from [this location](https://github.com/tommccoy1/hans).
This is an example of using run_hans.py:
```bash
export HANS_DIR=path-to-hans
export MODEL_TYPE=type-of-the-model-e.g.-bert-roberta-xlnet-etc
export MODEL_PATH=path-to-the-model-directory-that-is-trained-on-NLI-e.g.-by-using-run_glue.py
python run_hans.py \
--task_name hans \
--model_type $MODEL_TYPE \
--do_eval \
--data_dir $HANS_DIR \
--model_name_or_path $MODEL_PATH \
--max_seq_length 128 \
--output_dir $MODEL_PATH
```
This will create the hans_predictions.txt file in MODEL_PATH, which can then be evaluated using hans/evaluate_heur_output.py from the HANS dataset.
The results of the BERT-base model that is trained on MNLI using batch size 8 and the random seed 42 on the HANS dataset are as follows:
```bash
Heuristic entailed results:
lexical_overlap: 0.9702
subsequence: 0.9942
constituent: 0.9962
Heuristic non-entailed results:
lexical_overlap: 0.199
subsequence: 0.0396
constituent: 0.118
```
@ -0,0 +1,231 @@
# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
""" Finetuning the library models for sequence classification on HANS."""
import logging
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import numpy as np
import torch
from transformers import (
AutoConfig,
AutoModelForSequenceClassification,
AutoTokenizer,
HfArgumentParser,
Trainer,
TrainingArguments,
default_data_collator,
set_seed,
)
from utils_hans import HansDataset, InputFeatures, hans_processors, hans_tasks_num_labels
logger = logging.getLogger(__name__)
@dataclass
class ModelArguments:
"""
Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
"""
model_name_or_path: str = field(
metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
)
config_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
)
tokenizer_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
)
cache_dir: Optional[str] = field(
default=None, metadata={"help": "Where do you want to store the pretrained models downloaded from s3"}
)
@dataclass
class DataTrainingArguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
"""
task_name: str = field(
metadata={"help": "The name of the task to train selected in the list: " + ", ".join(hans_processors.keys())}
)
data_dir: str = field(
metadata={"help": "The input data dir. Should contain the .tsv files (or other data files) for the task."}
)
max_seq_length: int = field(
default=128,
metadata={
"help": "The maximum total input sequence length after tokenization. Sequences longer "
"than this will be truncated, sequences shorter will be padded."
},
)
overwrite_cache: bool = field(
default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
)
def hans_data_collator(features: List[InputFeatures]) -> Dict[str, torch.Tensor]:
"""
Data collator that removes the "pairID" key if present.
"""
batch = default_data_collator(features)
_ = batch.pop("pairID", None)
return batch
def main():
# See all possible arguments in src/transformers/training_args.py
# or by passing the --help flag to this script.
# We now keep distinct sets of args, for a cleaner separation of concerns.
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
if (
os.path.exists(training_args.output_dir)
and os.listdir(training_args.output_dir)
and training_args.do_train
and not training_args.overwrite_output_dir
):
raise ValueError(
f"Output directory ({training_args.output_dir}) already exists and is not empty. Use --overwrite_output_dir to overcome."
)
# Setup logging
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO if training_args.local_rank in [-1, 0] else logging.WARN,
)
logger.warning(
"Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
training_args.local_rank,
training_args.device,
training_args.n_gpu,
bool(training_args.local_rank != -1),
training_args.fp16,
)
logger.info("Training/evaluation parameters %s", training_args)
# Set seed
set_seed(training_args.seed)
try:
num_labels = hans_tasks_num_labels[data_args.task_name]
except KeyError:
raise ValueError("Task not found: %s" % (data_args.task_name))
# Load pretrained model and tokenizer
#
# Distributed training:
# The .from_pretrained methods guarantee that only one local process can concurrently
# download model & vocab.
config = AutoConfig.from_pretrained(
model_args.config_name if model_args.config_name else model_args.model_name_or_path,
num_labels=num_labels,
finetuning_task=data_args.task_name,
cache_dir=model_args.cache_dir,
)
tokenizer = AutoTokenizer.from_pretrained(
model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
cache_dir=model_args.cache_dir,
)
model = AutoModelForSequenceClassification.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
)
# Get datasets
train_dataset = (
HansDataset(
data_dir=data_args.data_dir,
tokenizer=tokenizer,
task=data_args.task_name,
max_seq_length=data_args.max_seq_length,
overwrite_cache=data_args.overwrite_cache,
)
if training_args.do_train
else None
)
eval_dataset = (
HansDataset(
data_dir=data_args.data_dir,
tokenizer=tokenizer,
task=data_args.task_name,
max_seq_length=data_args.max_seq_length,
overwrite_cache=data_args.overwrite_cache,
evaluate=True,
)
if training_args.do_eval
else None
)
# Initialize our Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
data_collator=hans_data_collator,
)
# Training
if training_args.do_train:
trainer.train(
model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
)
trainer.save_model()
# For convenience, we also re-save the tokenizer to the same directory,
# so that you can share your model easily on huggingface.co/models =)
if trainer.is_world_master():
tokenizer.save_pretrained(training_args.output_dir)
# Evaluation
if training_args.do_eval:
logger.info("*** Evaluate ***")
output = trainer.predict(eval_dataset)
preds = output.predictions
preds = np.argmax(preds, axis=1)
pair_ids = [ex.pairID for ex in eval_dataset]
output_eval_file = os.path.join(training_args.output_dir, "hans_predictions.txt")
label_list = eval_dataset.get_labels()
if trainer.is_world_master():
with open(output_eval_file, "w") as writer:
writer.write("pairID,gold_label\n")
for pid, pred in zip(pair_ids, preds):
writer.write("ex" + str(pid) + "," + label_list[int(pred)] + "\n")
trainer._log(output.metrics)
def _mp_fn(index):
# For xla_spawn (TPUs)
main()
if __name__ == "__main__":
main()
Some files were not shown because too many files have changed in this diff