Compare commits

4576 Commits

Author SHA1 Message Date
98af96a156 Release: v4.2.2 2021-01-21 09:06:41 +01:00
0ed8bccc9c Fix GPT conversion script (#9676) 2021-01-21 09:00:32 +01:00
c5f6719040 Fix imports in conversion scripts (#9674) 2021-01-21 09:00:26 +01:00
21d45958af [TF Led] Fix wrong decoder attention mask behavior (#9601)
* fix tf led

* remove loop file
2021-01-21 08:57:35 +01:00
4d4d2ce135 Remove branch overload in conda yml 2021-01-14 14:27:36 +01:00
1528b1007f Upload v4.2.1 to anaconda 2021-01-14 14:24:32 +01:00
236cc365af Release: v4.2.1 2021-01-14 14:17:56 +01:00
5b05321b56 BatchEncoding.to with device with tests (#9584) 2021-01-14 14:08:40 +01:00
412d878c5e Compliancy with tf-nightly (#9570)
* Compliancy with tf-nightly

* Add more version + restore min version check
2021-01-14 14:08:30 +01:00
59fbd64b1c Fix Trainer with a parallel model (#9578)
* Fix Trainer with a parallel model

* More clean up
2021-01-14 14:08:19 +01:00
7d9a9d0c72 Release: v4.2.0 2021-01-13 16:01:51 +01:00
c949516695 Fix slow tests v4.2.0 (#9561)
* Fix conversational pipeline test

* LayoutLM

* ProphetNet

* BART

* Blenderbot & small

* Marian

* mBART

* Pegasus

* Tapas tokenizer

* BERT2BERT test

* Style

* Example requirements

* TF BERT2BERT test
2021-01-13 09:55:48 -05:00
04dc65e5c6 Fix data parallelism in Trainer (#9566)
* Fix data parallelism in Trainer

* Update src/transformers/training_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-13 09:54:41 -05:00
b2dfcc567b use correct deps for torchhub (#9552) 2021-01-13 08:02:53 -05:00
eabad8fd9c Update run_glue for do_predict with local test data (#9442) (#9486)
* Update run_glue for do_predict with local test data (#9442)

* Update run_glue (#9442): fix comments ('files' to 'a file')

* Update run_glue (#9442): reflect the code review

* Update run_glue (#9442): auto format

* Update run_glue (#9442): reflect the code review
2021-01-13 07:48:35 -05:00
0c9f01a8e5 Speed up TopKLogitsWarper and TopPLogitsWarper (pytorch) (#9557)
* make TopKLogitsWarper faster

* make TopPLogitsWarper faster
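
For illustration, a minimal sketch of what these warpers do, assuming the `TopKLogitsWarper`/`TopPLogitsWarper` classes exported by the library:

```
# Hedged sketch: filter fake next-token logits before sampling.
import torch
from transformers import TopKLogitsWarper, TopPLogitsWarper

input_ids = torch.tensor([[0]])  # dummy prefix; these warpers ignore it
scores = torch.randn(1, 50257)   # fake next-token logits
scores = TopKLogitsWarper(top_k=50)(input_ids, scores)   # keep the 50 best tokens
scores = TopPLogitsWarper(top_p=0.9)(input_ids, scores)  # nucleus (top-p) filtering
```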
2021-01-13 07:47:47 -05:00
27d0e01d75 Fix classification script: enable dynamic padding with truncation (#9554)
Co-authored-by: Pavel Tarashkevich <Pavel.Tarashkievich@orange.com>
2021-01-13 07:46:48 -05:00
245cdb469d Fix barthez tokenizer (#9562) 2021-01-13 06:24:10 -05:00
247a7b2029 Doc: Update pretrained_models wording (#9545)
* Update pretrained_models.rst

To clarify things cf. this tweet for instance https://twitter.com/RTomMcCoy/status/1349094111505211395

* format
2021-01-13 05:58:05 -05:00
69ed36063a fix BlenderbotSmallTokenizer (#9538)
* add model_input_names

* fix test
2021-01-13 10:53:43 +05:30
2df34f4aba [trainer] deepspeed integration (#9211)
* deepspeed integration

* style

* add test

* ds wants to do its own backward

* fp16 assert

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* for clarity extract what args are being passed to deepspeed

* introduce the concept of self.wrapped_model

* s/self.wrapped_model/self.model_wrapped/

* complete transition to self.wrapped_model / self.model

* fix

* doc

* give ds its own init

* add custom overrides, handle bs correctly

* fix test

* clean up model_init logic, fix small bug

* complete fix

* collapse --deepspeed_config into --deepspeed

* style

* start adding doc notes

* style

* implement hf2ds optimizer and scheduler configuration remapping

* oops

* call get_num_training_steps absolutely when needed

* workaround broken auto-formatter

* deepspeed_config arg is no longer needed - fixed in deepspeed master

* use hf's fp16 args in config

* clean

* start on the docs

* rebase cleanup

* finish up --fp16

* clarify the supported stages

* big refactor thanks to discovering deepspeed.init_distributed

* cleanup

* revert fp16 part

* add checkpoint-support

* more init ds into integrations

* extend docs

* cleanup

* unfix docs

* clean up old code

* imports

* move docs

* fix logic

* make it clear which file it's referring to

* document nodes/gpus

* style

* wrong format

* style

* deepspeed handles gradient clipping

* easier to read

* major doc rewrite

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* docs

* switch to AdamW optimizer

* style

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* clarify doc

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
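
A hedged sketch of how the collapsed flag is used from Python (the config file name is illustrative):

```
# Hedged sketch: DeepSpeed is enabled via a single `deepspeed` argument that
# points at a DeepSpeed JSON config ("ds_config.json" is illustrative);
# training is then launched with the `deepspeed` launcher instead of `python`.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", deepspeed="ds_config.json", fp16=True)
```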
2021-01-12 19:05:18 -08:00
5f6721032a Use the right version of tokenizers (#9550)
* Use the right version of tokenizers

* Try another way

* Try another way

* Deps are installed from there...

* Deps are installed from there...

* Revert last

* remove needless comment
2021-01-12 18:55:45 -05:00
063d8d27f4 Refactor prepare_seq2seq_batch (#9524)
* Add target contextmanager and rework prepare_seq2seq_batch

* Fix tests, treat BART and Barthez

* Add last tokenizers

* Fix test

* Set src token before calling the superclass

* Remove special behavior for T5

* Remove needless imports

* Remove needless asserts
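
A hedged usage sketch of the reworked API (the Marian checkpoint is only an example):

```
# Hedged sketch: tokenize source and target texts in one call.
from transformers import MarianTokenizer

tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
batch = tok.prepare_seq2seq_batch(
    src_texts=["I love coffee."],
    tgt_texts=["Ich liebe Kaffee."],
    return_tensors="pt",
)
```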
2021-01-12 18:19:38 -05:00
e6ecef711e Revert, it was not the issue. 2021-01-12 18:00:22 -05:00
250f27f207 Fix tokenizers install for now 2021-01-12 17:50:27 -05:00
dfbf0f5598 topk -> top_k (#9541) 2021-01-12 16:21:29 -05:00
a1100fac67 LayoutLM Config (#9539) 2021-01-12 10:03:50 -05:00
e45eba3b1c Improve LayoutLM (#9476)
* Add LayoutLMForSequenceClassification and integration tests

Improve docs

Add LayoutLM notebook to list of community notebooks

* Make style & quality

* Address comments by @sgugger, @patrickvonplaten and @LysandreJik

* Fix rebase with master

* Reformat in one line

* Improve code examples as requested by @patrickvonplaten

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-12 09:26:32 -05:00
ccd1923f46 [T5] enable T5 fp16 (#9487)
* fix t5 fp16
2021-01-12 17:12:33 +05:30
2aa9c2f204 fix blenderbot tok (#9532) 2021-01-12 05:53:32 -05:00
406cbf58b2 Shouldn't stale issues/PRs with feature request label (#9511) 2021-01-12 04:49:15 -05:00
3b67c5abb0 Update 'Develop on Windows' guidelines (#9519) 2021-01-12 04:15:16 -05:00
a051d8928a [ProphetNet] Fix naming and wrong config (#9514)
* fix naming issues

* better names
2021-01-12 04:10:05 -05:00
7f28613213 [TFBart] Split TF-Bart (#9497)
* make templates ready

* make add_new_model_command_ready

* finish tf bart

* prepare tf mbart

* finish tf bart

* add tf mbart

* add marian

* prep pegasus

* add tf pegasus

* push blenderbot tf

* add blenderbot

* add blenderbot small

* clean-up

* make fix copy

* define blend bot tok

* fix

* up

* make style

* add to docs

* add copy statements

* overwrite changes

* improve

* fix docs

* finish

* fix last slow test

* fix missing git conflict line

* fix blenderbot

* up

* fix blenderbot small

* load changes

* finish copied from

* upload fix
2021-01-12 02:06:32 +01:00
0ecbb69806 [make docs] parallel build (#9522)
After experimenting with different numbers of workers (https://github.com/huggingface/transformers/issues/9496#issuecomment-758145868), 4-5 workers seem to be optimal - let's go with 4, since we are unlikely to find a CPU with fewer cores these days.

Fixes part of https://github.com/huggingface/transformers/issues/9496

@sgugger
2021-01-11 13:00:08 -08:00
e6f211cade [trainer] round numbers in trainer state (#9491)
* round numbers

* style

* round only on logging
2021-01-11 10:17:49 -08:00
01a1684078 Make doc styler behave properly on Windows (#9516) 2021-01-11 10:25:24 -05:00
6009668c63 Add link to forums thread 2021-01-11 10:00:59 -05:00
ba702966ba Fix cardinality (#9505) 2021-01-11 09:42:19 -05:00
33b7422839 [trainer] remove --model_parallel (#9451)
* fix bad merge - dropped code

* remove --model_parallel

* Deal with TrainingArguments

* Use a private attr and fix batch sizes

* fix _n_gpu

* add is_parallel helper wrapper

* fix attribute

* introduce a new attribute is_model_parallel

* docs

* docs

* Put back init False and rearrange doc

* Ignore non-init args in HFArgumentParser

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2021-01-11 09:39:28 -05:00
6f63501383 [doc] How To Request Support document stab (#9288)
* How To Request Support document stab

* integrate suggestions

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small corrections

* expand on how to search for issues with examples

* address issues

* Update ISSUES.md

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* patrick's suggestion

* patrick's suggestion

* small fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-11 09:23:51 -05:00
d20e9c7299 Enable TruncationStrategy override for pipelines (#9432)
* Enable TruncationStrategy override for pipelines

* Update isort.

* Fixing test

* Fixing text_generation pipeline.

* Using same DummyTok as other PR for easier merge later.

* Some more import guards.

* Remove bogus file.

* Do not pass `generate_kwargs` to `_parse_and_tokenize`.
@patrickvonplaten

* Removed DummyTok.

* Doc quality.
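
A hedged sketch of the override, assuming the `truncation` parameter on text2text pipelines:

```
# Hedged sketch: pass an explicit truncation strategy to a pipeline call.
from transformers import pipeline
from transformers.tokenization_utils_base import TruncationStrategy

summarizer = pipeline("summarization")
out = summarizer("A very long article. " * 200, truncation=TruncationStrategy.ONLY_FIRST)
```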
2021-01-11 09:23:28 -05:00
8d25df2c7a Make doc styler detect lists on rst (#9488) 2021-01-11 08:53:41 -05:00
5a442a8db1 New Updated DistilGPT-2 Finetuning and Generation (#9494)
https://github.com/huggingface/transformers/pull/3177
2021-01-11 14:34:39 +01:00
6c8ec2a931 fix tf led pt test (#9513) 2021-01-11 14:14:48 +01:00
1e3c362235 Fix template (#9512) 2021-01-11 08:03:28 -05:00
d415882b41 Remove tolerance + drop_rows_to_fit by default (#9507)
* Remove tolerance + drop_rows_to_fit by default

* remove drop_rows_to_fit
2021-01-11 08:02:41 -05:00
1243ee7d0c Full rework of the TF input/output embeddings and bias resizing (#9193)
* Start rework resizing

* Rework bias/decoder resizing

* Full resizing rework

* Full resizing rework

* Start to update the models with the new approach

* Finish to update the models

* Update all the tests

* Update the template

* Fix tests

* Fix tests

* Test a new approach

* Refactoring

* Refactoring

* Refactoring

* New rework

* Rework BART

* Rework bert+blenderbot

* Rework CTRL

* Rework Distilbert

* Rework DPR

* Rework Electra

* Rework Flaubert

* Rework Funnel

* Rework GPT2

* Rework Longformer

* Rework Lxmert

* Rework marian+mbart

* Rework mobilebert

* Rework mpnet

* Rework openai

* Rework pegasus

* Rework Roberta

* Rework T5

* Rework xlm+xlnet

* Rework template

* Fix TFT5EncoderOnly + DPRs

* Restore previous methods

* Fix Funnel

* Fix CTRL and TransfoXL

* Apply style

* Apply Sylvain's comments

* Restore a test in DPR

* Address the comments

* Fix bug

* Apply style

* remove unused import

* Fix test

* Forgot a method

* missing test

* Trigger CI

* naming update

* Rebase

* Trigger CI
2021-01-11 06:27:28 -05:00
cf416764f4 Fix template (#9504) 2021-01-11 05:21:25 -05:00
09926c8e86 fix-template (#9499)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
2021-01-10 20:34:17 -05:00
4f7022d68d Reformat (#9482) 2021-01-10 15:10:15 +01:00
96f1f74aaf Fixing tests. It seems master changed something in the warnings. (#9483)
Trying to keep warning tests for now. Should be discarded if it becomes
too hard to maintain.
2021-01-10 15:08:20 +01:00
1c19b423bf fix(wandb): fix config (#9489) 2021-01-08 14:32:02 -05:00
02e05fb0a5 Making Conversation possible to create directly a full conversation (#9434)
* Cleaning up conversation tests.

* Adding tests that don't require downloading models + conversation can be fully created from static state.

* Making tests non-flaky (by fixing generation length)

* Bumping isort version.

* Doc cleanup.

* Remove unused test in this PR.

* Torch import guard for TF.

* Missing torch guard.

* Small mistake in doc.

* Actually uses `_history` and `_index` cache.

+ remove dead enumerate
+ improve warning message.

* Update src/transformers/pipelines/conversational.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/conversational.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/conversational.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Adding comments and cleaner code to address history copy.

* Improving pipeline name in tests.

* Change tokenizer to a real one (still created at runtime with no external dependency)

* Simplify DummyTok, reverse changes on tokenization.

* Removing DummyTok.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
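
A hedged sketch of the new construction path (the turns are made up):

```
# Hedged sketch: a Conversation built directly with pre-existing history.
from transformers import Conversation

conv = Conversation(
    "Can you recommend another one?",
    past_user_inputs=["What is the best movie?"],
    generated_responses=["I like The Big Lebowski."],
)
```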
2021-01-08 14:33:25 +01:00
4fbcf8ea49 Fix TF input for np.ndarray (#9294)
* Fix input for np.ndarray

* add a test

* add a test

* Add a test

* Apply style

* Fix test
2021-01-08 08:23:29 -05:00
e34e45536f Makes HfArgumentParser compatible with Python 3.9 (#9479)
Python 3.9 changed the format of the string serialization of `typing.Optional`.
For example, `str(typing.Optional[str])` is
`typing.Union[str, NoneType]` in Python 3.8 and
`typing.Optional[str]` in Python 3.9.
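
A minimal illustration of the difference described above:

```
import typing

print(str(typing.Optional[str]))
# Python 3.8: typing.Union[str, NoneType]
# Python 3.9: typing.Optional[str]
```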
2021-01-08 08:10:44 -05:00
1bdf42409c Fast imports part 3 (#9474)
* New intermediate inits

* Update template

* Avoid importing torch/tf/flax in tokenization unless necessary

* Styling

* Shutup flake8

* Better python version check
2021-01-08 07:40:59 -05:00
79bbcc5260 [Generation] Fix bug for manual decoder_input_ids + warning message (#9472)
* up

* improve style
2021-01-08 05:50:39 -05:00
9e1ea846bc [README] Add new models (#9465)
* add new models

* make fix-copies
2021-01-08 05:49:43 -05:00
bf9056442a Removing duplicated code for Translation,Summarization and Text2TextGeneration pipelines (#9433)
* Merging all duplicated code for Text2TextPipeline while preserving
backward compat.

* Fixing TranslationPipeline Hierarchy + return_name

* torch import guard.

* Update isort version.

* Remove code from other PR disentanglement.

* Removed named example to something more agnostic.
2021-01-07 23:10:16 +01:00
f33a6f3446 [TFGPT2] - Fix flaky past_key_values test (#9460)
* fix tf flaky test

* remove test files
2021-01-07 16:12:08 +01:00
758ed3332b Transformers fast import part 2 (#9446)
* Main init work

* Add version

* Change from absolute to relative imports

* Fix imports

* One more typo

* More typos

* Styling

* Make quality script pass

* Add necessary replace in template

* Fix typos

* Spaces are ignored in replace for some reason

* Forgot one model.

* Fixes for import

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Add documentation

* Styling

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2021-01-07 09:36:14 -05:00
a400fe8931 [LED Test] fix common inputs pt for flaky pt-tf led test (#9459)
* fix common inputs pt for flaky led

* fix other tests correspondingly
2021-01-07 12:29:03 +01:00
ae5a32bb0d up (#9454) 2021-01-07 11:51:02 +01:00
812045adcc New serving (#9419)
* Add a serving method

* Add albert

* Add serving for BERT and BART

* Add more models

* Finish the serving addition

* Temp fix

* Restore DPR

* Fix funnel attribute

* Fix attributes GPT2

* Fix OpenAIGPT attribute

* Fix T5 attributes

* Fix Bart attributes

* Fix TransfoXL attributes

* Add versioning

* better test

* Update template

* Fix Flaubert

* Fix T5

* Apply style

* Remove unused imports

* Deactivate extra parameters

* Remove too long test + saved_model default to False

* Ignore the saved model test for some models

* Fix some inputs

* Fix mpnet serving

* Trigger CI

* Address all comments
2021-01-07 11:48:49 +01:00
390cf16bc8 Prophetnet optimization (#9453)
* Vectorized `ngram_attention_bias` calculation

* updated formatting with black

* Further optimization

* one (last) optimization
2021-01-07 11:41:58 +01:00
28d74872cc a more reliable version of branching point discovery (#9449) 2021-01-07 04:47:50 -05:00
3ec40299c1 Remove nested lxmert (#9440) 2021-01-07 04:10:41 -05:00
b8462b5b2a [GenerationOutputs] Fix GenerationOutputs Tests (#9443)
* fix generation models

* fix led

* fix docs

* add is_decoder

* fix last docstrings

* make style

* fix t5 cross attentions

* correct t5
2021-01-06 19:37:02 +01:00
0c96262f7d Fast transformers import part 1 (#9441)
* Don't import libs to check they are available

* Don't import integrations at init

* Add importlib_metadata to deps

* Remove old vars references

* Avoid syntax error

* Adapt testing utils

* Try to appease torchhub

* Add dependency

* Remove more private variables

* Fix typo

* Another typo

* Refine the tf availability test
2021-01-06 12:17:24 -05:00
c89f1bc92e Add flags to return scores, hidden states and / or attention weights in GenerationMixin (#9150)
* Define new output dataclasses for greedy generation

* Add output_[...] flags in greedy generation methods

Added output_attentions, output_hidden_states, output_scores flags in
generate and greedy_search methods in GenerationMixin.

* [WIP] Implement logic and tests for output flags in generation

* Update GreedySearchOutput classes & docstring

* Implement greedy search output accumulation logic

Update greedy_search unittests

Fix generate method return value docstring

Properly init flags with the default config

* Update configuration to add output_scores flag

* Fix test_generation_utils

Sort imports and fix isinstance tests for GreedySearchOutputs

* Fix typo in generation_utils

* Add return_dict_in_generate for backwards compatibility

* Add return_dict_in_generate flag in config

* Fix typo in configuration

* Fix handling of attentions and hidden_states flags

* Make style & quality

* first attempt attentions

* some corrections

* improve tests

* special models require special tests

* disable xlm test for now

* clean tests

* fix for tf

* isort

* Add output dataclasses for other generation methods

* Add logic to return dict in sample generation

* Complete test for sample generation

- Pass output_attentions and output_hidden_states flags to encoder in
encoder-decoder models
- Fix import statements order in test_generation_utils file

* Add logic to return dict in sample generation

- Refactor tests to avoid using self.assertTrue, which provides
scarce information when the test fails
- Add tests for the three beam_search methods: vanilla, sample and
grouped

* Style doc

* Fix copy-paste error in generation tests

* Rename logits to scores and refactor

* Refactor group_beam_search for consistency

* make style

* add sequences_scores

* fix all tests

* add docs

* fix beam search finalize test

* correct docstring

* clean some files

* Made suggested changes to the documentation

* Style doc ?

* Style doc using the Python util

* Update src/transformers/generation_utils.py

* fix empty lines

* fix all test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
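
A hedged usage sketch of the new flags:

```
# Hedged sketch: ask generate() for a structured output with scores,
# attentions, and hidden states.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(
    inputs.input_ids,
    return_dict_in_generate=True,
    output_scores=True,
    output_attentions=True,
    output_hidden_states=True,
)
print(out.sequences.shape, len(out.scores))
```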
2021-01-06 17:11:42 +01:00
7a9f1b5c99 Store transformers version info when saving the model (#9421)
* Store transformers version info when saving the model

* Store transformers version info when saving the model

* fix format

* fix format

* fix format

* Update src/transformers/configuration_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update configuration_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
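
A hedged sketch of the effect:

```
# Hedged sketch: the saved config.json should now carry a
# "transformers_version" entry.
from transformers import BertConfig

BertConfig().save_pretrained("tmp")  # tmp/config.json includes the version
```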
2021-01-06 23:34:48 +08:00
ecfcac223c Improve documentation coverage for Phobert (#9427)
* first commit

* change phobert to phoBERT as per author in overview

* v3 and v4 both run on the same code, hence there is no need to differentiate them

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-06 10:04:32 -05:00
be898998bb Improve documentation coverage for Herbert (#9428)
* first commit

* changed XLMTokenizer to HerbertTokenizer in code example
2021-01-06 09:13:43 -05:00
b972c1bfb0 finalize (#9431) 2021-01-06 14:36:55 +01:00
bcb55d33ce Upgrade styler to better handle lists (#9423)
* Add missing lines before a new list.

* Update doc styler and restyle some files.

* Fix docstrings of LED and Longformer
2021-01-06 07:46:17 -05:00
b7e548976f Fix URLs to TAPAS notebooks (#9435) 2021-01-06 07:20:41 -05:00
9f675b05d4 [trainer] self.model_wrapped + _model_unwrap (#9390)
* model wrapped + model_unwrap

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* deprecation warning

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-06 06:50:11 -05:00
453a70d4cb Allow example to use a revision and work with private models (#9407)
* Allow example to use a revision and work with private models

* Copy to other examples and template

* Styling
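
A hedged sketch of the two from_pretrained options the examples now pass through (repo name and revision are illustrative):

```
# Hedged sketch: pin a revision and authenticate for private models.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "bert-base-uncased",
    revision="main",      # a tag, branch, or commit hash
    use_auth_token=True,  # needed for private repos, after `transformers-cli login`
)
```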
2021-01-06 06:49:23 -05:00
7988edc031 Fix link to Notebook to fine-tune TAPAS (#9413)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-01-06 03:44:52 -05:00
c9553c0352 Fix link to Evaluate TAPAS Notebook (#9414) 2021-01-06 03:42:50 -05:00
090d28e32d [Refactor] Splitting pipelines.py into its own module. (#9279)
* Splitting pipelines into its own module.

* Moving everything into base.py

* Moving FeatureExtractionPipeline into its own file.

* TextGenerationPipeline.

* TextClassifictionPipeline

* ZeroShot + get_framework import.

* FillMaskPipeline

* NerPipeline + TokenClassificationPipeline

* QuestionAnsweringPipeline

* TableQuestionAnsweringPipeline

* ConversationnalPipeline

* Text2TextGenerationPipeline, TranslationPipeline, SummarizationPipeline

* Typo import fix.

* Relative imports.
2021-01-06 09:33:50 +01:00
d64372fdfc [docs] outline sharded ddp doc (#9208)
* outline sharded ddp doc

* fix link

* add example

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* narrow the command and remove non-essentials

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-05 17:34:15 -08:00
eef66035a2 [PyTorch Bart] Split Bart into different models (#9343)
* first try

* remove old template

* finish bart

* finish mbart

* delete unnecessary line

* init pegasus

* save intermediate

* correct pegasus

* finish pegasus

* remove cookie cutter leftover

* add marian

* finish blenderbot

* replace in file

* correctly split blenderbot

* delete "old" folder

* correct "add statement"

* adapt config for tf comp

* correct configs for tf

* remove ipdb

* fix more stuff

* fix mbart

* push pegasus fix

* fix mbart

* more fixes

* fix research projects code

* finish docs for bart, mbart, and marian

* delete unnecessary file

* correct attn typo

* correct configs

* remove pegasus for seq class

* correct peg docs

* correct peg docs

* finish configs

* further improve docs

* add copied from statements to mbart

* fix copied from in mbart

* add copy statements to marian

* add copied from to marian

* add pegasus copied from

* finish pegasus

* finish copied from

* Apply suggestions from code review

* make style

* backward comp blenderbot

* apply Lysandre's and Sylvain's suggestions

* apply suggestions

* push last fixes

* fix docs

* fix tok tests

* fix imports code style

* fix doc
2021-01-05 22:00:05 +01:00
4eec5d0cf6 improve readme text to private models/versioning/api (#9424) 2021-01-05 15:02:46 -05:00
d9e848c1d6 add experimental warning (#9412) 2021-01-05 10:05:32 -05:00
29acabd886 [trainer] group fp16 args together (#9409)
* [t5 doc] typos

a few runaway backticks

@sgugger

* style

* [trainer] put fp16 args together

this PR proposes a purely cosmetic change that puts all the fp16 args together - so they are easier to manage/read

@sgugger

* style
2021-01-05 09:39:38 -05:00
57a6626929 [examples/text-classification] Fix a bug for using one's own dataset of a regression task (#9411) 2021-01-05 08:15:06 -05:00
189387e9b2 LED (#9278)
* create model

* add integration

* save current state

* make integration tests pass

* add one more test

* add explanation to tests

* remove from bart

* add padding

* remove unnecessary test

* make all tests pass

* re-add cookie cutter tests

* finish PyTorch

* fix attention test

* Update tests/test_modeling_common.py

* revert change

* remove unused file

* add string to doc

* save intermediate

* make tf integration tests pass

* finish tf

* fix doc

* fix docs again

* add led to doctree

* add to auto tokenizer

* added tips for led

* make style

* apply jplus statements

* correct tf longformer

* apply Lysandre's suggestions

* apply Sylvain's suggestions

* Apply suggestions from code review
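
A hedged usage sketch for the new model (the checkpoint is the one referenced in the LED docs):

```
# Hedged sketch: summarize a long document with LED.
from transformers import LEDForConditionalGeneration, LEDTokenizer

tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")
inputs = tokenizer("A very long document. " * 500, return_tensors="pt", truncation=True)
summary_ids = model.generate(inputs.input_ids, num_beams=2, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```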
2021-01-05 13:14:30 +01:00
314cca2842 Fix documentation links always pointing to master. (#9217)
* Use extlinks to point hyperlink with the version of code

* Point to version on release and master until then

* Apply style

* Correct links

* Add missing backtick

* Simple missing backtick after all.

Co-authored-by: Raghavendra Sugeeth P S <raghav-5305@raghav-5305.csez.zohocorpin.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-01-05 06:18:48 -05:00
52d62e686c Fix TF Funnel (#9300)
* Fix Funnel

* Apply Patrick's comment

* Remove comment

* Fix dummy value

* Apply style
2021-01-05 05:54:50 -05:00
748006c0b3 [trainer] --model_parallel hasn't been implemented for most models (#9347)
* --model_parallel hasn't been implemented for most models

* make the help clear as well

* implement is_parallelizable; use it

* oops

* remove property
2021-01-05 04:01:30 -05:00
4225740a7b Use stable functions (#9369) 2021-01-05 03:58:26 -05:00
4aa8f6ad99 [logging] autoflush (#9385)
This PR proposes to:

* auto-flush `transformers` logging 

When logging is used to trace signals from different parts of the code, and those traces can be mixed with print debugging, auto-flushing helps keep all the logging events synchronized.

I don't think this change will introduce any performance impact.

If it helps someone, here is the code I used to sync `transformers` logging with various other debug prints.

I was porting Bart to MP and needed to verify that the device switching happens correctly, so I added a bunch of logger.info calls inside `modeling_bart.py`; I also had some other helpers `print` debug messages which weren't logger based:

```

# auto flush std streams
from sys import stdout, stderr
def stdout_write_flush(args, w=stdout.write): w(args); stdout.flush()
def stderr_write_flush(args, w=stderr.write): w(args); stderr.flush()
stdout.write = stdout_write_flush
stderr.write = stderr_write_flush

from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig

import logging
import transformers.utils.logging
import transformers.models.bart.modeling_bart

# I wanted a shorter simpler format
handlers = transformers.utils.logging._get_library_root_logger().handlers
for handler in handlers:
    formatter = logging.Formatter("[%(funcName)s] %(message)s")
    handler.setFormatter(formatter)

transformers.models.bart.modeling_bart.logger.setLevel(transformers.logging.INFO)
```

@LysandreJik, @sgugger, @patrickvonplaten
2021-01-05 03:57:57 -05:00
83eec97ec6 Fix TF Longformer (#9348)
* Fix longformer

* Apply style

* Remove serving content

* Forgot a condition

* Apply style

* Address Patrick's comments

* Fix dtype
2021-01-05 03:49:54 -05:00
30fa0b780f feat(wandb): save model as artifact (#8119)
* feat(wandb): log artifacts

* fix: typo

* feat(wandb): ensure name is allowed

* feat(wandb): log artifact

* feat(wandb): saving logic

* style: improve formatting

* fix: unrelated typo

* feat: use a fake trainer

* fix: simplify

* feat(wandb): log model files as artifact

* style: fix style

* docs(wandb): correct description

* feat: unpack model + allow env Truthy values

* feat: TrainerCallback can access tokenizer

* style: fix style

* feat(wandb): log more interesting metadata

* feat: unpack tokenizer

* feat(wandb): metadata with load_best_model_at_end

* feat(wandb): more robust metadata

* style(wandb): fix formatting
2021-01-05 03:30:46 -05:00
143289dcf7 [test_model_parallelization] multiple fixes (#9354) 2021-01-04 12:09:12 -08:00
086718ac6e Improve documentation coverage for Bertweet (#9379)
* bertweet docs coverage

* style doc max len 119

* maxlen style rst

* run main() from style_doc

* changed according to comments
2021-01-04 13:12:59 -05:00
47ca0eaaac replace apex.normalization.FusedLayerNorm with torch.nn.LayerNorm (#9386) 2021-01-04 19:00:08 +01:00
75ff530551 correct docs (#9378) 2021-01-04 17:27:29 +01:00
ec54d70e16 Fix TF DPR (#9283)
* Fix DPR

* Keep usual models

* Apply style

* Address Sylvain's comments
2021-01-04 17:26:56 +01:00
de29ff9bd2 Fix open (#9368) 2021-01-04 10:22:15 -05:00
d018afced0 [trainer] parametrize default output_dir (#9352)
This PR:

* fixes trainer to have the logger agree with the actual default `output_dir`, by setting it in one place and passing it as an argument to both places

@sgugger
2021-01-04 10:14:32 -05:00
d735b074d7 Fix Flaubert (#9292) 2021-01-04 16:06:28 +01:00
5dd389d1c7 Bump notebook from 6.1.4 to 6.1.5 in /examples/research_projects/lxmert (#9402)
Bumps [notebook](https://github.com/jupyter/jupyterhub) from 6.1.4 to 6.1.5.
- [Release notes](https://github.com/jupyter/jupyterhub/releases)
- [Changelog](https://github.com/jupyterhub/jupyterhub/blob/master/CHECKLIST-Release.md)
- [Commits](https://github.com/jupyter/jupyterhub/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-04 10:02:07 -05:00
23a71449c0 Put back LXMert example (#9401) 2021-01-04 09:59:07 -05:00
6c03d4ac70 Fix CTRL (#9291) 2021-01-04 09:56:51 -05:00
c581d8af7a Add utility function for retrieving locally cached models (#8836)
* add get_cached_models function

* add List type to import

* fix code quality

* Update src/transformers/file_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/file_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/file_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/file_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/file_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
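
A hedged sketch, assuming the helper yields (url, etag, size) tuples for cached weights as described in the PR:

```
from transformers.file_utils import get_cached_models

for url, etag, size in get_cached_models():
    print(url, size)
```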
2021-01-04 09:53:54 -05:00
8eb7f26d5d simplify marian distillation script (#9394) 2021-01-04 11:21:24 +05:30
d944966b19 Fix typos in README and bugs in RAG example code for end-to-end evaluation and finetuning (#9355)
* fix a bug in eval_batch_retrieval

* should return parser as well as other staticmethod

* remove duplicate argument

* these kwargs are no longer accepted (cause TypeError in self.generator.generate of modeling_rag.py)

* fixed file paths in README

* moved an arg to add_ray_specific_args
2021-01-03 16:00:30 +01:00
c4fd609afb file_utils.py: TF examples outputs.last_hidden_states -> state (#9382) 2021-01-02 17:58:16 +01:00
b01f451ca3 [Docs] past_key_values return a tuple of tuple as a default (#9381)
* push

* make style
2021-01-02 15:55:07 +01:00
5f7a07c0c8 use return dict for rag encoder (#9363) 2021-01-02 12:39:14 +01:00
ae333d04b2 torch.cuda.is_available() is redundant as apex handles that internally (#9350) 2020-12-30 10:09:51 +01:00
8217d4e37f [prophetnet] wrong import (#9349)
```
python -c "from apex.normalization import FusedProphetNetLayerNorm"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: cannot import name 'FusedProphetNetLayerNorm' from 'apex.normalization' (/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/apex/normalization/__init__.py)
```
It looks like this code has never been tested, so it silently fails inside try/except.

Discovered this by accident in https://github.com/huggingface/transformers/issues/9338#issuecomment-752217708
2020-12-29 22:32:07 +01:00
912f6881d2 add import math (#9346) 2020-12-29 19:35:06 +01:00
785e52cd30 improve templates (#9342) 2020-12-29 16:48:44 +01:00
64103fb6be Fix TransfoXL (#9302) 2020-12-28 20:52:18 +01:00
d97d06d05f Fix TF T5 (#9301)
* Fix T5

* Fix test

* Fix test
2020-12-28 20:51:40 +01:00
83fdd252f6 [Seq2Seq Templates] Correct some TF-serving errors and add gradient checkpointing to PT by default. (#9334)
* correct tests

* correct shape and get_tf_activation

* more correction tf

* add gradient checkpointing to templates

* correct typo
2020-12-28 17:51:04 +01:00
8e74eca7f2 push (#9320) 2020-12-27 21:57:50 +01:00
61443cd7d9 [GPT2] Correct gradient checkpointing (#9308)
* correct gpt2

* fix gpt2

* fix use_cache ordering

* correct past tolerance

* fix for all cases

* style
2020-12-25 23:28:12 +01:00
21fc676645 add translation example (#9303)
* Created using Colaboratory

* mbart-training examples add

* link add

* Update description

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2020-12-25 14:47:49 +05:30
52b3a05e83 [Bart doc] Fix outdated statement (#9299)
* fix bart doc

* fix docs
2020-12-24 14:47:53 +01:00
7777db159f Update tokenization_utils_base.py (#9293)
Missing "s" typo
2020-12-24 14:43:14 +01:00
71963a6633 fix typo in modeling_encoder_decoder.py (#9297)
* Update modeling_encoder_decoder.py

Fixed typo.

* typo

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2020-12-24 14:38:08 +01:00
f3a3b91d6f Proposed Fix : [RagSequenceForGeneration] generate "without" input_ids (#9220)
* Create modeling_tf_dpr.py

* Add TFDPR

* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot

last commit accidentally deleted these 4 lines, so I restored them

* Add TFDPR

* Add TFDPR

* clean up some comments, add TF input-style doc string

* Add TFDPR

* Make return_dict=False the default

* Fix return_dict bug (in .from_pretrained)

* Add get_input_embeddings()

* Create test_modeling_tf_dpr.py

The current version already passes all 27 tests!
Please see the test run at:
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing

* fix quality

* delete init weights

* run fix copies

* fix repo consis

* del config_class, load_tf_weights

They should be 'pytorch only'

* add config_class back

after removing it, a test failed ... so in the end only removing "use_tf_weights = None", per Lysandre's suggestion

* newline after .. note::

* import tf, np (Necessary for ModelIntegrationTest)

* slow_test from_pretrained with from_pt=True

At the moment we don't have TF weights (since we don't have an official TF model)
Previously, I did not run slow test, so I missed this bug

* Add simple TFDPRModelIntegrationTest

Note that this is just a test that TF and PyTorch give approx. the same output.
However, I could not test with the official DPR repo's output yet

* upload correct tf model

* remove position_ids as missing keys

* fix RagSeq generate with context_input_ids

fix RagSeq generate with context_input_ids

* apply style

* delete unused lines

* Add test_rag_sequence_generate_batch_from_context_input_ids

* Readability improved

* stylying

* Stylize

* typos

* add check_model_generate_from_context_input_ids

* make style

* Apply suggestions from code review

* make style2

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
2020-12-24 13:38:00 +01:00
2a18b70998 enable cache by default (#9296) 2020-12-24 17:47:36 +05:30
6189ae9960 Fix typo in file_utils.py (#9289) 2020-12-24 13:48:33 +05:30
222dbdb203 allow integer device for BatchEncoding (#9271)
Fixes #9244

Co-authored-by: Jethro Kuan <jethro.kuan@bytedance.com>
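
A hedged sketch of the change (requires a CUDA device):

```
# Hedged sketch: BatchEncoding.to now also accepts an integer device index.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = tokenizer(["hello world"], return_tensors="pt")
batch = batch.to(0)  # equivalent to "cuda:0"
```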
2020-12-24 09:01:56 +01:00
6c091abef2 [Templates] Adapt Bert (#9284)
* adapt templates

* adapt config

* add test as well

* fix output type

* fix cache false naming

* finish tests

* last fix
2020-12-24 01:44:33 +01:00
88ef8893cd Add caching mechanism to BERT, RoBERTa (#9183)
* add past_key_values

* add use_cache option

* make mask before cutting ids

* adjust position_ids according to past_key_values

* flatten past_key_values

* fix positional embeds

* fix _reorder_cache

* set use_cache to false when not decoder, fix attention mask init

* add test for caching

* add past_key_values for Roberta

* fix position embeds

* add caching test for roberta

* add doc

* make style

* doc, fix attention mask, test

* small fixes

* address Patrick's comments

* input_ids shouldn't start with pad token

* use_cache only when decoder

* make consistent with bert

* make copies consistent

* add use_cache to encoder

* add past_key_values to tapas attention

* apply suggestions from code review

* make copies consistent

* add attn mask in tests

* remove copied from longformer

* apply suggestions from code review

* fix bart test

* nit

* simplify model outputs

* fix doc

* fix output ordering
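
A hedged usage sketch of the new cache (decoder-style BERT):

```
# Hedged sketch: reuse past_key_values across decoding steps.
from transformers import BertConfig, BertLMHeadModel, BertTokenizer

config = BertConfig.from_pretrained("bert-base-uncased", is_decoder=True)
model = BertLMHeadModel.from_pretrained("bert-base-uncased", config=config)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

ids = tokenizer("The cat", return_tensors="pt").input_ids
out = model(ids, use_cache=True)
# next step: feed only the last token together with the cache
next_out = model(ids[:, -1:], past_key_values=out.past_key_values, use_cache=True)
```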
2020-12-23 23:01:32 +05:30
a1cb6e9866 Adapt to new name of label_smoothing_factor training arg (#9282) 2020-12-23 11:05:21 -05:00
bcc87c639f Minor documentation revisions from copyediting (#9266)
* typo: Revise "checkout" to "check out"

* typo: Change "seemlessly" to "seamlessly"

* typo: Close parentheses in "Using the tokenizer"

* typo: Add closing parenthesis to supported models aside

* docs: Treat ``position_ids`` as plural

Alternatively, the word "argument" could be added to make the subject singular.

* docs: Remove comma, making subordinate clause

* docs: Remove comma separating verb and direct object

* docs: Fix typo ("next" -> "text")

* docs: Reverse phrase order to simplify sentence

* docs: "quicktour" -> "quick tour"

* docs: "to throw" -> "from throwing"

* docs: Remove disruptive newline in padding/truncation section

* docs: "show exemplary" -> "show examples of"

* docs: "much harder as" -> "much harder than"

* docs: Fix typo "seach" -> "search"

* docs: Fix subject-verb disagreement in WordPiece description

* docs: Fix style in preprocessing.rst
2020-12-23 10:15:49 -05:00
d5db6c37d4 [Seq2Seq Templates] Fix check_repo.py templates file (#9277)
* add enc dec pt model to check repo

* fix indent
2020-12-23 11:40:20 +01:00
4bafc43b0e Fix param error (#9273)
TypeError: forward() got an unexpected keyword argument 'token_type_ids'
2020-12-23 11:34:57 +01:00
58e8a7611f Fix gpt2 document (#9272) 2020-12-23 11:34:15 +01:00
cbe63949d7 Model Templates for Seq2Seq (#9251)
* adapt cookie cutter

* fix copy past statement

* delete copy statements for now

* remove unused import from template

* make doc rst

* correct config docstring

* correct training

* correct inputs processing tf enc dec

* make style

* adapt templates

* clean tabs

* correct tensor -> Tensor naming

* correct indent

* correct templates

* fix the test

* break lines to avoid > 119

* Apply suggestions from code review
2020-12-22 23:41:20 +01:00
e6c1f1cad8 Revert renaming in finetune_trainer (#9262) 2020-12-22 15:42:34 -05:00
ab17758874 Add speed metrics to all example scripts + template (#9260) 2020-12-22 14:02:26 -05:00
5b5f7dd09c [hf_api] Fix incorrect typing 2020-12-22 19:52:47 +01:00
1558d191e6 Fix TF BART for saved model creation (#9252)
* Fix TF BART for saved model creation

* Apply style

* Update src/transformers/models/bart/modeling_tf_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_tf_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Rework the fix

* Fix condition

* Apply style

* Fix condition

* Fix shape_list

* Apply Patrick's solution

* Apply Patrick's solution

* Rebase

* make tests pass

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-12-22 18:07:04 +01:00
37d6fb5d04 Fix link to bertabs/README.md (#9255) 2020-12-22 11:41:23 -05:00
189c1b91a6 Fix link to old language modeling script (#9254) 2020-12-22 11:40:47 -05:00
490b39e614 Seq2seq trainer (#9241)
* Add label smoothing in Trainer

* Add options for scheduler and Adafactor in Trainer

* Put Seq2SeqTrainer in the main lib

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments and adapt scripts

* Documentation

* Move test not using script to tests folder

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
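
A hedged sketch, assuming the `Seq2SeqTrainingArguments` added alongside the trainer:

```
# Hedged sketch: configure the now-built-in seq2seq training options.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="out",
    predict_with_generate=True,  # use generate() for eval metrics
    label_smoothing_factor=0.1,  # the new label-smoothing option
)
```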
2020-12-22 11:33:44 -05:00
1fc7119181 Fix script that check objects are documented (#9259) 2020-12-22 11:12:58 -05:00
e9d77ccd5a [EncoderDecoder] Make tests more aggressive (#9256)
* add tests

* make style and fix bart bug

* fix bart past key value edge case

* correct tf bart test

* fix gpt2 tf

* fix t5 test
2020-12-22 17:00:04 +01:00
ec07da65e2 Update the README of the text classification example (#9237)
* Update the README of the text classification example

* Update examples/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adapt comment from review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-21 15:23:40 -05:00
4eef5889ac Adding performer fine-tuning research example (#9239)
* added run_mlm_performer.py research example

* make style

* make style

* Added a README !
2020-12-21 21:19:41 +01:00
9a12b9696f [MPNet] Add slow to fast tokenizer converter (#9233)
* add converter

* delete unnecessary comments
2020-12-21 15:41:34 +01:00
f4432b7e01 add base model classes to bart subclassed models (#9230)
* add base model classes to  bart subclassed models

* add doc
2020-12-21 19:56:46 +05:30
08abdabda1 Fixed beam search generation for GPT2 and T5 (#9219) 2020-12-21 08:05:23 -05:00
161a6461db Fix TF template (#9234) 2020-12-21 13:52:16 +01:00
5a8a4eb187 Improve BERT-like models performance with better self attention (#9124)
* Improve BERT-like models attention layers

* Apply style

* Put back error raising instead of assert

* Update template

* Fix copies

* Apply raising ValueError in MPNet

* Restore the copy check for the Intermediate layer in Longformer

* Update longformer
2020-12-21 13:10:15 +01:00
6b034309ca fix warning (#9231) 2020-12-21 10:41:34 +01:00
a4b21cdd20 [RAG] Add Ray implementation for distributed retrieval (#9197)
* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* uncomment

* uncomment

* wip

* updates

* add docstring

* updates

* fix arg

* fixes

* add unit tests

* update readme

* update readme

* update finetune script

* update test

* add test

* add ray to test dependencies

* separate ray and ray tune

* formatting

* shutdown ray at end of test

* fix tests

* formatting

* formatting

* even more formatting

* address comments

* formatting

* add files

* Update examples/research_projects/rag/test_distributed_retriever.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* address comments

* addressing comments

Co-authored-by: Ubuntu <ubuntu@ip-172-31-21-208.us-west-2.compute.internal>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-21 10:39:30 +01:00
f38c4ad302 better logging and help (#9203) 2020-12-20 10:28:28 -08:00
e0e255be1f Added TF TransfoXL Sequence Classification (#9169)
* TF TransfoXL seq classification

* Update test_modeling_tf_transfo_xl.py

Added num_labels to config level

* TF TransfoXL seq classification

* Update test_modeling_tf_transfo_xl.py

Added num_labels to config level

* code refactor

* code refactor

* code refactor
2020-12-19 14:44:04 +01:00
6b850b671d [run_glue] add speed metrics (#9198)
* add speed metrics

* suggestions
2020-12-18 17:09:30 -08:00
3ff5e8955a [t5 doc] typos (#9199)
* [t5 doc] typos

a few runaway backticks

@sgugger

* style
2020-12-18 16:03:26 -08:00
291974c65c GPT-model attention heads pruning example (#9189)
* Pruning for GPT attn heads

* The code is formatted according to the transformers requirements

* Update run_prune_gpt.py

* Update run_prune_gpt.py
2020-12-18 16:32:10 -05:00
1198ba8fba Add timing inside Trainer (#9196)
* Add timing inside Trainer

* Fix tests

* Add n_objs for train

* Sort logs
2020-12-18 15:10:39 -05:00
9a25c5bd3a Add new run_swag example (#9175)
* Add new run_swag example

* Add check

* Add sample

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Very important change to make Lysandre happy

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-12-18 14:19:24 -05:00
3e56e2ce04 Fix typo 2020-12-18 10:11:07 -05:00
077a5dce32 Fix link to old SQUAD fine-tuning script (#9181) 2020-12-18 09:12:10 -05:00
84d5879eaf [setup] correct transformers version format (#9176)
setuptools has a pretty fixed expectation of version numbers.

This PR fixes the dev version number and adds a comment with the correct formats for future editors.

This fix removes the following warning on `make fixup|style|etc` or any other time `setup.py` is run:
```
setuptools/dist.py:452: UserWarning: Normalizing '4.2.0dev0' to '4.2.0.dev0'
  warnings.warn(tmpl.format(**locals()))
```
and the alternative:
```
/setuptools/dist.py:452: UserWarning: Normalizing '4.0.0-rc-1' to '4.0.0rc1'
```

Fixes: #8749

@LysandreJik, @sgugger
2020-12-18 08:55:55 -05:00
fd7b6a5274 fixed JSON error in run_qa with fp16 (#9186) 2020-12-18 07:53:23 -05:00
66a14a2f6f Fix link to old NER fine-tuning script (#9182) 2020-12-17 19:50:01 -05:00
f06d0fadc9 [trainer] apex fixes and tests (#9180) 2020-12-17 16:49:11 -08:00
467e9158b4 Added TF CTRL Sequence Classification (#9151)
* Added TF CTRL Sequence Classification

* code refactor
2020-12-17 18:10:57 -05:00
63841c559b add tests for the new sharded ddp fairscale integration (#9177) 2020-12-17 14:24:03 -08:00
bf713cdec7 setup.py development version 2020-12-17 11:29:31 -05:00
bd40345d3e v4.1.1 docs 2020-12-17 11:28:38 -05:00
bfa4ccf77d Release: v4.1.1 2020-12-17 11:25:49 -05:00
e0790cca78 Fix TAPAS doc 2020-12-17 11:25:05 -05:00
6d2e864db7 Put all models in the constants (#9170)
* Put all models in the constants

* Add Google AI mention in the main README
2020-12-17 11:23:21 -05:00
f83d9c8da7 v4.1.0 docs 2020-12-17 10:16:07 -05:00
f5438ab8a2 Release: v4.1.0 2020-12-17 10:04:55 -05:00
ac2c7e398f Remove erroneous character 2020-12-17 09:47:19 -05:00
77d6941e64 Fix gradient clipping for Sharded DDP (#9168)
* Fix gradient clipping for Sharded DDP

* Fix typos in comments
2020-12-17 09:44:24 -05:00
1aca3d6afa Add disclaimer to TAPAS rst file (#9167)
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-12-17 09:34:06 -05:00
dc9f245442 Torch scatter with torch 1.7.0 2020-12-16 13:48:57 -05:00
9a67185344 Experimental support for fairscale ShardedDDP (#9139)
* Experimental support for fairscale ShardedDDP

* Add import error if fairscale not available

* Address review comments

* Fix seq2seq trainer
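
A hedged sketch of the switch (a boolean flag at the time of this PR, with fairscale installed):

```
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", sharded_ddp=True, fp16=True)
```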
2020-12-16 13:47:48 -05:00
1c1a2ffbff TableQuestionAnsweringPipeline (#9145)
* AutoModelForTableQuestionAnswering

* TableQuestionAnsweringPipeline

* Apply suggestions from Patrick's code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Sylvain and Patrick comments

* Better PyTorch/TF error message

* Add integration tests

* Argument Handler naming

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Fix docs to appease the documentation gods

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
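
A hedged usage sketch of the new pipeline (the table contents are made up; requires the TAPAS extra dependencies such as torch-scatter):

```
from transformers import pipeline

tqa = pipeline("table-question-answering")
table = {
    "Repository": ["transformers", "datasets"],
    "Stars": ["36542", "4512"],
}
print(tqa(table=table, query="How many stars does the transformers repository have?"))
```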
2020-12-16 12:31:50 -05:00
07384baf7a AutoModelForTableQuestionAnswering (#9154)
* AutoModelForTableQuestionAnswering

* Update src/transformers/models/auto/modeling_auto.py

* Style
2020-12-16 12:14:33 -05:00
34334662df Add message to documentation that longformer doesn't support token_type_ids (#9152)
* Add message to documentation that longformer doesn't support token_type_ids

* Format changes
2020-12-16 11:06:14 -05:00
2f918defa8 hotfix torch scatter version 2020-12-16 10:26:13 -05:00
4d48973523 Update notebook table and transformers intro notebook (#9136) 2020-12-16 10:24:31 -05:00
fb650df859 Support for private models from huggingface.co (#9141)
* minor wording tweaks

* Create private model repo + exist_ok flag

* file_utils: `use_auth_token`

* Update src/transformers/file_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Propagate doc from @sgugger

Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
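
A hedged sketch of loading a private model after `transformers-cli login` (the repo name is illustrative):

```
from transformers import AutoModel

model = AutoModel.from_pretrained("my-org/my-private-model", use_auth_token=True)
```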
2020-12-16 10:09:57 -05:00
c69d19faa8 DistilBertForSequenceClassification (#9148)
fix small shape error in comments
2020-12-16 09:21:42 -05:00
640e6fe190 [Flax] Align FlaxBertForMaskedLM with BertForMaskedLM, implement from_pretrained, init (#9054)
* save intermediate

* save intermediate

* save intermediate

* correct flax bert model file

* new module / model naming

* make style

* almost finish BERT

* finish roberta

* make fix-copies

* delete keys file

* last refactor

* fixes in run_mlm_flax.py

* remove pooled from run_mlm_flax.py

* fix gelu | gelu_new

* remove Module from inits

* splits

* dirty print

* preventing warmup_steps == 0

* smaller splits

* make fix-copies

* dirty print

* dirty print

* initial_evaluation argument

* declaration order fix

* proper model initialization/loading

* proper initialization

* run_mlm_flax improvements: improper model inputs bugfix + automatic dataset splitting + tokenizers parallelism warning + avoiding warmup_steps=0 bug

* removed tokenizers warning hack, fixed model re-initialization

* reverted training_args.py changes

* fix flax from pretrained

* improve test in flax

* apply Sylvain's tips

* update init

* make 0.3.0 compatible

* revert tevens changes

* revert tevens changes 2

* finalize revert

* fix bug

* add docs

* add pretrained to init

* Update src/transformers/modeling_flax_utils.py

* fix copies

* final improvements

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-12-16 13:03:32 +01:00
51adb97cd6 Fix fp16_backend field 2020-12-15 17:14:37 -05:00
1551e2dc6d [WIP] Tapas v4 (tres) (#9117)
* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Test PyTorch scatter

* Set to slow + minify

* Calm flake8 down

* Add add_pooling_layer argument to TapasModel

Fix comments by @sgugger and @patrickvonplaten

* Fix issue in docs + fix style and quality

* Clean up conversion script and add task parameter to TapasConfig

* Revert the task parameter of TapasConfig

Some minor fixes

* Improve conversion script and add test for absolute position embeddings

* Improve conversion script and add test for absolute position embeddings

* Fix bug with reset_position_index_per_cell arg of the conversion cli

* Add notebooks to the examples directory and fix style and quality

* Apply suggestions from code review

* Move from `nielsr/` to `google/` namespace

* Apply Sylvain's comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-12-15 17:08:49 -05:00
ad895af98d Add possibility to switch between APEX and AMP in Trainer (#9137)
* Add possibility to switch between APEX and AMP in Trainer

* Update src/transformers/training_args.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Address review comments

* Update src/transformers/training_args.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
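
A hedged sketch of the new switch:

```
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", fp16=True, fp16_backend="amp")  # or "apex"
```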
2020-12-15 16:38:10 -05:00
0b2f46fa9e Add large model config (#9140) 2020-12-15 16:03:59 -05:00
2a7e8e1608 [Examples] Add automatic dataset splitting in language-modeling examples (#9133)
* replaced jnp.split + removing textual model inputs + ensuring warmup_steps > 0

* Add automatic dataset splitting in language-modeling examples
2020-12-15 16:02:43 -05:00
e771749777 Fix add order (#9129) 2020-12-15 15:16:56 -05:00
18ecd36f65 Fix Bart Shift (#9135)
* correct mistake in order

* fix tensor copy

* clone tensor correctly
2020-12-15 19:04:31 +01:00
d018622d8e correct mistake in order (#9134) 2020-12-15 23:08:31 +05:30
80bdb9c31a fix bart loss masking (#9131) 2020-12-15 18:17:17 +01:00
3caba8d35f Fix typo in trainer_tf.py (#9132) 2020-12-15 12:12:28 -05:00
abc573f51a [TF Bart] Refactor TFBart (#9029)
* reorder file

* delete unnecessary function

* make style

* save intermediate

* fix attention masks

* correct tf bart past key values

* solve merge conflict bug

* correct tensor dims

* save intermediate tf

* change attn layer

* fix typo re-order past

* inputs_embeds

* make fix copies

* finish tests

* fix graph mode

* apply Lysandre's suggestions
2020-12-15 17:31:28 +01:00
389aba34bf Added TF OpenAi GPT1 Sequence Classification (#9105)
* TF OpenAI GPT Sequence Classification

* Update src/transformers/models/openai/modeling_tf_openai.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-12-15 11:27:08 -05:00
ef2d4cd445 Fix tf2.4 (#9120)
* Fix tests for TF 2.4

* Remove <2.4 limitation

* Add version condition

* Update tests/test_optimization_tf.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_optimization_tf.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_optimization_tf.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-15 10:10:46 -05:00
6ccea0486f Fix T5 model parallel test (#9107)
2020-12-15 09:51:12 -05:00
59da3f2700 Fix stack overflow (#9114) 2020-12-15 09:15:49 -05:00
14c79c3e31 native amp leak fix landed in 1.7.1 (#9115)
update README with good news that the leak fix has been applied to pytorch-1.7.1.
2020-12-15 09:10:41 -05:00
ed1845ef4c Clarify use of TrainingArguments.disable_tqdm in Jupyter Notebooks (#9076)
* Clarify impact of disable_tqdm on Jupyter Notebooks

* Add weblink to argparse

* Replace "dev set" with more common "validation set" in do_eval

* Tweak prediction_loss_only

* Tweak description of Adam hyperparameters

* Add weblink to TensorBoard

* Capitalise apex

* Tweak local_rank description

* Add weblink for wandb

* Replace nlp with datasets

* Tweak grammar in model_parallel

* Capitalise apex

* Update TensorFlow training args to match PyTorch ones

* Fix style

* Fix underscore in weblink

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix underscore in weblink

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix underscore in weblink

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix underscore in weblink

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add obj to datasets.Dataset

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-15 09:00:19 -05:00
44c340f45f fix a bug in eval_batch_retrieval (#9089) 2020-12-15 14:46:55 +01:00
c19d04623e [finetune_trainer] enhancements and fixes (#9042)
* trainer and finetune_trainer enhancements and fixes

* add fallback default

* move the fixing of incorrect keys back into finetune trainer

* s/eval/val/ to match the split

* trainer can now use a different prefix than eval_ for metrics

* document new arg

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* use 'eval' as the default for metric_key_prefix

* complete adjust var names + disambiguate

* fix logger

* add clarifying comment

* add clarifying comment

* style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/trainer.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* complete removal of optional for metric_key_prefix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-14 17:45:33 -08:00
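A short usage sketch of the `metric_key_prefix` argument introduced above; `trainer` and `val_dataset` are placeholders for an existing `Trainer` and validation split:

```python
# metrics now come back as val_loss, val_runtime, ... instead of eval_*
metrics = trainer.evaluate(eval_dataset=val_dataset, metric_key_prefix="val")
print(metrics["val_loss"])
```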
251eb70c97 Also pin TF CPU 2020-12-14 16:17:04 -05:00
e4ef57a9bb Pin TF to < 2.4 2020-12-14 16:06:30 -05:00
df3f4d2aef Fix T5 and BART for TF (#9063)
* Fix T5 for graph compilation+execution

* Fix BART

* Fix import

* Fix naming

* fix attribute name

* Oops

* fix import

* fix tests

* fix tests

* Update test

* Add missing import

* Address Patrick's comments

* Style

* Address Patrick's comment
2020-12-14 18:47:00 +01:00
a9c8bff724 Add parallelization support for T5EncoderModel (#9082)
* add model parallelism to T5EncoderModel

add model parallelism to T5EncoderModel

* remove decoder from T5EncoderModel parallelize

* update T5EncoderModel docs

* Extend T5ModelTest for T5EncoderModel

* fix T5Stack using range for get_device_map

* fix style

Co-authored-by: Ahmed Elnaggar <elnaggar@rostlab.informatik.tu-muenchen.de>
2020-12-14 12:00:45 -05:00
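A minimal sketch of what the new support enables, assuming the same `parallelize`/`deparallelize` API the other parallelizable models expose:

```python
from transformers import T5EncoderModel

model = T5EncoderModel.from_pretrained("t5-small")
model.parallelize()    # spread the encoder blocks over the visible GPUs
# ... run encoder forward passes ...
model.deparallelize()  # move everything back to the CPU
```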
b00eb4fb02 Testing Experimental CI Features (#9070) 2020-12-14 10:34:59 -05:00
74daf1f954 Fixed a broken link in documentation (#9101) 2020-12-14 09:12:27 -05:00
d6af344c9e correct var name in TrainingArguments docstring (#9096) 2020-12-14 09:02:54 -05:00
fa1ddced9e [RAG, Bart] Align RAG, Bart cache with T5 and other models of transformers (#9098)
* fix rag

* fix slow test

* fix past in bart
2020-12-14 12:32:26 +01:00
6587cf9f84 Patch *ForCausalLM model (#9092) 2020-12-14 00:39:55 -05:00
51d9c569fa Fix embeddings resizing in TF models (#8657)
* Resize the biases at the same time as the embeddings

* Trigger CI

* Biases are not reset anymore

* Remove get_output_embeddings + better LM model detection in generation utils

* Apply style

* First test on BERT

* Update docstring + new name

* Apply the new resizing logic to all the models

* fix tests

* Apply style

* Update the template

* Fix naming

* Fix naming

* Apply style

* Apply style

* Remove unused import

* Revert get_output_embeddings

* Trigger CI

* Update num parameters

* Restore get_output_embeddings in TFPretrainedModel and add comments

* Style

* Add decoder resizing

* Style

* Fix tests

* Separate bias and decoder resize

* Fix tests

* Fix tests

* Apply style

* Add bias resizing in MPNet

* Trigger CI

* Apply style
2020-12-13 23:05:24 -05:00
3552d0e0d8 [model_cards] Migrate cards from this repo to model repos on huggingface.co (#9013)
* rm all model cards

* Update the .rst

@sgugger it is still not super crystal clear/streamlined, so let me know if you have any ideas to make it simpler

* Add a rootlevel README.md with simple instructions/context

* Update docs/source/model_sharing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

* rm all model cards

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-11 18:24:42 -05:00
29e4597950 Fix min_null_pred in the run_qa script (#9067) 2020-12-11 16:26:05 -05:00
9cc9f4122e Make ProphetNetModel really compatible with EncoderDecoder (#9033)
* improve

* finish

* upload model

* fix lm head

* fix test
2020-12-11 16:59:54 +01:00
24f6cdeab6 Bump notebook in /examples/research_projects/movement-pruning/lxmert (#9062)
Bumps [notebook](https://github.com/jupyter/jupyterhub) from 6.1.4 to 6.1.5.
- [Release notes](https://github.com/jupyter/jupyterhub/releases)
- [Changelog](https://github.com/jupyterhub/jupyterhub/blob/master/CHECKLIST-Release.md)
- [Commits](https://github.com/jupyter/jupyterhub/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2020-12-11 10:32:43 -05:00
91fa707217 Remove docs only check (#9065) 2020-12-11 10:27:31 -05:00
70527ba694 Fix PreTrainedTokenizer.pad when first inputs are empty (#9018)
* Fix PreTrainedTokenizer.pad when first inputs are empty

* Handle empty inputs case
2020-12-11 10:25:00 -05:00
783d7d2629 Reorganize examples (#9010)
* Reorganize example folder

* Continue reorganization

* Change requirements for tests

* Final cleanup

* Finish regroup with tests all passing

* Copyright

* Requirements and readme

* Make a full link for the documentation

* Address review comments

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add symlink

* Reorg again

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Adapt title

* Update to new structure

* Remove test

* Update READMEs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-12-11 10:07:02 -05:00
86896de064 update tatoeba workflow (#9051) 2020-12-11 20:29:15 +05:30
7c8f5f6487 Create README.md (#8096)
* Create README.md

* Fix model card

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2020-12-11 09:45:12 -05:00
5527f78721 Create README.md (#8281)
* Create README.md

* Update model_cards/kiri-ai/distiluse-base-multilingual-cased-et/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-12-11 09:41:29 -05:00
c615df7422 Create README.md (#8751)
* Create README.md

* Update model_cards/Cinnamon/electra-small-japanese-generator/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-12-11 09:40:14 -05:00
76df559383 QARiB Arabic and dialects models (#8796)
* Add QARiB models

* fix README.md

* Fix README.md

* Fix README.md

* Fix README.md

* Fix QARiB files

* add models card for QARiB models 860k, 1790k, and 1970k

* try to fix PR

* re-add files

* links aren't allowed here :)

Co-authored-by: Ahmed Abdelali <aabdelali@hbku.edu.qa>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
2020-12-11 09:38:38 -05:00
b161f1ae54 Update README.md (#8820) 2020-12-11 09:24:21 -05:00
649d389dab Initial README for t5-base-indonesian-summarization-cased model (#9028)
* Create README.md

Initial README for `t5-base-indonesian-summarization-cased` model

* Update README for t5-base-indonesian-summarization-cased

Typo in README, change from `small` to `base`
2020-12-11 09:18:16 -05:00
5e794b6628 Create README.md (#9030)
Initial README for `t5-small-indonesian-summarization-cased` model
2020-12-11 09:17:29 -05:00
935e346959 🎨 Change nn.dropout to layer.Dropout (#9047) 2020-12-11 10:40:25 +01:00
b01ddc9577 Remove value error (#8985)
* Remove value error

* Try a fix for parameter ordering

* Restore previous behavior

* Add documentation

* Review the comment
2020-12-10 17:17:19 -05:00
91ab02af28 Fix typo #9012 (#1) (#9038)
There is a tiny typo in the code "transformers/examples/language-modeling/run_mlm_wwm.py" at line 284. [Details.](https://github.com/huggingface/transformers/issues/9012)
2020-12-10 16:41:00 -05:00
8d4bb02056 Refactor FLAX tests (#9034) 2020-12-10 15:57:39 -05:00
1310e1a758 Enforce all objects in the main init are documented (#9014) 2020-12-10 11:57:12 -05:00
51e81e5895 MPNet copyright files (#9015) 2020-12-10 09:29:38 -05:00
35bffd70e2 Fix documentation of book in LayoutLM (#9017) 2020-12-10 09:28:49 -05:00
c95de29e31 ✏️ Fix typo (#9020) 2020-12-10 08:22:52 +01:00
5e637e6c69 [wip] [ci] doc-job-skip take #4 dry-run (#8980)
* ci-doc-job-skip-take-4

* wip

* wip

* wip

* wip

* skip yaml

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* ready to test

* yet another way

* trying with HEAD

* trying with head.sha

* trying with head.sha fix

* trying with head.sha fix wip

* undo

* try to switch to sha

* current branch

* current branch

* PR number check

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride

* joy ride
2020-12-09 15:36:36 -05:00
06971ac4f9 [Bart] Refactor - fix issues, consistency with the library, naming (#8900)
* remove make on the fly linear embedding

* start refactor

* big first refactor

* save intermediate

* save intermediate

* correct mask issue

* save tests

* refactor padding masks

* make all tests pass

* further refactor

* make pegasus test pass

* fix bool if

* fix leftover tests

* continue

* bart renaming

* delete torchscript test hack

* fix imports in tests

* correct shift

* fix docs and repo consistency

* re-add fix for FSMT

* typo in test

* fix typo

* fix another typo

* continue

* hot fix 2 for tf

* small fixes

* refactor types linting

* continue

* finish refactor

* fix import in tests

* better bart names

* further refactor and add test

* delete hack

* apply Sylvain's and Lysandre's comments

* small perf improv

* further perf improv

* improv perf

* fix typo

* make style

* small perf improv
2020-12-09 20:55:24 +01:00
75627148ee Flax Masked Language Modeling training example (#8728)
* Remove "Model" suffix from Flax models to look more 🤗

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Initial working (forward + backward) for Flax MLM training example.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Simplify code

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing comments, using module and moving to LM task.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Restore parameter name "module" that was wrongly renamed to "model".

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Restore correct output ordering...

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Actually commit the example 😅

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add FlaxBertModelForMaskedLM after rebasing.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make it possible to initialize the training from scratch

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Reuse flax linen example of cross entropy loss

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added specific data collator for flax

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove todo for data collator

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added evaluation step

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added ability to provide dtype to support bfloat16 on TPU

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable flax tensorboard output

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable jax.pmap support.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Ensure batches are correctly sized to be dispatched with jax.pmap

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable bfloat16 with --fp16 cmdline args

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correctly export metrics to tensorboard

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added dropout and ability to use it.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Effectively enable & disable during training and evaluation steps.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Oops.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Enable specifying kernel initializer scale

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added warmup step to the learning rate scheduler.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix typo.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Print training loss

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix linter issue (flake8)

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix model matching

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix dummies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix non default dtype on Flax models

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use the same create_position_ids_from_input_ids for FlaxRoberta

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make Roberta attention as Bert

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix copy

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Wording.

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
2020-12-09 17:13:56 +01:00
df2af6d8b8 Add MP Net 2 (#9004) 2020-12-09 10:32:43 -05:00
8729109855 fixes #8968 (#9009) 2020-12-09 16:21:41 +01:00
e977ed2142 Add the code_search_net dataset tag to CodeBERTa model cards (#9005) 2020-12-09 15:43:19 +01:00
da37a21c89 push (#9008) 2020-12-09 15:14:33 +01:00
61abd50b98 Remove use of deprected method in Trainer HP search (#8996) 2020-12-09 09:13:41 -05:00
7e1d709e2a Fix link to stable version in the doc navbar (#9007) 2020-12-09 09:11:39 -05:00
02d0e0355c Diverse beam search 2 (#9006)
* diverse beam search

* bug fixes

* bug fixes

* bug fix

* separate out diverse_beam_search function

* separate out diverse_beam_search function

* bug fix

* improve code quality

* bug fix

* bug fix

* separate out diverse beam search scorer

* code format

* code format

* code format

* code format

* add test

* code format

* documentation changes

* code quality

* add slow integration tests

* more general name

* refactor into logits processor

* add test

* avoid too much copy paste

* refactor

* add to docs

* fix-copies

* bug fix

* Revert "bug fix"

This reverts commit c99eb5a8dc57a7b0d33a8ac06d8c6a32a7812ad4.

* improve comment

* implement sylvains feedback

Co-authored-by: Ayush Jain <a.jain@sprinklr.com>
Co-authored-by: ayushtiku5 <40797286+ayushtiku5@users.noreply.github.com>
2020-12-09 15:00:37 +01:00
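A hedged usage sketch of diverse beam search as a `generate` option; the checkpoint is only an example and the `num_beam_groups`/`diversity_penalty` names follow the PR description:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
input_ids = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt").input_ids

outputs = model.generate(
    input_ids,
    num_beams=6,
    num_beam_groups=3,      # split the 6 beams into 3 diverse groups
    diversity_penalty=1.0,  # penalize tokens already chosen by other groups
    num_return_sequences=3,
    max_length=20,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```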
67ff1c314a Templates overhaul 1 (#8993) 2020-12-08 18:00:07 -05:00
447808c85f New squad example (#8992)
* Add new SQUAD example

* Same with a task-specific Trainer

* Address review comment.

* Small fixes

* Initial work for XLNet

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Final clean up and working XLNet script

* Test and debug

* Final working version

* Add new SQUAD example

* Same with a task-specific Trainer

* Address review comment.

* Small fixes

* Initial work for XLNet

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Final clean up and working XLNet script

* Test and debug

* Final working version

* Add tick

* Update README

* Address review comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-12-08 14:39:29 -05:00
7809eb82ae Removed unused encoder_hidden_states and encoder_attention_mask (#8972)
* Removed unused `encoder_hidden_states` and `encoder_attention_mask` from MobileBert

* Removed decoder tests for MobileBert

* Removed now unnecessary import
2020-12-08 12:04:34 -05:00
b7cdd00f15 Fix interaction of return_token_type_ids and add_special_tokens (#8854) 2020-12-08 12:04:01 -05:00
04c446f764 Make ModelOutput pickle-able (#8989) 2020-12-08 11:59:40 -05:00
0d9e6ca9ed [model_card] remove bogus testing changes 2020-12-08 09:58:45 -05:00
bf7f79cd57 Optional layers (#8961)
* Apply on BERT and ALBERT

* Update TF Bart

* Add input processing to TF BART

* Add input processing for TF CTRL

* Add input processing to TF Distilbert

* Add input processing to TF DPR

* Add input processing to TF Electra

* Add deprecated arguments

* Add input processing to TF XLM

* remove unused imports

* Add input processing to TF Funnel

* Add input processing to TF GPT2

* Add input processing to TF Longformer

* Add input processing to TF Lxmert

* Apply style

* Add input processing to TF Mobilebert

* Add input processing to TF GPT

* Add input processing to TF Roberta

* Add input processing to TF T5

* Add input processing to TF TransfoXL

* Apply style

* Rebase on master

* Fix wrong model name

* Fix BART

* Apply style

* Put the deprecated warnings in the input processing function

* Remove the unused imports

* Raise an error when len(kwargs)>0

* test ModelOutput instead of TFBaseModelOutput

* Address Patrick's comments

* Address Patrick's comments

* Add boolean processing for the inputs

* Take into account the optional layers

* Add missing/unexpected weights in the other models

* Apply style

* rename parameters

* Apply style

* Remove useless

* Remove useless

* Remove useless

* Update num parameters

* Fix tests

* Address Patrick's comment

* Remove useless attribute
2020-12-08 09:14:09 -05:00
9d7d0005b0 [training] SAVE_STATE_WARNING was removed in pytorch (#8979)
* [training] SAVE_STATE_WARNING was removed in pytorch

FYI `SAVE_STATE_WARNING` has been removed 3 days ago: pytorch/pytorch#46813

Fixes: #8232

@sgugger

* style, but add () to prevent autoformatters from botching it

* switch to try/except

* cleanup
2020-12-07 21:59:55 -08:00
2ae7388eee Check table as independent script (#8976) 2020-12-07 19:55:12 -05:00
00aa9dbca2 Copyright (#8970)
* Add copyright everywhere missing

* Style
2020-12-07 18:36:34 -05:00
c108d0b5a4 add max_length to showcase the use of truncation (#8975) 2020-12-07 18:35:39 -05:00
62d30e0583 Small fix to the run clm script (#8973) 2020-12-07 17:32:09 -05:00
28fa014a1f transformers-cli: LFS multipart uploads (> 5GB) (#8663)
* initial commit

* [cli] lfs commands

* Fix FileSlice

* Tweak to FileSlice

* [hf_api] Backport filetype arg from `datasets`

cc @lhoestq

* Slim down the CI while I'm working

* Ok let's try this in CI

* Update config.yml

* Do not try this at home

* one more try

* Update lfs.py

* Revert "Tweak to FileSlice"

This reverts commit d7e32c4b3500400486411e85a2b74e57fb6b52f5.

* Update test_hf_api.py

* Update test_hf_api.py

* Update test_hf_api.py

* CI still green?

* make CI green again?

* Update test_hf_api.py

* make CI red again?

* Update test_hf_api.py

* add CI style back

* Fix CI?

* oh my

* doc + switch back to real staging endpoint

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>

* Fix docblock + f-strings

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
2020-12-07 16:38:39 -05:00
0bce7c5508 Create README.md (#8964) 2020-12-07 16:04:14 -05:00
7ccd973ea1 Update README.txt (#8957) 2020-12-07 16:01:49 -05:00
37f4c24f10 > 30 files leads to hanging on --More--
cancel debug printing for now. As can be seen, it led to a failing test here:
https://app.circleci.com/pipelines/github/huggingface/transformers/16894/workflows/cc86f7a9-4020-45af-8ab3-c22f79b427cf/jobs/131924
2020-12-07 12:18:05 -08:00
7f9ccffc5b Use word_ids to get labels in run_ner (#8962)
* Use word_ids to get labels in run_ner

* Add sanity check
2020-12-07 14:26:36 -05:00
de6befd41f Remove sourcerer (#8965) 2020-12-07 11:15:29 -05:00
483e13273f Add TFGPT2ForSequenceClassification based on DialogRPT (#8714)
* Add TFGPT2ForSequenceClassification based on DialogRPT

* Add TFGPT2ForSequenceClassification based on DialogRPT

* TFGPT2ForSequenceClassification based on DialogRPT: refactored code, implemented review comments and added input processing

* Add TFGPT2ForSequenceClassification based on DialogRPT

* TFGPT2ForSequenceClassification based on DialogRPT: refactored code, implemented review comments and added input processing

* code refactor for latest other TF PR

* code refactor

* code refactor

* Update modeling_tf_gpt2.py
2020-12-07 16:58:37 +01:00
28c77ddf3b Fix QA pipeline on Windows (#8947) 2020-12-07 09:50:32 -05:00
72d6c9c68b Add model card (#8948)
* add model card

* lowercase identifier

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-12-06 11:16:32 -05:00
ef93a25427 Fix typo for modeling_bert import resulting in ImportError (#8931)
Self-explanatory ;) - Hope it helps!
2020-12-05 09:57:37 -05:00
8dfc8c7221 Don't pass in token_type_ids to BART for GLUE (#8929)
Without this fix, training a `BARTForSequenceClassification` model with `run_pl_glue.py` gives `TypeError: forward() got an unexpected keyword argument 'token_type_ids'`, because BART does not have token_type_ids. I've solved this issue in the same way as it's solved for the "distilbert" model, and I can train BART models on SNLI without errors now.
2020-12-05 09:52:16 -05:00
df311a5ccf [seq2seq] document the caveat of leaky native amp (#8930)
* document the caveat of leaky native amp

* Update examples/seq2seq/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-04 15:43:35 -08:00
73c51f7fcd [ci] skip doc jobs - circleCI is not reliable - disable skip for now (#8926)
* disable skipping, but leave logging for the future
2020-12-04 10:13:42 -08:00
71688a8889 Fix TF T5 only encoder model with booleans (#8925) 2020-12-04 12:28:47 -05:00
dcd3046f98 Better booleans handling in the TF models (#8777)
* Apply on BERT and ALBERT

* Update TF Bart

* Add input processing to TF BART

* Add input processing for TF CTRL

* Add input processing to TF Distilbert

* Add input processing to TF DPR

* Add input processing to TF Electra

* Add deprecated arguments

* Add input processing to TF XLM

* Add input processing to TF Funnel

* Add input processing to TF GPT2

* Add input processing to TF Longformer

* Add input processing to TF Lxmert

* Apply style

* Add input processing to TF Mobilebert

* Add input processing to TF GPT

* Add input processing to TF Roberta

* Add input processing to TF T5

* Add input processing to TF TransfoXL

* Apply style

* Rebase on master

* Bug fix

* Retry to bugfix

* Retry bug fix

* Fix wrong model name

* Try another fix

* Fix BART

* Fix input processing

* Apply style

* Put the deprecated warnings in the input processing function

* Remove the unused imports

* Raise an error when len(kwargs)>0

* test ModelOutput instead of TFBaseModelOutput

* Bug fix

* Address Patrick's comments

* Address Patrick's comments

* Address Sylvain's comments

* Add boolean processing for the inputs

* Apply style

* Missing optional

* Fix missing some input proc

* Update the template

* Fix missing inputs

* Missing input

* Fix args parameter

* Trigger CI

* Trigger CI

* Trigger CI

* Address Patrick's and Sylvain's comments

* Replace warn by warning

* Trigger CI

* Fix XLNET

* Fix detection
2020-12-04 09:08:29 -05:00
4c3d98dddc [s2s finetune_trainer] add instructions for distributed training (#8884) 2020-12-03 16:05:55 -08:00
aa60b230ec Patch model parallel test (#8920)
* Patch model parallel test

* Remove line

* Remove `ci_*` from scheduled branches
2020-12-03 17:15:47 -05:00
0c5615af66 Put Transformers on Conda (#8918)
* conda

* Guide

* correct tag

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/installation.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain's comments

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-03 14:28:49 -05:00
9ad6194318 Tweak wording + Add badge w/ number of models on the hub (#8914)
* Add badge w/ number of models on the hub

* try to appease @sgugger 😇

* not sure what this `c` was about [ci skip]

* Fix script and move stuff around

* Fix doc styling error

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-12-03 10:56:55 -05:00
6ed7e32f7c Fix move when the two cache folders exist (#8917) 2020-12-03 10:50:13 -05:00
8453201cfe Avoid erasing the attention mask when double padding (#8915) 2020-12-03 10:45:07 -05:00
0deece9c53 Don't warn that models aren't available if Flax is available. (#8841) 2020-12-03 10:33:12 -05:00
2b7fc9a0fd [model_cards] lm-head was deprecated
(and wasn't needed here anyways as it was added automatically)
2020-12-03 15:05:01 +01:00
443f67e887 [PyTorch] Refactor Resize Token Embeddings (#8880)
* fix resize tokens

* correct mobile_bert

* move embedding fix into modeling_utils.py

* refactor

* fix lm head resize

* refactor

* break lines to make sylvain happy

* add new tests

* fix typo

* improve test

* skip bart-like for now

* check if base_model = get(...) is necessary

* clean files

* improve test

* fix tests

* revert style templates

* Update templates/adding_a_new_model/cookiecutter-template-{{cookiecutter.modelname}}/modeling_{{cookiecutter.lowercase_modelname}}.py
2020-12-02 19:19:50 +01:00
e52f9c0ade Update README.md (#8906) 2020-12-02 09:28:44 -08:00
801b2cb36f Fix typo in docstring (#8905) 2020-12-02 12:08:31 -05:00
7e1cb00c37 [trainer] improve code readability (#8903)
* [trainer] improve code

This PR:
- removes redundant code, since
```
self.model = model if model is not None else None
```
and
```
self.model = model
```
are the same.

* separate attribute assignment from code logic - which simplifies things further.

* whitespace
2020-12-02 09:07:42 -08:00
a8c3f9aa76 Warning about too long input for fast tokenizers too (#8799)
* Warning about too long input for fast tokenizers too

If truncation is not set in tokenizers, but the tokenization is too long
for the model (`model_max_length`), we used to trigger a warning that
the input would probably fail (which it most likely will).

This PR re-enables the warning for fast tokenizers too and uses common
code for the trigger to make sure it's consistent across tokenizers.

* Checking for pair of inputs too.

* Making the function private and adding its doc.

* Remove formatting ?? in odd place.

* Missed uppercase.
2020-12-02 10:18:28 -05:00
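A small illustration of the behavior described above; the warning text in the comment is paraphrased:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # fast tokenizer by default
long_text = "word " * 1000

enc = tokenizer(long_text)                   # warns: sequence longer than model_max_length (512)
enc = tokenizer(long_text, truncation=True)  # explicit truncation keeps it quiet
```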
f6b44e6190 TransfoXL sequence classification (#8868)
* TransfoXL sequence classification

* TransfoXL sequence classification
2020-12-02 10:08:32 -05:00
24f0c2fe33 [ci] skip doc jobs take #3 (#8885)
* check that we get any match first

* docs only

* 2 docs only

* add code

* restore
2020-12-02 10:06:45 -05:00
693ac3594b disable job skip - need more work
reference: https://github.com/huggingface/transformers/pull/8853#issuecomment-736779863
2020-12-01 12:03:29 -08:00
379005c9d2 start using training_args.parallel_mode (#8882) 2020-12-01 11:40:36 -08:00
b08843cf4d Add a parallel_mode property to TrainingArguments (#8877)
* Add a `distributed_env` property to TrainingArguments

* Change name

* Address comment
2020-12-01 13:46:09 -05:00
7c10dd22ae Better support for resuming training (#8878) 2020-12-01 13:45:21 -05:00
21db560df3 [CI] skip docs-only jobs take #2 (#8853)
* restore skip

* Revert "Remove deprecated `evalutate_during_training` (#8852)"

This reverts commit 553029909620455e040a49032a9c45f6a5f0cd52.

* check that pipeline.git.base_revision is defined before proceeding

* Revert "Revert "Remove deprecated `evalutate_during_training` (#8852)""

This reverts commit dfec84db3fdce1079f01f1bc8dfaf21db2ccaba1.

* check that pipeline.git.base_revision is defined before proceeding

* doc only

* doc + code

* restore

* restore

* typo
2020-12-01 13:15:25 -05:00
a947386cee Better warning when loading a tokenizer with AutoTokenizer w/o SentencePiece (#8881) 2020-12-01 13:13:11 -05:00
9c18f15685 Prevent BatchEncoding from blindly passing casts down to the tensors it contains. Fixes #6582. (#8860)
Update src/transformers/tokenization_utils_base.py with review fix

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-12-01 13:01:52 -05:00
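A brief sketch of the guarded behavior the title describes; the comment on casts is an interpretation of the fix, not the exact patch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["hello world"], return_tensors="pt")

batch = batch.to("cpu")  # device moves are still passed through to the tensors
# after this fix, a non-device argument such as torch.float16 is no longer
# blindly forwarded as a cast to integer tensors like input_ids
```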
c0df963ee1 Make the big table creation/check platform independent (#8856) 2020-12-01 11:45:57 -05:00
d366228df1 2 typos in modeling_rag.py (#8676)
* 2 typos - from_question_encoder_generator_configs

fix 2 typos
from_encoder_generator_configs --> from_question_encoder_generator_configs

* apply make style
2020-12-01 16:16:48 +01:00
814b9550d7 Fix doc for language code (#8848) 2020-12-01 10:44:37 +01:00
4a9e502a36 Ctrl for sequence classification (#8812)
* add CTRLForSequenceClassification

* pass local test

* merge with master

* fix modeling test for sequence classification

* fix deco

* fix assert
2020-12-01 09:49:27 +01:00
7f34d75780 [s2s trainer] fix DP mode (#8823)
* fix DP case on multi-gpu

* make executable

* test all 3 modes

* use the correct check for distributed

* dp doesn't need a special case

* restore original name

* cleanup
2020-11-30 12:55:56 -08:00
d8fc26e919 NerPipeline (TokenClassification) now outputs offsets of words (#8781)
* NerPipeline (TokenClassification) now outputs offsets of words

- Currently the offsets are missing, which forces the user to pattern-match
the "word" against their input; that is not always feasible. For instance,
if a sentence contains the same word twice, there is no way to know which
occurrence is which.
- This PR proposes to fix that by outputting 2 new keys in this pipeline's
outputs, "start" and "end", which correspond to the string offsets of the
word. That means we should always have the invariant:

```python
input[entity["start"] : entity["end"]] == entity["word"]
```

* Fixing doc style
2020-11-30 14:05:08 -05:00
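A usage sketch of the new keys, assuming the `grouped_entities` flag of the NER pipeline at the time:

```python
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
sentence = "My name is Clara and I live in Berkeley."
for entity in ner(sentence):
    # start/end disambiguate repeated words in the input
    print(entity["entity_group"], sentence[entity["start"] : entity["end"]])
```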
5fd3d81ec9 fix pypi complaint on version naming 2020-11-30 13:54:52 -05:00
51b071313b Attempt to fix Flax CI error(s) (#8829)
* Slightly increase tolerance between pytorch and flax output

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* test_multiple_sentences doesn't require torch

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify parameterization on "jit" to use boolean rather than str

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove pytest.mark.parametrize which seems to fail in some circumstances

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Give default parameters values for traced model.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Review comment: Change sentences to sequences

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-30 13:43:17 -05:00
9995a341c9 Update docs 2020-11-30 12:07:52 -05:00
22b0ff757a Release: v4.0.0 2020-11-30 12:07:43 -05:00
5530299096 Remove deprecated evaluate_during_training (#8852)
* Remove deprecated `evaluate_during_training`

* Update src/transformers/training_args_tf.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-30 11:12:15 -05:00
773849415a Use model.from_pretrained for DataParallel also (#8795)
* Use model.from_pretrained for DataParallel also

When training on multiple GPUs, the code wraps a model with torch.nn.DataParallel. However if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end.

This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel.

If you choose to reject this change, then could you please move this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something, so that it can be overridden?

* Fix silly bug

* Address review comments

Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check if `self.model` is an instance of `PreTrainedModel`, otherwise we would still not get into that `if` section, right?
2020-11-30 11:11:10 -05:00
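A hypothetical helper sketching the idea in the message above (the name `load_best_model_checkpoint` comes from the author's own suggestion; this is not the merged code):

```python
import torch

def load_best_model_checkpoint(model, checkpoint_dir):
    # reload through the underlying model so custom from_pretrained logic
    # applies, then re-wrap if training used DataParallel
    wrapped = isinstance(model, torch.nn.DataParallel)
    base = model.module if wrapped else model
    reloaded = type(base).from_pretrained(checkpoint_dir)
    return torch.nn.DataParallel(reloaded) if wrapped else reloaded
```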
4062c75e44 Merge remote-tracking branch 'origin/master' 2020-11-30 10:51:35 -05:00
08e707633c Comment the skip job on doc line 2020-11-30 10:51:25 -05:00
75f8100fc7 Add a direct link to the big table (#8850) 2020-11-30 10:29:23 -05:00
cc983cd9cd Correct docstring. (#8845)
Related issue: https://github.com/huggingface/transformers/issues/8837
2020-11-30 09:33:30 -05:00
19fa01ce2a token-classification: use is_world_process_zero instead of deprecated is_world_master() (#8828) 2020-11-30 09:21:56 -05:00
40ecaf0c2b Add T5 Encoder for Feature Extraction (#8717)
* Add T5 Encoder class for feature extraction

* fix T5 encoder add_start_docstrings indent

* update init with T5 encoder

* update init with TFT5ModelEncoder

* remove TFT5ModelEncoder

* change T5ModelEncoder order in init

* add T5ModelEncoder to transformers init

* clean T5ModelEncoder

* update init with TFT5ModelEncoder

* add TFModelEncoder for Tensorflow

* update init with TFT5ModelEncoder

* Update src/transformers/models/t5/modeling_t5.py

change output from Seq2SeqModelOutput to BaseModelOutput

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* remove encoder_outputs

1. remove encoder_outputs from the function call.
2. remove the encoder_outputs if statement.
3. remove isinstance from return_dict.

* Authorize missing decoder keys

* remove unnecessary input parameters

remove pask_key_values and use_cache

* remove use_cache

remove use_cache from the forward method

* add docstring for T5 encoder

add docstring for T5 encoder with T5_ENCODER_INPUTS_DOCSTRING

* change return_dict to dot access

* add T5_ENCODER_INPUTS_DOCSTRING for TF T5

* change TFT5Encoder output type to BaseModelOutput

* remove unnecessary parameters for TFT5Encoder

* remove unnecessary if statement

* add import BaseModelOutput

* fix BaseModelOutput typo to TFBaseModelOutput

* update T5 doc with T5ModelEncoder

* add T5ModelEncoder to tests

* finish pytorch

* finish docs and mt5

* add mt5 to init

* fix init

* remove n_positions

* finish PR

* Update src/transformers/models/mt5/modeling_mt5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/t5/modeling_tf_t5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/mt5/modeling_tf_mt5.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-30 08:34:40 +01:00
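A minimal feature-extraction sketch with the new class:

```python
from transformers import T5EncoderModel, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5EncoderModel.from_pretrained("t5-small")

inputs = tokenizer("Studies have shown that owning a dog is good for you", return_tensors="pt")
outputs = model(**inputs)
features = outputs.last_hidden_state  # (batch, seq_len, d_model) encoder states
```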
610cb106a2 Migration guide from v3.x to v4.x (#8763)
* Migration guide from v3.x to v4.x

* Better wording

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain's comments

* Better wording.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-29 20:13:07 -05:00
c239dcda83 [CI] implement job skipping for doc-only PRs (#8826)
* implement job skipping for doc-only PRs

* silent grep is crucial

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* let's add doc

* let's add code

* revert test commits

* restore

* Better name

* Better name

* Better name

* some more testing

* some more testing

* some more testing

* finish testing
2020-11-29 11:31:30 -05:00
3a08cc1ce7 Minor docs typo fixes (#8797)
* Fix minor typos

* Additional typos

* Style fix

Co-authored-by: guyrosin <guyrosin@assist-561.cs.technion.ac.il>
2020-11-29 11:27:00 -05:00
5ced23dc84 [Pegasus] Refactor Tokenizer (#8731)
* refactor

* further refactor

* fix the rest tomorrow

* save intermediate

* finish slow tokenizer

* make more tests pass

* finish refactor

* fix comment

* clean further

* fix name

* fix naming

* Update src/transformers/models/reformer/tokenization_reformer.py

* Apply suggestions from code review

* Apply suggestions from code review

* refactor

* fix init tokenizers

* refactor

* improve convert

* refactor

* correct convert slow tokenizer

* final fix for Pegasus Tok

* remove ipdb

* improve links
2020-11-29 16:57:43 +01:00
36b60ce9e8 fix mt5 config (#8832) 2020-11-28 19:50:49 +01:00
18c32eeb21 Model parallel tests should return, not pass in non model parallel settings. (#8825) 2020-11-27 16:41:29 -05:00
edbff1fd00 Temporarily deactivate model generation 2020-11-27 16:15:00 -05:00
00ea45659f suggest a numerical limit of 50MB for determining @slow (#8824) 2020-11-27 16:04:54 -05:00
0a921b6459 BART & FSMT: fix decoder not returning hidden states from the last layer (#8597)
* Fix decoder not returning hidden states from the last layer

* Resolve conflict

* Change the way to gather hidden states

* Add decoder hidden states test

* Make pytest and black happy

* Remove redundant line

* remove new line

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2020-11-27 18:35:34 +01:00
81fe0bf085 Add barthez model (#8393)
* Add init barthez

* Add barthez model, tokenizer and docs

BARThez is a pre-trained French seq2seq model that uses the BART objective.

* Apply suggestions from code review docs typos

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add license

* Change URLs scheme

* Remove barthez model keep tokenizer

* Fix style

* Fix quality

* Update tokenizer

* Add fast tokenizer

* Add fast tokenizer test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-27 12:31:42 -05:00
b0f2dbc594 Fix setup.py (#8798)
enforce unix newline encoding regardless of OS creating the file
2020-11-27 09:25:20 -08:00
03bddc375b Create README.md (#8729)
* Create README.md

* Fix model path
2020-11-27 18:19:15 +01:00
f9a2a9e32b Extend typing to path-like objects in PretrainedConfig and PreTrainedModel (#8770)
* update configuration_utils.py typing to allow pathlike objects when sensible

* update modeling_utils.py typing to allow pathlike objects when sensible

* black

* update tokenization_utils_base.py typing to allow pathlike objects when sensible

* update tokenization_utils_fast.py typing to allow pathlike objects when sensible

* update configuration_auto.py typing to allow pathlike objects when sensible

* update configuration_auto.py docstring to allow pathlike objects when sensible

* update tokenization_auto.py docstring to allow pathlike objects when sensible

* black
2020-11-27 10:52:58 -05:00
a7d46a0609 Fix dpr<>bart config for RAG (#8808)
* correct dpr test and bert pos fault

* fix dpr bert config problem

* fix layoutlm

* add config to dpr as well
2020-11-27 16:26:45 +01:00
a2cf37595e [Flax test] Add require pytorch to fix flax test (#8816)
* try flax fix

* same for roberta
2020-11-27 14:40:42 +01:00
e3ef62bce1 Update README.md (#8815)
The tokenizer called on the input_ids of example 2 is currently encoding text_1. I think this should be changed to text_2.
2020-11-27 08:34:57 -05:00
f8eda599bd [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes (#8791)
* [FlaxBert] Fix non-broadcastable attention mask for batched forward-passes

* [FlaxRoberta] Fix non-broadcastable attention mask

* Use jax.numpy instead of ordinary numpy (otherwise not jit-able)

* Partially revert "Use jax.numpy ..."

* Add tests for batched forward passes

* Avoid unnecessary OOMs due to preallocation of GPU memory by XLA

* Auto-fix style

* Re-enable GPU memory preallocation but with mem fraction < 1/parallelism
2020-11-27 13:21:19 +01:00
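For intuition, a minimal sketch of the usual fix for a non-broadcastable mask; the shape convention is an assumption, not the exact patch:

```python
import jax.numpy as jnp

batch_size, seq_len = 2, 16
attention_mask = jnp.ones((batch_size, seq_len))
# add singleton head/query axes so the mask broadcasts against
# (batch, num_heads, seq_len, seq_len) attention scores
extended_mask = attention_mask[:, None, None, :]
```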
cb7602b38d typo (#8810) 2020-11-26 14:47:36 -08:00
ddf3c64654 potpourri of small fixes (#8807) 2020-11-26 14:06:27 -08:00
52708d2637 Fix PPLM (#8779)
* Fix pplm

* fix style

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-26 22:23:36 +01:00
8f07f5c44b Revert "finetune.py: specifying generation min_length (#8478)" (#8805)
This reverts commit 5aa361f3e56de0f65720f291bb3975bfc98f2837.
2020-11-26 20:12:01 +01:00
66e9608bae Create README.md (#8760) 2020-11-26 12:43:43 -05:00
5aa361f3e5 finetune.py: specifying generation min_length (#8478) 2020-11-26 12:33:02 +05:30
30e7f7e5da Create README.md (#8752) 2020-11-25 17:38:21 -05:00
2a6fbe6a40 [XLNet] Fix mems behavior (#8567)
* fix mems in xlnet

* fix use_mems

* fix use_mem_len

* fix use mems

* clean docs

* fix tf typo

* make xlnet tf for generation work

* fix tf test

* refactor use cache

* add use cache for missing models

* correct use_cache in generate

* correct use cache in tf generate

* fix tf

* correct getattr typo

* make sylvain happy

* change in docs as well

* do not apply to cookie cutter statements

* fix tf test

* make pytorch model fully backward compatible
2020-11-25 16:54:59 -05:00
369f1d77b4 Return correct Bart hidden state tensors (#8747)
* bart output hidden states upstream

* same w/ decoder

* add tests

* fix prophetnet

* fix gpt2 and ctrl

* fix fsmt and skip test for reformer and longformer

* fix all models

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-25 22:06:04 +01:00
138f45c184 Fix QA argument handler (#8765)
* Fix QA argument handler

* Attempt to get a better fix for QA (#8768)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2020-11-25 14:02:15 -05:00
4821ea5aeb Big model table (#8774)
* First draft

* Styling

* With all changes staged

* Update docs/source/index.rst

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Styling

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-25 12:02:15 -05:00
90d5ab3bfe Create README.md (#8761) 2020-11-24 17:51:24 -05:00
29d4992453 New TF model inputs (#8602)
* Apply on BERT and ALBERT

* Update TF Bart

* Add input processing to TF BART

* Add input processing for TF CTRL

* Add input processing to TF Distilbert

* Add input processing to TF DPR

* Add input processing to TF Electra

* Add input processing for TF Flaubert

* Add deprecated arguments

* Add input processing to TF XLM

* remove unused imports

* Add input processing to TF Funnel

* Add input processing to TF GPT2

* Add input processing to TF Longformer

* Add input processing to TF Lxmert

* Apply style

* Add input processing to TF Mobilebert

* Add input processing to TF GPT

* Add input processing to TF Roberta

* Add input processing to TF T5

* Add input processing to TF TransfoXL

* Apply style

* Rebase on master

* Bug fix

* Retry to bugfix

* Retry bug fix

* Fix wrong model name

* Try another fix

* Fix BART

* Fix input processing

* Apply style

* Put the deprecated warnings in the input processing function

* Remove the unused imports

* Raise an error when len(kwargs)>0

* test ModelOutput instead of TFBaseModelOutput

* Bug fix

* Address Patrick's comments

* Address Patrick's comments

* Address Sylvain's comments

* Add the new inputs in new Longformer models

* Update the template with the new input processing

* Remove useless assert

* Apply style

* Trigger CI
2020-11-24 13:55:00 -05:00
82d443a7fd [core] implement support for run-time dependency version checking (#8645)
* implement support for run-time dependency version checking

* try not escaping !

* use findall that works on py36

* small tweaks

* autoformatter worship

* simplify

* shorter names

* add support for non-versioned checks

* add deps

* revert

* tokenizers not required, check version only if installed

* make a proper distutils cmd and add make target

* tqdm must be checked before tokenizers

* workaround the DistributionNotFound peculiar setup

* handle the rest of packages in setup.py

* fully sync setup.py's install_requires - to check them all

* nit

* make install_requires more readable

* typo

* Update setup.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* restyle

* add types

* simplify

* simplify2

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-24 13:22:25 -05:00
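A usage sketch, assuming the checker landed as `transformers.utils.versions.require_version`:

```python
from transformers.utils.versions import require_version

require_version("tokenizers>=0.9.4")  # raises at run time if the pin is not met
require_version("datasets")           # non-versioned check: only verifies presence
```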
a7d73cfdd4 fix rag index names in eval_rag.py example (#8730) 2020-11-24 17:04:47 +01:00
8d4ed7e953 added instructions for syncing upstream master with forked master via PR (#8745)
* added instructions for syncing upstream master with forked master via PR

* expand to add a note on why this is requested

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2020-11-24 10:11:46 -05:00
e09e54fd9d MT5 should have an autotokenizer (#8743)
* MT5 should have an autotokenizer

* Different configurations should be able to point to same tokenizers
2020-11-24 09:50:25 -05:00
6fdd0bb231 Fix slow tests v2 (#8746)
* Fix BART test

* Fix MBART tests

* Remove erroneous line from yaml

* Update tests/test_modeling_bart.py

* Quality
2020-11-24 09:35:12 -05:00
2c83b3c38d Support various BERT relative position embeddings (2nd) (#8276)
* Support BERT relative position embeddings

* Fix typo in README.md

* Address review comment

* Fix failing tests

* [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py

* make fix copies

* fix configs of electra and albert and fix longformer

* remove copy statement from longformer

* fix albert

* fix electra

* Add bert variants forward tests for various position embeddings

* [tiny] Fix style for test_modeling_bert.py

* improve docstring

* [tiny] improve docstring and remove unnecessary dependency

* [tiny] Remove unused import

* re-add to ALBERT

* make embeddings work for ALBERT

* add test for albert

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-24 14:40:53 +01:00
9e71aa2f8f [EsperBERTo] Fix URLs to assets 2020-11-24 14:15:30 +01:00
02f48b9bfc Model parallel documentation (#8741)
* Add parallelize methods to the .rst files

* Correct format
2020-11-23 20:14:48 -05:00
7f2c00913a TF BERT test update 2020-11-23 18:20:19 -05:00
e1b7e10d5f Update TF BERT test 2020-11-23 18:19:12 -05:00
8ffc01a76a Add early stopping callback to pytorch trainer (#8581)
* Add early stopping patience and minimum threshold metric must improve to prevent early stopping to pytorch trainer

* Add early stopping test

* Set patience counter to 0 if best metric not defined yet

* Make early stopping a callback. Add callback event for updating the best metric for early stopping callback to trigger on.

* Run make style

* make function name sensible

* Improve new argument docstring wording and hope that the flaky CI test passes.

* Use on_evaluation callback instead of custom. Remove some debug printing

* Move early stopping arguments and state into early stopping callback

* Run make style

* Remove old code

* Fix docs formatting. make style went rogue on me.

* Remove copied attributes and fix variable

* Add assertions on training arguments instead of mutating them. Move comment out of public docs.

* Make separate test for early stopping callback. Add test of invalid arguments.

* Run make style... I remembered before CI this time!

* appease flake8

* Add EarlyStoppingCallback to callback docs

* Make EarlyStoppingCallback docstring match other callbacks.

* Fix typo in docs
2020-11-23 17:25:35 -05:00
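A minimal sketch of wiring the new callback into a `Trainer`; `model`, `train_ds` and `eval_ds` are placeholders:

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",       # early stopping needs periodic evaluation
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
```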
367f497dec Fix max length in run_plm script (#8738) 2020-11-23 16:02:31 -05:00
e84786aaa6 consistent ignore keys + make private (#8737)
* consistent ignore keys + make private

* style

* - authorized_missing_keys    => _keys_to_ignore_on_load_missing
  - authorized_unexpected_keys => _keys_to_ignore_on_load_unexpected

* move public doc of private attributes to private comment
2020-11-23 12:33:13 -08:00
49759c0cda Document new training argument 2020-11-23 15:02:59 -05:00
1cd9be2aeb gpt2 and t5 parallel modeling (#8696)
* gpt2 and t5 parallel modeling

* model_parallel utils update

* adding missing model_parallel_utils

Adds missing model_parallel_utils and reverses the changes to code in modeling_gpt2 and modeling_t5

* training_args reformat

Reformatted training_args

* style formatting

Style formatting doc string length on training_args and model_parallel_utils

* style changes

make style && make quality for training_args and model_parallel_utils.

* adding tests

* minor change in trainer

reverts loss calculation

* Update training_args.py

* Update training_args.py

added back docstring language for adam_beta1 and adam_beta2

* Update trainer.py

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix style & rebase

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-23 14:41:23 -05:00
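A hedged sketch of the resulting API on GPT-2; the two-GPU `device_map` split is hypothetical:

```python
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
# hypothetical split of gpt2-xl's 48 transformer blocks over two GPUs
device_map = {0: list(range(0, 24)), 1: list(range(24, 48))}
model.parallelize(device_map)
# ... generate or train ...
model.deparallelize()
```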
1e45bef0a7 [trainer] make generate work with multigpu (#8716)
* make generate work with multigpu

* better fix - thanks @sgugger
2020-11-23 10:57:27 -08:00
900024273b Change default cache path (#8734)
* Change default cache path

* Document changes

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-23 13:56:45 -05:00
0cc5ab1333 Improve bert-japanese tokenizer handling (#8659)
* Make ci fail

* Try to make tests actually run?

* CI finally failing?

* Fix CI

* Revert "Fix CI"

This reverts commit ca7923be7334d4e571b023478ebdd6b33dfd0ebb.

* Ooops wrong one

* one more try

* Ok ok let's move this elsewhere

* Alternative to globals() (#8667)

* Alternative to globals()

* Error is raised later so return None

* Sentencepiece not installed make some tokenizers None

* Apply Lysandre wisdom

* Slightly clearer comment?

cc @sgugger

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-23 11:15:02 -05:00
eec76615f6 [model_cards]: control input examples of Geotrend models (#8727)
* [model_cards]: control Arabic model examples

* [model_cards]: control input examples of Geotrend models

* [model_cards]: add link to generation script
2020-11-23 11:09:50 -05:00
143b564e59 Add pip install update to resolve import error in transformers notebook (#8616)
* Add pip install update to resolve import error

Add `pip install --upgrade tensorflow-gpu` to remove the error below:
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-094fadb93f3f> in <module>()
      1 import torch
----> 2 from transformers import AutoModel, AutoTokenizer, BertTokenizer
      3 
      4 torch.set_grad_enabled(False)

4 frames
/usr/local/lib/python3.6/dist-packages/transformers/__init__.py in <module>()
    133 
    134 # Pipelines
--> 135 from .pipelines import (
    136     Conversation,
    137     ConversationalPipeline,

/usr/local/lib/python3.6/dist-packages/transformers/pipelines.py in <module>()
     46     import tensorflow as tf
     47 
---> 48     from .modeling_tf_auto import (
     49         TF_MODEL_FOR_QUESTION_ANSWERING_MAPPING,
     50         TF_MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING,

/usr/local/lib/python3.6/dist-packages/transformers/modeling_tf_auto.py in <module>()
     49 from .configuration_utils import PretrainedConfig
     50 from .file_utils import add_start_docstrings
---> 51 from .modeling_tf_albert import (
     52     TFAlbertForMaskedLM,
     53     TFAlbertForMultipleChoice,

/usr/local/lib/python3.6/dist-packages/transformers/modeling_tf_albert.py in <module>()
     22 import tensorflow as tf
     23 
---> 24 from .activations_tf import get_tf_activation
     25 from .configuration_albert import AlbertConfig
     26 from .file_utils import (

/usr/local/lib/python3.6/dist-packages/transformers/activations_tf.py in <module>()
     52     "gelu": tf.keras.layers.Activation(gelu),
     53     "relu": tf.keras.activations.relu,
---> 54     "swish": tf.keras.activations.swish,
     55     "silu": tf.keras.activations.swish,
     56     "gelu_new": tf.keras.layers.Activation(gelu_new),

AttributeError: module 'tensorflow_core.python.keras.api._v2.keras.activations' has no attribute 'swish'
```
I have tried running the colab after this change and it seems to work fine (all the cells run with no errors).

* Update notebooks/02-transformers.ipynb

only need to upgrade tensorflow, not tensorflow-gpu.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-23 09:58:52 -05:00
18c8cf000b Fix bug in x-attentions output for roberta and harden test to catch it (#8660) 2020-11-23 13:28:29 +01:00
48cc224703 [model_cards] Add card for gpt2-rnm (#8673) 2020-11-23 05:52:29 -05:00
52585e40af create README.md (#8682)
* create README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-23 05:51:54 -05:00
b5187e317f added bangla-bert-sentiment model card (#8687) 2020-11-23 05:51:16 -05:00
b6d864e2f0 Create README.md (#8630)
* Create README.md

* correct metrics id

cc @lhoestq

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-23 04:48:10 -05:00
e1f3156b21 Fix many typos (#8708) 2020-11-21 22:58:10 -05:00
9c0afdaf7b fix flaky ci (#8694) 2020-11-20 22:07:21 +01:00
29bdb88368 Vectorize RepetitionPenaltyLogitsProcessor to improve performance (#8598)
* refactored existing nested loops into a vectorized implementation

* replaced explicit indexing with torch.where

* modifying score for previous input_ids only
2020-11-20 19:59:06 +01:00
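A minimal sketch of the vectorized idea the bullets describe, using `torch.gather`/`torch.where`/`scatter` instead of nested loops:

```python
import torch

def apply_repetition_penalty(scores, input_ids, penalty):
    # gather the scores of tokens that already appeared, rescale them in
    # one shot, and scatter them back; no Python loops needed
    score = torch.gather(scores, 1, input_ids)
    score = torch.where(score < 0, score * penalty, score / penalty)
    return scores.scatter(1, input_ids, score)
```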
2594bd8b73 moved temperature wrapper before topP/topK (#8686) 2020-11-20 19:33:54 +01:00
8062fa63c5 Fix rag finetuning + add finetuning test (#8585)
* replace init_ddp_connection for index init

* style

* add finetune test

* add test data

* move generate tensors to device

* add test on EM metric

* style

* allow multi process test

* keep gloo process group for retrieval

* add multi-gpu test

* use custom accelerator

* clean test finetune

* minor

* style

* style

* typo

* use python call instead of imported main function

* return_dict fix in modeling_rag

* use float32 in retrieval

* store as float32 as well in the custom knowledge dataset example

* style

* rename to finetune_rag

* style

* update readme

* rename utils and callbacks to utils_rag and callbacks_rag

* fix test

* patrick's comments

* generate dummy data in the finetue test script

* remove dummy data files

* style
2020-11-20 19:05:03 +01:00
63e91f5fde Document adam betas TrainingArguments (#8688) 2020-11-20 09:27:25 -05:00
94caaa93c2 Update the bibtex with EMNLP demo (#8678)
* Update the bibtex with EMNLP demo

* Update README.md

* Update README.md
2020-11-20 13:26:33 +08:00
6494910f27 Add sentencepiece to the CI and fix tests (#8672)
* Fix the CI and tests

* Fix quality

* Remove that m from nowhere
2020-11-19 16:44:20 -05:00
0ad45e108d [examples/seq2seq] fix PL deprecation warning (#8577)
* fix deprecation warning

* fix
2020-11-19 21:46:04 +01:00
0e19a4c2d6 Update bert-base-multilingual-cased-README.md (#8668)
The heading was originally uncased, which did not reflect the contents of this README. Changed it to cased.
2020-11-19 15:45:06 -05:00
06518404cb revert 2020-11-19 12:12:46 -08:00
297a29382f Please fix your software not to ping master
You may be unaware but you're running some software that meddles with every commit on https://github.com/huggingface/transformers/

Something is wrong with the software you're using. It adds a reference to almost every PR in the master tree, which is very wrong. Please check your software and please don't do it again.

Example:
see the bottom of this PR and most other PRs:
https://github.com/huggingface/transformers/pull/8639
2020-11-19 12:11:35 -08:00
42111f1d56 [tokenizers] convert_to_tensors: don't reconvert when the type is already right (#8283)
* don't reconvert when the type is already right

* better name

* adjust logic as suggested

* merge
2020-11-19 12:06:01 -08:00
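The idea behind the commit above, as a hypothetical helper (names are illustrative, not the library's actual code):

```python
import torch

def to_tensor(value):
    # Skip the conversion entirely when the value is already a tensor;
    # re-wrapping costs a copy and can drop device placement.
    if isinstance(value, torch.Tensor):
        return value
    return torch.tensor(value)
```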
20b658607e Fix run_ner script (#8664)
* Fix run_ner script

* Pin datasets
2020-11-19 13:59:30 -05:00
ca0109bd68 disable_ngram_loss fix for prophetnet (#8554)
* `disable_ngram_loss` fix for prophetnet

* add changes documentation

* fix _compute_loss to use mean reduction and -100 to masked tokens & remove unnecessary arguments

* mean label smoothing loss

* small refactor

* fix test

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-19 19:18:07 +01:00
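The `-100` convention the fix above relies on, in a self-contained sketch (shapes are arbitrary): with `ignore_index=-100` and mean reduction, masked positions contribute nothing to the loss.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, 32)            # (batch, seq_len, vocab)
labels = torch.randint(0, 32, (4, 10))
labels[:, 5:] = -100                        # mask out e.g. padding positions
# cross_entropy expects (batch, vocab, seq_len), hence the transpose.
loss = F.cross_entropy(logits.transpose(1, 2), labels, ignore_index=-100)
```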
0603564e93 Merge remote-tracking branch 'origin/master' 2020-11-19 12:18:57 -05:00
1e08af383a Forgot to save... 2020-11-19 12:18:50 -05:00
d86b5ffc6f Release: v4.0.0-rc-1 2020-11-19 12:00:07 -05:00
cb3e5c33f7 Fix a few last paths for the new repo org (#8666) 2020-11-19 11:56:42 -05:00
a79a96ddaa fix small typo (#8644)
Fixed a small typo on the XLNet and permutation language modelling section
2020-11-19 11:24:11 -05:00
4208f496ee Better filtering of the model outputs in Trainer (#8633)
* Better filtering of the model outputs in Trainer

* Fix examples tests

* Add test for Lysandre
2020-11-19 10:43:15 -05:00
f2e07e7272 Fix a bunch of slow tests (#8634)
* CI should install `sentencepiece`

* Requiring TF

* Fixing some TFDPR bugs

* remove return_dict=False/True hack

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-19 10:41:41 -05:00
5362bb8a6b Tf longformer for sequence classification (#8231)
* working on LongformerForSequenceClassification

* add TFLongformerForMultipleChoice

* add TFLongformerForTokenClassification

* use add_start_docstrings_to_model_forward

* test TFLongformerForSequenceClassification

* test TFLongformerForMultipleChoice

* test TFLongformerForTokenClassification

* remove test from repo

* add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice

* add requested classes to modeling_tf_auto.py
update dummy_tf_objects
fix tests
fix bugs in requested classes

* pass all tests except test_inputs_embeds

* sync with master

* pass all tests except test_inputs_embeds

* pass all tests

* pass all tests

* work on test_inputs_embeds

* fix style and quality

* make multi choice work

* fix TFLongformerForTokenClassification signature

* fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature

* fix mult choice

* fix mc hint

* fix input embeds

* fix input embeds

* refactor input embeds

* fix copy issue

* apply sylvains changes and clean more

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-19 10:37:27 -05:00
62cd9ce9f8 fix missing return dict (#8653) 2020-11-19 15:17:18 +01:00
0c2677f529 [model card] : fix bert-base-15lang-cased (#8655)
the table was badly formatted because of a single line break
2020-11-19 05:41:02 -05:00
0a80959bdd Add cards for all Geotrend models (#8617)
* docs(bert-base-15lang-cased): add model card

* add cards for all Geotrend models

* [model cards] fix language tag for all Geotrend models
2020-11-19 04:47:24 -05:00
dcc9c64299 Updated the Extractive Question Answering code snippets (#8636)
* Updated the Extractive Question Answering code snippets

The Extractive Question Answering code snippets do not work anymore since the models return task-specific output objects. This commit fixes the pytorch and tensorflow examples by adding `.values()` to the model call.

* Update task_summary.rst
2020-11-18 18:56:47 -05:00
28d16e7ac5 Update README.md (#8635) 2020-11-18 18:35:23 -05:00
b290195ac7 grammar (#8639) 2020-11-18 18:04:25 -05:00
d86d57faa3 [s2s] distillation apex breaks return_dict obj (#8631)
* apex breaks return_dict obj

* style
2020-11-18 12:51:29 -08:00
bf3611b2ab Created ModelCard for Hel-ach-en MT model (#8496)
* Updated ModelCard

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-18 14:42:13 -05:00
c95b26a719 Create README.md (#8362) 2020-11-18 13:37:14 -05:00
fdbbb6c17a Model card: T5-base fine-tuned on QuaRTz (#8369)
* Model card: T5-base fine-tuned on QuaRTz

* Update model_cards/mrm8488/t5-base-finetuned-quartz/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-18 13:34:27 -05:00
6e6d24c5d8 Create README.md (#8363) 2020-11-18 13:33:04 -05:00
35fd3d64e3 Add model card for ai4bharat/indic-bert (#8464) 2020-11-18 13:28:49 -05:00
38f01dfe03 Update README.md (#8405)
* Update README.md

* Update README.md
2020-11-18 13:23:08 -05:00
2d8fbf012a Model Card for abhilash1910/financial_roberta (#8625)
* Model Card for abhilash1910/financial_roberta

* Update model_cards/abhilash1910/financial_roberta/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-18 13:22:28 -05:00
26dc6593f3 Update README.md (#8544)
Modified the Model in Action section. The class `AutoModelWithLMHead` is deprecated, so it was changed to `AutoModelForSeq2SeqLM` for encoder-decoder models. Removed a duplicate eos token.
2020-11-18 13:19:32 -05:00
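A minimal example of the replacement the card now uses (the checkpoint name is chosen purely for illustration):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# AutoModelWithLMHead is deprecated; encoder-decoder checkpoints load via
# AutoModelForSeq2SeqLM instead.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")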
6c8fad4f0d replace performance table with markdown (#8565)
* replace performance table with markdown

* Update model_cards/smanjil/German-MedBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-18 13:17:46 -05:00
e7f77fc52a model_cards for Chinese Couplet and Poem GPT2 models (#8620) 2020-11-18 13:06:30 -05:00
a0c62d2493 Fix training from scratch in new scripts (#8623) 2020-11-18 12:15:26 -05:00
1e62e999e8 Fixes the training resuming with gradient accumulation (#8624) 2020-11-18 12:00:11 -05:00
cdfa56afe0 [Tokenizer Doc] Improve tokenizer summary (#8622)
* improve summary

* small fixes

* cleaned line length

* correct "" formatting

* apply sylvains suggestions
2020-11-18 17:14:15 +01:00
2f9d49b389 Adding PrefixConstrainedLogitsProcessor (#8529)
* Adding PrefixConstrainedLogitsProcessor

* fixing RAG and style_doc

* fixing black (v20 instead of v19)

* Improving doc in generation_logits_process.py

* Improving docs and typing in generation_utils.py

* docs improvement

* adding test and fixing doc typo

* fixing doc_len

* isort on test

* fixed test

* improve docstring a bit

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-18 17:06:25 +01:00
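A sketch of how the new processor is driven through `generate()` via its `prefix_allowed_tokens_fn` argument (the allowed-token list here is an arbitrary example):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# The callback receives the batch id and the tokens generated so far, and
# returns the token ids allowed at the next step.
allowed_ids = tokenizer.encode(" the a an and of to")

def prefix_allowed_tokens_fn(batch_id, input_ids):
    return allowed_ids

inputs = tokenizer("Hello", return_tensors="pt")
output = model.generate(
    **inputs, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn, max_length=12
)
```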
3bc1540070 New TF loading weights (#8490)
* New TF loading weights

* apply style

* Better naming

* Largely comment the loading method

* Apply style

* Address Patrick's comments

* Remove useless line of code

* Update Docstring

* Address Sylvain's and Lysandre's comments

* Simplify the names computation

* Typos
2020-11-18 10:48:31 -05:00
0df91ee4f7 self.self.activation_dropout -> self.activation_dropout (#8611)
(one line typo)
2020-11-18 10:30:29 -05:00
cdf1b7ae82 fix to adjust for #8530 changes (#8612) 2020-11-18 10:25:00 -05:00
2819da02f7 [s2s] broken test (#8613) 2020-11-18 10:15:53 -05:00
9fa3ed1a7f Fix missing space in multiline warning (#8593)
Multiline string informing about missing PyTorch/TensorFlow had missing space.
2020-11-18 10:09:26 -05:00
8fcb6935a1 Fix DataCollatorForLanguageModeling (#8621) 2020-11-18 10:02:50 -05:00
f6fe41c96b Reset loss to zero on logging in Trainer to avoid bfloat16 issues (#8561)
* make tr_loss regular float

* Revert "make tr_loss regular float"

This reverts commit c9d7ccfaf0c4387187b0841694f01ec0ffd5f4ba.

* reset loss at each logging step

* keep track of total loss with _total_loss_scalar

* add remaining tr_loss at the end
2020-11-18 09:58:08 -05:00
b592728eff Fixed link to the wrong paper. (#8607) 2020-11-17 19:00:44 -05:00
0512444ee5 Remove old doc 2020-11-17 17:34:25 -05:00
5cf9c79665 Add Harry Potter Model Card (#8605)
* Add Harry Potter Model

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

* Update model_cards/ceostroff/harry-potter-gpt2-fanfiction/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-17 16:50:58 -05:00
dd52804f5f Remove deprecated (#8604)
* Remove old deprecated arguments

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Remove needless imports

* Fix tests

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-11-17 15:11:29 -05:00
3095ee9dab Tokenizers should be framework agnostic (#8599)
* Tokenizers should be framework agnostic

* Run the slow tests

* Not testing

* Fix documentation

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-17 14:03:03 -05:00
7f3b41a306 Fix check repo utils (#8600) 2020-11-17 14:01:46 -05:00
f0435f5a61 these should run fine on multi-gpu (#8582) 2020-11-17 14:00:41 -05:00
36a19915ea Fix model templates (#8595)
* First fixes

* Fix imports and add init

* Fix typo

* Move init to final dest

* Fix tokenization import

* More fixes

* Styling
2020-11-17 10:35:38 -05:00
042a6aa777 Tokenizers: ability to load from model subfolder (#8586)
* <small>tiny typo</small>

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co

Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-11-17 08:58:45 -05:00
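A minimal sketch of the new loading path (the repo name and subfolder are hypothetical):

```python
from transformers import AutoTokenizer

# The tokenizer files live in a subdirectory of the model repository
# rather than at its root.
tokenizer = AutoTokenizer.from_pretrained("some-org/some-model", subfolder="tokenizer")
```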
48395d6b8e Fix init for MT5 (#8591) 2020-11-17 08:52:13 -05:00
a6cf9ca00b Add __init__ to the models folder 2020-11-17 07:39:37 -05:00
5104223552 [MT5] More docs (#8589)
* add docs

* make style
2020-11-17 12:47:57 +01:00
86822a358b T5 & mT5 (#8552)
* add mt5 and t5v1_1 model

* fix tests

* correct some imports

* add tf model

* finish tf t5

* improve examples

* fix copies

* clean doc
2020-11-17 12:23:09 +01:00
9e01f988dd model_card for indolem/indobert-base-uncased (#8579) 2020-11-17 03:36:50 -05:00
c89bdfbe72 Reorganize repo (#8580)
* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in convert command
2020-11-16 21:43:42 -05:00
901507335f Fix mixed precision issue for GPT2 (#8572)
* Fix mixed precision issue for GPT2

* Forgot one cast

* oops

* Forgotten casts
2020-11-16 14:44:19 -05:00
1073a2bde5 Switch return_dict to True by default. (#8530)
* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests
2020-11-16 11:43:00 -05:00
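What the new default means in practice (model name chosen for illustration): outputs are typed objects whose fields are accessed by name instead of by tuple position.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# With return_dict=True (now the default), use named attributes; indexing
# like outputs[0] still works for backward compatibility.
print(outputs.logits.shape)
```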
0d0a0785fd Update version to v4.0.0-dev (#8568) 2020-11-16 10:21:19 -05:00
afb50c663a Fix GPT2DoubleHeadsModel to work with model.generate() (#6601)
* Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used

and for GPT2LMHeadModel too

* Update tests to check token_type_ids usage in GPT2 models
2020-11-16 14:35:44 +01:00
04d8136bde Adding the prepare_seq2seq_batch function to ProphetNet (#8515)
* Simply insert T5Tokenizer's prepare_seq2seq_batch

* Update/Add some 'import'

* fix RunTimeError caused by '.view'

* Moves .view related error avoidance from seq2seq_trainer to inside prophetnet

* Update test_tokenization_prophetnet.py

* Format the test code with black

* Re-format the test code

* Update test_tokenization_prophetnet.py

* Add importing require_torch in the test code

* Add importing BatchEncoding in the test code

* Re-format the test code on Colab
2020-11-16 14:18:25 +01:00
931b10978e [doc] typo fix (#8535)
* [doc] typo fix

@sgugger

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-16 08:05:30 -05:00
6db21a06ae Clearer Model Versioning Example (#8562) 2020-11-16 06:59:10 -05:00
daaa68451e Readme for Wiki Summary [Persian] bert2bert (#8558) 2020-11-16 05:04:46 -05:00
06d468d3f0 Readme for News Headline Generation (bert2bert) (#8557) 2020-11-16 05:04:38 -05:00
9b7fb8a368 Create README.md for Chinese RoBERTa Miniatures (#8550)
* Create README.md

* Update model_cards/uer/chinese_roberta_L-2_H-128/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-16 05:01:28 -05:00
f4e04cd2c6 [breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)
* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-15 22:50:59 +01:00
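The `use_fast = True` default from the PR above, in a short sketch (model name for illustration):

```python
from transformers import AutoTokenizer

# use_fast now defaults to True; pass use_fast=False to keep the slow,
# pure-Python tokenizer where its exact behaviour is needed.
fast_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
slow_tok = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
print(type(fast_tok).__name__, type(slow_tok).__name__)
```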
24184e73c4 Rework some TF tests (#8492)
* Update some tests

* Small update

* Apply style

* Use max_position_embeddings

* Create a fake attribute

* Create a fake attribute

* Update wrong name

* Wrong TransfoXL model file

* Keep the common tests agnostic
2020-11-13 17:07:17 -05:00
f6cdafdec7 fix load weights (#8528)
* fix load weights

* delete line
2020-11-13 20:31:40 +01:00
f6f4da8dd4 Add bart-large-mnli model card (#8527) 2020-11-13 14:07:25 -05:00
725269746b Model sharing doc: more tweaks (#8520)
* More doc tweaks

* Update model_sharing.rst

* make style

* missing newline

* Add email tip

Co-authored-by: Pierric Cistac <pierric@huggingface.co>
2020-11-13 12:10:26 -05:00
9d519dabb7 Fix paths in github YAML 2020-11-13 12:04:17 -05:00
826f04576f Model templates encoder only (#8509)
* Model templates

* TensorFlow

* Remove pooler

* CI

* Tokenizer + Refactoring

* Encoder-Decoder

* Let's go testing

* Encoder-Decoder in TF

* Let's go testing in TF

* Documentation

* README

* Fixes

* Better names

* Style

* Update docs

* Choose to skip either TF or PT

* Code quality fixes

* Add to testing suite

* Update file path

* Cookiecutter path

* Update `transformers` path

* Handle rebasing

* Remove seq2seq from model templates

* Remove s2s config

* Apply Sylvain and Patrick comments

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last fixes from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-13 11:59:30 -05:00
42e2d02e44 [T5] Bug correction & Refactor (#8518)
* fix bug

* T5 refactor

* refactor tf

* apply sylvains suggestions
2020-11-13 16:57:31 +01:00
42f63e3871 Merge remote-tracking branch 'origin/master' 2020-11-13 10:30:04 -05:00
bb03a14edd Update doc for v3.5.1 2020-11-13 10:29:58 -05:00
4df6b59318 Update deepset/roberta-base-squad2 model card (#8522)
* Update README.md

* Update README.md
2020-11-13 09:58:27 -05:00
0c9bae0934 Remove typo 2020-11-12 22:39:57 -05:00
5d80539488 Add pretraining loss computation for TF Bert pretraining (#8470)
* Add pretraining loss computation for TF Bert pretraining

* Fix labels creation

* Fix T5 model

* restore T5 kwargs

* try a generic fix for pretraining models

* Apply style

* Override the prepare method for the BERT tests
2020-11-12 14:08:26 -05:00
91a67b7506 Use LF instead of os.linesep (#8491) 2020-11-12 13:52:40 -05:00
27b3ff316a Try to understand and apply Sylvain's comments (#8458) 2020-11-12 13:43:00 -05:00
0fa0349883 fix SqueezeBertForMaskedLM (#8479) 2020-11-12 12:19:37 -05:00
7933054638 Model sharing doc (#8498)
* Model sharing doc

* Style
2020-11-12 11:53:23 -05:00
d65e0bfea3 Fix doc bug (#8500)
* fix doc bug

Signed-off-by: mymusise <mymusise1@gmail.com>

* fix example bug

Signed-off-by: mymusise <mymusise1@gmail.com>
2020-11-12 11:47:23 -05:00
924c624a46 quick fix on concatenating text to support more datasets (#8474) 2020-11-12 09:47:08 -05:00
17b1fd804f Fix typo in roberta-base-squad2-v2 model card (#8489) 2020-11-12 05:29:37 -05:00
c6c08ebf61 [model_cards] other chars than [\w\-_] not allowed anymore in model names
cc @Pierrci
2020-11-12 10:45:29 +01:00
121c24efa4 Update deploy-docs dependencies on CI to enable Flax (#8475)
* Update deploy-docs dependencies on CI to enable Flax

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added pair of ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-11-11 18:31:41 -05:00
81ebd70671 [s2s] distill t5-large -> t5-small (#8376)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-11 17:58:45 -05:00
a5b682329c Flax/Jax documentation (#8331)
* First addition of Flax/Jax documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* make style

* Ensure input order match between Bert & Roberta

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Install dependencies "all" when building doc

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* wraps build_doc deps with ""

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing @sgugger comments.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use list to highlight JAX features.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Let's not look to much into the future for now.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-11-11 14:53:36 -05:00
c7b6bbec5c Skip test until investigation 2020-11-11 12:59:40 -05:00
aa2a2c6579 Replaced some iadd operations on lists with proper list methods. (#8433) 2020-11-11 12:29:57 -05:00
026a2ff225 Add TFDPR (#8203)
* Create modeling_tf_dpr.py

* Add TFDPR

* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot

the last commit accidentally deleted these 4 lines, so I restored them

* Add TFDPR

* Add TFDPR

* clean up some comments, add TF input-style doc string

* Add TFDPR

* Make return_dict=False as default

* Fix return_dict bug (in .from_pretrained)

* Add get_input_embeddings()

* Create test_modeling_tf_dpr.py

The current version already passes all 27 tests!
Please see the test run at:
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing

* fix quality

* delete init weights

* run fix copies

* fix repo consis

* del config_class, load_tf_weights

They should be 'pytorch only'

* add config_class back

after removing it, the test failed ... so, on Lysandre's suggestion, only "use_tf_weights = None" is removed

* newline after .. note::

* import tf, np (Necessary for ModelIntegrationTest)

* slow_test from_pretrained with from_pt=True

At the moment we don't have TF weights (since we don't have an official TF model).
Previously, I did not run the slow tests, so I missed this bug

* Add simple TFDPRModelIntegrationTest

Note that this is just a test that TF and PyTorch give approximately the same output.
However, I could not test with the official DPR repo's output yet

* upload correct tf model

* remove position_ids as missing keys

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
2020-11-11 12:28:09 -05:00
a38d1c7c31 Example NER script predicts on tokenized dataset (#8468)
The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`
2020-11-11 10:28:23 -05:00
069b63844c Fix next sentence output (#8466) 2020-11-11 15:41:39 +01:00
da842e4e72 Add next sentence prediction loss computation (#8462)
* Add next sentence prediction loss computation

* Apply style

* Fix tests

* Add forgotten import

* Add forgotten import

* Use a new parameter

* Remove kwargs and use positional arguments
2020-11-11 15:02:06 +01:00
23290836c3 Fix TF Longformer (#8460) 2020-11-11 12:54:15 +01:00
8dda9167de [model_cards] harmonization 2020-11-11 12:42:50 +01:00
eb3bd73ce3 Bug fix for modeling utilities function: apply_chunking_to_forward, chunking should be in the chunking dimension, an exception was raised if the complete shape of the inputs was not the same rather than only the chunking dimension (#8391)
Co-authored-by: pedro <pe25171@mit.edu>
2020-11-10 21:33:11 +01:00
70708cca1a fix t5 token type ids (#8437) 2020-11-10 14:21:54 -05:00
9fd1f56236 [No merge] TF integration testing (#7621)
* stash

* TF Integration testing for ELECTRA, BERT, Longformer

* Trigger slow tests

* Apply suggestions from code review
2020-11-10 14:02:33 -05:00
8fe6629bb4 Add missing tasks to pipeline docstring (#8428) 2020-11-10 13:44:25 -05:00
02bdfc0251 using multi_gpu consistently (#8446)
* s|multiple_gpu|multi_gpu|g; s|multigpu|multi_gpu|g'

* doc
2020-11-10 13:23:58 -05:00
b93569457f fix t5 special tokens (#8435) 2020-11-10 18:54:17 +01:00
cace39af97 Add missing import (#8444)
* Add missing import

* Fix dummy objects
2020-11-10 18:01:32 +01:00
e21340da7a [testing utils] get_auto_remove_tmp_dir more intuitive behavior (#8401)
* [testing utils] get_auto_remove_tmp_dir default change

Now that I have been using `get_auto_remove_tmp_dir` for a while, I realized that the defaults aren't optimal.

99% of the time we want the tmp dir to be empty at the beginning of the test - so the default changes to `before=True`. This shouldn't impact any tests since this feature is only used during debugging.

* simplify things

* update docs

* fix doc layout

* style

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* better 3-state doc

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* s/tmp/temporary/ + style

* correct the statement

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-10 11:57:21 -05:00
e7e1549895 Windows dev section in the contributing file (#8436)
* Add a Windows dev section in the contributing file.

* Forgotten link

* Trigger CI

* Rework description

* Trigger CI
2020-11-10 11:19:16 -05:00
8551a99232 Add auto next sentence prediction (#8432)
* Add auto next sentence prediction

* Fix style

* Add mobilebert next sentence prediction
2020-11-10 11:11:48 -05:00
c314b1fd3b [docs] improve bart/marian/mBART/pegasus docs (#8421) 2020-11-10 10:18:34 -05:00
3213d3bfae Question template (#8440)
* Remove SO from question template

* Styling
2020-11-10 10:07:56 -05:00
5d4972e608 [examples] better PL version check (#8429) 2020-11-10 09:33:23 -05:00
ae1cb4ec22 [s2s/distill] hparams.tokenizer_name = hparams.teacher (#8382) 2020-11-10 09:32:01 -05:00
aec51e5696 v3.5.0 documentation 2020-11-10 08:58:47 -05:00
818878dc88 Release: v3.5.0 2020-11-10 08:50:43 -05:00
9cebee38ad Model sharing rst (#8439)
* Update RST

* Finer details

* Re-organize

* Style
2020-11-10 08:35:11 -05:00
ad2303a401 Fix style 2020-11-10 14:28:30 +01:00
55e8d0cea2 Update links from s3 to huggingface.co 2020-11-10 14:03:29 +01:00
850afb422d Patch token classification pipeline (#8364)
* Patch token classification pipeline

* Some added tests for TokenClassificationArgumentHandler (#8366)

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2020-11-10 07:29:34 -05:00
70f622fab4 Model versioning (#8324)
* fix typo

* rm use_cdn & references, and implement new hf_bucket_url

* I'm pretty sure we don't need to `read` this file

* same here

* [BIG] file_utils.networking: do not gobble up errors anymore

* Fix CI 😇

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Tiny doc tweak

* Add doc + pass kwarg everywhere

* Add more tests and explain

cc @sshleifer let me know if better

Co-Authored-By: Sam Shleifer <sshleifer@gmail.com>

* Also implement revision in pipelines

In the case where we're passing a task name or a string model identifier

* Fix CI 😇

* Fix CI

* [hf_api] new methods + command line implem

* make style

* Final endpoints post-migration

* Fix post-migration

* Py3.6 compat

cc @stefan-it

Thank you @stas00

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-10 07:11:02 -05:00
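The `revision` argument the PR above introduces, in a minimal sketch (checkpoint and revision chosen for illustration):

```python
from transformers import AutoModel

# Pin a checkpoint to an exact revision (branch, tag, or commit hash) so a
# run stays reproducible even if the repository's default branch moves on.
model = AutoModel.from_pretrained("bert-base-uncased", revision="main")
```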
4185b115d4 Changing XLNet default from not using memories to 512 context size following paper (#8417)
* Move XLNet memory length FutureWarning

* isort

* style

* Changed default XLNet memory length
2020-11-09 20:49:51 -05:00
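A short sketch of the default change described above (assuming the config attribute is `mem_len`, as in the PR title):

```python
from transformers import XLNetConfig

# After this change the default configuration follows the paper's 512-token
# memory; mem_len=0 restores the previous no-memory behaviour.
default_config = XLNetConfig()          # mem_len now defaults to 512
legacy_config = XLNetConfig(mem_len=0)
```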
190df58560 [github CI] add a multi-gpu job for all example tests (#8341)
* add a multi-gpu job for all example tests

* run only ported tests

* rename

* explain why env is re-activated on each step

* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me

* style

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-09 15:47:38 -05:00
a39218b75b Check all models are in an auto class (#8425) 2020-11-09 15:44:54 -05:00
ef032ddd1e [docs] [testing] gpu decorators table (#8422)
* gpu decorators table

* whitespace

* Update docs/source/testing.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* whitespace

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-09 14:27:42 -05:00
a8339b9ecc Fix bart shape comment (#8423) 2020-11-09 13:25:33 -05:00
46509d1c19 [docs] remove sshleifer from issue-template :( (#8418) 2020-11-09 12:51:38 -05:00
9c83b96e62 [Tests] Add Common Test for Training + Fix a couple of bugs (#8415)
* add training tests

* correct longformer

* fix docs

* fix some tests

* fix some more train tests

* remove ipdb

* fix multiple edge case model training

* fix funnel and prophetnet

* clean gpt models

* undo renaming of albert
2020-11-09 18:24:41 +01:00
52040517b8 Deprecate old data/metrics functions (#8420) 2020-11-09 12:10:09 -05:00
d4d1fbfc5a [fsmt convert script] fairseq broke chkpt data - fixing that (#8377)
* fairseq broke chkpt data - fixing that

* style

* support older bpecodes filenames - specifically "code" in iwslt14
2020-11-09 11:57:42 -05:00
5c766ecb50 Fix typo 2020-11-09 11:50:51 -05:00
908a28894c Add new token classification example (#8340)
* Add new token classification example

* Remove txt file

* Add test

* With actual testing done

* Less warmup is better

* Update examples/token-classification/run_ner_new.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Fix test

* Make Lysandre happy

* Last touches and rename

* Rename in tests

* Address review comments

* More run_ner -> run_ner_old

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-11-09 11:39:55 -05:00
c7cb1aa26c Bump tokenizers (#8419) 2020-11-09 11:32:10 -05:00
78d706f3ae [fsmt tokenizer] support lowercase tokenizer (#8389)
* support lowercase tokenizer

* fix arg pos
2020-11-09 10:41:39 -05:00
1e2acd0dcf Bug fix for permutation language modelling (#8409) 2020-11-09 10:23:26 -05:00
bf8625e70b add evaluate doc - trainer.evaluate returns 'epoch' from training (#8273)
* add evaluate doc

* fix style with utils/style.doc

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-09 09:00:59 -05:00
ebde57acac examples/docs: caveat that PL examples don't work on TPU (#8309) 2020-11-09 08:55:22 -05:00
76e7a44dee Fix some tooling for windows (#8359)
* Fix some tooling for windows

* Fix conflict

* Trigger CI
2020-11-09 13:50:38 +01:00
507dfb40c3 Update README.md (#8406) 2020-11-09 16:44:43 +08:00
7247d0b4ea updating tag for exbert viz (#8408) 2020-11-09 16:43:55 +08:00
4ab5617b0b comet_ml temporary fix(#8410) 2020-11-09 16:36:06 +08:00
e6d9cdaafe [s2s/distill] remove run_distiller.sh, fix xsum script (#8412) 2020-11-08 16:57:43 -05:00
66582492d3 [s2s test_finetune_trainer] failing multigpu test (#8400) 2020-11-08 16:45:40 -05:00
f62755a600 [s2s examples test] fix data path (#8398) 2020-11-08 16:44:18 -05:00
4a53e8e9e4 Fix DataCollatorForWholeWordMask again (#8397) 2020-11-08 09:53:01 -05:00
610730998f fixed default labels for QA model (#8399) 2020-11-08 09:08:14 -05:00
0b02489b2c Add gpt2-medium-chinese model card (#8402)
* Create README.md

* Update model_cards/mymusise/gpt2-medium-chinese/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-08 05:00:19 -05:00
187554366f fix md table (#8395) 2020-11-08 04:25:14 -05:00
77a257fc21 Fix DataCollatorForWholeWordMask (#8379)
* Fix DataCollatorForWholeWordMask

* Replace all tensorize_batch in data_collator.py
2020-11-07 12:51:56 -05:00
517eaf460b [make] rewrite modified_py_files in python to be cross-platform (#8371)
* rewrite modified_py_files in python to be cross-platform

* try a different way to test for variable not being ""

* improve comment
2020-11-07 18:45:16 +01:00
07708793f2 fix encoder outputs (#8368) 2020-11-06 21:03:25 +01:00
bc0d26d1de [All Seq2Seq model + CLM models that can be used with EncoderDecoder] Add cross-attention weights to outputs (#8071)
* Output cross-attention with decoder attention output

* Update src/transformers/modeling_bert.py

* add cross-attention for t5 and bart as well

* fix tests

* correct typo in docs

* add sylvains and sams comments

* correct typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-11-06 19:34:48 +01:00
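How the new output field is reached, in a hedged sketch (model name for illustration; `cross_attentions` is the field the PR above adds to seq2seq outputs):

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True, return_dict=True)
# One decoder-over-encoder attention tensor per decoder layer.
print(len(outputs.cross_attentions), outputs.cross_attentions[0].shape)
```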
30f2507a07 Update README.md (#8360)
Fix website address
2020-11-06 11:45:46 -05:00
5807ba3fa9 Fix typo (#8351) 2020-11-06 11:19:41 -05:00
82146496b6 Update README.md (#8338)
fixes
2020-11-06 06:20:58 -05:00
9e5c4d39ab Create README.md (#8312)
* Create README.md

* Update model_cards/ktrapeznikov/gpt2-medium-topic-news/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-06 06:19:59 -05:00
06ebc37967 Create README.md (#8255)
* Create README.md

Initial commit

* Updated Read me

Updated

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-06 03:34:24 -05:00
41cd031cf2 Create README.md (#8169) 2020-11-06 03:26:07 -05:00
f932ddeff5 Create README.md (#8170) 2020-11-06 03:25:52 -05:00
08b92f78fa Create README.md (#8168)
* Create README.md

* Update README.md
2020-11-06 03:25:33 -05:00
77d62e78b0 Create README.md (#8167)
* Create README.md

Telugu BERTU Readme file

* Update model_cards/kuppuluri/telugu_bertu/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-06 03:24:31 -05:00
dd6bfcaefb Create README.md (#8327) 2020-11-06 03:22:52 -05:00
ddeecf08e6 german medbert model details (#8266)
* model details

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-06 03:21:13 -05:00
96baaafd34 Create README.md (#8258) 2020-11-06 03:19:12 -05:00
185259c261 [model_cards] Update Italian BERT models and introduce new Italian XXL ELECTRA model 🎉 (#8343) 2020-11-06 03:17:03 -05:00
34bbf60bf8 Model card: GPT-2 fine-tuned on CommonGen (#8248) 2020-11-06 03:15:11 -05:00
973218fd3b Model card: CodeBERT fine-tuned for Insecure Code Detection (#8247)
* Model card: CodeBERT fine-tuned for Insecure Code Detection

* Update model_cards/mrm8488/codebert-base-finetuned-detect-insecure-code/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-06 03:13:45 -05:00
f833ca418b Model card: T5-base fine-tuned on QuaRel (#8334) 2020-11-06 03:09:55 -05:00
9edafaebef [s2s] test_bash_script.py - actually learn something (#8318)
* use decorator

* remove hardcoded paths

* make the test use more data and do real quality tests

* shave off 10 secs

* add --eval_beams 2, reformat

* reduce train size, use smaller custom dataset
2020-11-05 23:15:14 -05:00
17450397a7 Docs bart training ref (#8330)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-05 17:20:57 -05:00
d787935a14 [s2s] test_distributed_eval (#8315)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-05 16:01:15 -05:00
04e442d575 Make Trainer evaluation handle dynamic seq_length (#8336)
* Make Trainer evaluation handle dynamic seq_length

* Document behavior.

* Fix test

* Better fix

* Fixes for realsies this time

* Address review comments

* Without forgetting to save...
2020-11-05 15:13:51 -05:00
27b402cab0 Output global_attentions in Longformer models (#7562)
* Output global_attentions in Longformer models

* make style

* small refactoring

* fix tests

* make fix-copies

* add for tf as well

* remove comments in test

* make fix-copies

* make style

* add docs

* make docstring pretty

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-05 21:10:43 +01:00
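A sketch of reading the new `global_attentions` output (checkpoint for illustration; only tokens marked in the global attention mask attend globally):

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")
inputs = tokenizer("Hello world", return_tensors="pt")
global_mask = torch.zeros_like(inputs["input_ids"])
global_mask[:, 0] = 1                       # give the first token global attention
outputs = model(**inputs, global_attention_mask=global_mask, output_attentions=True)
print(outputs.global_attentions[0].shape)   # per-layer global attention weights
```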
7abc1d96d1 no warn (#8329) 2020-11-05 11:42:24 -05:00
52f44dd6d2 change TokenClassificationTask class methods to static methods (#7902)
* change TokenClassificationTask class methods to static methods

Since we do not require self in the class methods of TokenClassificationTask, we should switch to static methods. Also, since the class TokenClassificationTask does not contain a constructor, it is currently unusable as is. Switching to static methods fixes the issue of having to document the intent of the broken class.

Also, the get_labels and read_examples_from_file methods ought to be implemented. Static method definitions are unchanged even after inheritance, which means they can be overridden, similar to other class methods.

* Trigger Build

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-11-05 09:38:30 -05:00
77c8f6c627 Corrected typo in readme (#8320) 2020-11-05 07:48:36 -05:00
226b9debb7 Update PULL_REQUEST_TEMPLATE.md 2020-11-05 09:40:15 +01:00
6f35c61f93 Update bug-report.md 2020-11-05 09:39:05 +01:00
638c0b7c50 Create README.md (#8223)
* Create README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-05 03:03:19 -05:00
9c4aa4ac1a Clean up data collators and datasets (#8308)
* Clean up data collators and datasets

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove needless clone

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-04 17:24:49 -05:00
b1d3e95eb5 Fix path to old run_language_modeling.py script (#8302) 2020-11-04 13:17:57 -05:00
b6e58db277 Speedup doc build (#8301)
* Try -j option

* Try other thing

* Bigger machine

* Test lower sphinx version

* Remove trailing space
2020-11-04 11:51:21 -05:00
969ccac2e9 adding model cards for distilled models (#8300)
* adding model cards for distil models

* forgot the languages
2020-11-04 11:41:45 -05:00
7342d9a583 Improve QA pipeline error handling (#8286)
- The issue is that with previous code we would have the following:

```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
-> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```

The goal here is to improve this to actually return a ValueError
wherever possible.

While at it, I tried to simplify QuestionArgumentHandler's code to
make it smaller and more compact while keeping backward compatibility.
2020-11-04 11:30:42 -05:00
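After this change an empty context should surface a readable error, roughly as sketched here (illustrative; the default QA model is downloaded on first use):

```python
from transformers import pipeline

qa_pipeline = pipeline("question-answering")
try:
    qa_pipeline(question="Where was he born ?", context="")
except ValueError as err:
    # An empty context now raises a clear ValueError instead of the
    # cryptic IndexError shown above.
    print(err)
```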
38630e7a87 Update model cards of deepset/roberta-base-squad2 v1 and v2 (#8241)
* update deepset/roberta-base-squad2 to v2

* Update model_cards/deepset/roberta-base-squad2/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-11-04 11:21:25 -05:00
04561ecbe6 Model card: T5-base fine-tuned on QASC (#8299) 2020-11-04 11:20:15 -05:00
854b44aa38 Revert size change as it doesn't change anything 2020-11-04 11:13:24 -05:00
414985c427 Upgrade resource for doc building 2020-11-04 10:44:19 -05:00
cf89724696 Fix validation file loading in scripts (#8298) 2020-11-04 10:42:18 -05:00
cb966e640b [Generate Test] fix greedy generate test (#8293)
* fix greedy generate test

* delete ipdb
2020-11-04 15:44:36 +01:00
734afa37f6 Fix typo in language-modeling README.md (#8287) 2020-11-04 09:38:02 -05:00
7a7e2c2606 [blenderbot] regex fix (#8282)
Fixing:

```
src/transformers/tokenization_blenderbot.py:163: DeprecationWarning: invalid escape sequence \s
    token = re.sub("\s{2,}", " ", token)
```
2020-11-04 09:02:28 -05:00
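The warning comes from `\s` inside a plain string literal; the usual fix, as a one-line sketch, is a raw string (or an escaped backslash):

```python
import re

token = re.sub(r"\s{2,}", " ", "too   many   spaces")  # raw string, no warning
```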
29b536a73a [WIP] Ner pipeline grouped_entities fixes (#5970)
* Bug fix: NER pipeline shouldn't group separate entities of same type

* style fix

* [Bug Fix] Shouldn't group entities that are both 'B' even if they are same type
	(B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions, because some models train on only the first token of a word and not on the subsequent wordpieces (the BERT NER default), so it makes sense to do the same thing at inference time.
	The simplest fix is to just group the subwords with the first wordpiece.
	[TODO] how to handle ignored scores? just set them to 0 and calculate zero invariant mean ?
	[TODO] handle different wordpiece_prefix ## ? possible approaches:
		get it from the tokenizer? but currently most tokenizers don't have a wordpiece_prefix property?
		have an _is_subword(token)
[Feature add] added option `skip_special_tokens`, because it was harder to remove them after grouping.
[Additional Changes] remove B/I prefix on returned grouped_entities
[Feature Request/TODO] Return indexes?
[Bug TODO]  can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')

* use offset_mapping to fix [UNK] token problem

* ignore score for subwords

* modify ner_pipeline test

* modify ner_pipeline test

* modify ner_pipeline test

* ner_pipeline change ignore_subwords default to true

* add ner_pipeline ignore_subword=False test case

* fix offset_mapping index

* fix style again duh

* change is_subword and convert_tokens_to_string logic

* merge tests with new test structure

* change test names

* remove old tests

* ner tests for fast tokenizer

* fast tokenizers have convert_tokens_to_string

* Fix the incorrect merge

Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-11-03 17:21:04 -05:00
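A usage sketch of the options discussed in the PR above (default NER model downloaded on first use; `ignore_subwords` is the option this PR adds):

```python
from transformers import pipeline

# grouped_entities merges B-/I- pieces into whole entities; ignore_subwords
# keeps only the first wordpiece's score, matching how BERT NER models train.
ner = pipeline("ner", grouped_entities=True, ignore_subwords=True)
print(ner("Hugging Face is based in New York City"))
```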
1bb4bba53c [CIs] Better reports everywhere (#8275)
* make it possible to invoke testconf.py in both test suites without crashing on having the same option added

* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts

* add `pytest --make-reports` to all CIs (and artifacts)

* fix
2020-11-03 16:57:12 -05:00
7f556d2e39 Data collator for token classification (#8274)
* Add DataCollatorForTokenClassification and clean tests

* Make quality
2020-11-03 16:33:27 -05:00
6a064447f2 improve documentation of training_args.py (#8270)
* improve documentation of training_args.py

- do_train
- do_eval
- do_predict

* fix line too long

* fix style with black on training_args.py

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix line length with utils/style_doc

* black reformatting

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-03 15:57:17 -05:00
4c19f3baab Clean Trainer tests and datasets dep (#8268) 2020-11-03 15:50:55 -05:00
068e6b5edd make files independent (#8267) 2020-11-03 21:13:33 +01:00
cd360dcb26 [examples] minimal version requirement run-time check in PL (#8133)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-03 13:17:11 -05:00
971c638ee9 forward the worker stderr to the parent process (#8262) 2020-11-03 12:04:53 -05:00
eb6313e823 Fix Tatoeba skip 2020-11-03 10:35:00 -05:00
74f6f91a9d Updated ConversationalPipeline to work with encoder-decoder models (#8207)
* Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)

* Addition of integration test for EncoderDecoder conversation model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-03 10:33:01 -05:00
c66ffa3a17 [FIX] TextGenerationPipeline is currently broken. (#8256)
* [FIX] TextGenerationPipeline is currently broken.

It's most likely due to #8180.
What's missing is a multi vs single string handler at the beginning of
the pipe.
And also there was no testing of this pipeline.

* Fixing Conversational tests too.
2020-11-03 10:10:22 -05:00
a1bbcf3f6c Refactoring the generate() function (#6949)
* first draft

* show design proposition for new generate method

* up

* make better readable

* make first version

* gpt2 tests pass

* make beam search for gpt2 work

* add first encoder-decoder code

* delete typo

* make t5 work

* save intermediate

* make bart work with beam search

* finish beam search bart / t5

* add default kwargs

* make more tests pass

* fix no bad words sampler

* some fixes and tests for all distribution processors

* fix test

* fix rag slow tests

* merge to master

* add nograd to generate

* make all slow tests pass

* speed up generate

* fix edge case bug

* small fix

* correct typo

* add type hints and docstrings

* fix typos in tests

* add beam search tests

* add tests for beam scorer

* fix test rag

* finish beam search tests

* move generation tests into a separate file

* fix generation tests

* more tests

* add aggressive generation tests

* fix tests

* add gpt2 sample test

* add more docstring

* add more docs

* finish doc strings

* apply some more of sylvains and sams comments

* fix some typos

* make fix copies

* apply lysandres and sylvains comments

* final corrections on examples

* small fix for reformer
2020-11-03 16:04:22 +01:00
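From a user's point of view the refactor above is invisible; a single call is routed to greedy search, sampling, or beam search depending on the arguments (model and prompt chosen for illustration):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
inputs = tokenizer("The weather today", return_tensors="pt")
# num_beams > 1 dispatches to the new beam search path internally.
output = model.generate(**inputs, num_beams=4, max_length=20, early_stopping=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```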
b63beb743c Skip tatoeba tests if Tatoeba-Challenge not cloned (#8260) 2020-11-03 09:49:29 -05:00
9f1747f999 [Seq2Seq] Correct import in Seq2Seq Trainer (#8254) 2020-11-03 07:56:41 -05:00
504ff7bb12 2 SinusoidalPositionalEmbedding fixes (#8226) 2020-11-02 18:50:26 -05:00
f744b81572 add new notebooks (#8246) 2020-11-02 20:21:55 +01:00
dc26726df2 fix encoder decoder bug (#8243) 2020-11-02 20:12:34 +01:00
9a23af4aff Add XLMProphetNetTokenizer to tokenization auto (#8245) 2020-11-02 14:10:09 -05:00
5b178f3c87 Create README.md 2020-11-02 20:03:44 +01:00
e1b1b614b1 Add line by line option to mlm/plm scripts (#8240)
* Make line by line optional in run_mlm

* Add option to disable dynamic padding

* Add option to plm too and update README

* Typos

* More typos

* Even more typos

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-11-02 12:27:04 -05:00
ebec410c71 Create README.md 2020-11-02 17:53:22 +01:00
5406f31a1a Fix TensorBoardCallback for older versions of PyTorch (#8239) 2020-11-02 10:43:28 -05:00
d1ad4bff44 Fix bad import with PyTorch <= 1.4.1 (#8237) 2020-11-02 10:26:37 -05:00
3c8d401cf6 Patch reports (#8238) 2020-11-02 10:26:25 -05:00
93354bc779 doc: fix typo (#8235) 2020-11-02 08:53:17 -05:00
0c92e7d9fa Fix ignore list behavior in doctests (#8213) 2020-11-02 08:47:37 -05:00
84caa23301 Fix the behaviour of DefaultArgumentHandler (removing it). (#8180)
* Some work to fix the behaviour of DefaultArgumentHandler by removing it.

* Fixing specific pipelines argument checking.
2020-11-02 12:33:50 +01:00
00cc2d1df2 DynaBERT model cards update (#8192)
* Update README.md

* Update README.md
2020-11-02 13:19:38 +08:00
aa79aa4e7d Added 12 model cards for Indian Language Models (#8198)
* Create README.md

* added model cards
2020-11-02 13:17:43 +08:00
9bd30f7cf4 [Seq2SeqTrainer] Move import to init to make file self-contained (#8194)
* boom boom

* reverse order
2020-11-01 23:31:55 +01:00
1f12934df4 [Bug fix] Fixed value for BlenderBot pad token (#8205) 2020-11-01 10:21:57 -05:00
8f1c960ee7 Fix two bugs with --logging_first_step (#8193)
* make sure that logging_first_step evaluates

* fix bug with incorrect loss on logging_first_step

* fix style

* logging_first_step only logs, not evals
2020-10-30 16:45:38 -04:00
689ff74f99 Minor style improvements for the Flax BERT and RoBERTa examples (#8178)
* Minor style improvements:

1. Use `@nn.compact` rather than `@compact` (so as not to make it seem
   like compact is a standard Python decorator).
2. Move attribute docstrings from two `__call__` methods to comments
   on the attributes themselves. (This was probably a remnant from
   the pre-Linen version where the attributes were arguments to
   `call`.)

* Use black on the Flax modeling code
2020-10-30 16:25:39 -04:00
9eb3a410cd Remove deprecated arguments from new run_clm (#8197) 2020-10-30 15:27:20 -04:00
00112c3539 Replace swish with silu (#8166)
* Replace swish with silu

* revert nn.silu to nn.swish due to older version

* simplify optimized silu conditional and fix format

* Update activations.py

* Update activations_tf.py

* Update modeling_flax_utils.py

* Update modeling_openai.py

* add swish testcase

* add pytorch swish testcase

* Add more robust python version check

* more formatting fixes

Co-authored-by: TFUsers <TFUsers@gmail.com>
2020-10-30 15:09:10 -04:00
cdc48ce92d Finalize lm examples (#8188)
* Finish the cleanup of the language-modeling examples

* Update main README

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Propagate changes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-30 14:20:18 -04:00
089cc1015e Doc fixes and filter warning in wandb (#8189) 2020-10-30 12:37:34 -04:00
566b083eb1 TFMarian, TFMbart, TFPegasus, TFBlenderbot (#7987)
* Start plumbing

* Marian close

* Small stubs for all children

* Fixed bart

* marian working

* pegasus test is good, but failing

* Checkin tests

* More model files

* Subtle marian, pegasus integration test failures

* Works well

* rm print

* boom boom

* Still failing model2doc

* merge master

* Equivalence test failing, all others fixed

* cleanup

* Fix embed_scale

* Cleanup marian pipeline test

* Undo extra changes

* Smaller delta

* Cleanup model testers

* undo delta

* fix tests import structure

* cross test decorator

* Cleaner set_weights

* Respect authorized_unexpected_keys

* No warnings

* No warnings

* style

* Nest tf import

* black

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* functional dropout

* fixup

* Fixup

* style_doc

* embs

* shape list

* delete slow force_token_id_to_be_generated func

* fixup

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-30 11:23:16 -04:00
6279072f5f Fix typo: s/languaged/language/ (#8165) 2020-10-30 11:22:03 -04:00
10f8c63620 Ci test tf super slow (#8007)
* Test TF GPU CI

* Change cache

* Fix missing torch requirement

* Fix some model tests


Style

* LXMERT

* MobileBERT

* Longformer skip test

* XLNet

* The rest of the tests

* RAG goes OOM in multi gpu setup

* YAML test files

* Last fixes

* Skip doctests

* Fill mask tests

* Yaml files

* Last test fix

* Style

* Update cache

* Change ONNX tests to slow + use tiny model
2020-10-30 10:25:48 -04:00
7e36deec7a Fixing some warnings in DeBerta (#8176)
* Fixing some warnings in DeBerta

* Fixing docs with their rewritten version.
2020-10-30 09:15:41 -04:00
0538820737 [CI] Better reports #2 (#8163) 2020-10-29 19:30:05 -04:00
9a21b50614 Fix eval ref miss in Chinese WWM. (#8115)
* ADD: add whole word mask proxy for both eng and chinese

* MOD: adjust format

* MOD: reformat code

* MOD: update import

* MOD: fix bug

* MOD: add import

* MOD: fix bug

* MOD: decouple code and update readme

* MOD: reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change wwm to whole_word_mask

* reformat code

* reformat

* format

* Code quality

* ADD: update chinese ref readme

* MOD: small changes

* MOD: small changes2

* update readme

* fix eval ref file miss bug

* format file

* MOD: move ref code to contrib

* MOD: add delimiter check

* reformat code

* reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-29 17:08:39 -04:00
fdf893c441 Fix typo: indinces -> indices (#8159)
* Fix typo: indinces -> indices

* Fix some more

* Fix some more

* Fix some more

* Fix CI
2020-10-29 17:04:20 -04:00
c83cec44f8 improve error checking (#8157) 2020-10-29 14:05:24 -04:00
691176283d Add a template for examples and apply it for mlm and plm examples (#8153)
* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Add a template for example scripts and apply it to mlm

* Formatting

* Fix test

* Add plm script

* Styling
2020-10-29 13:38:11 -04:00
49e4fece5c [s2s] distillBART docs for paper replication (#8150) 2020-10-29 12:01:15 -04:00
acf56408d8 Smarter prediction loop and no- -> no_ in console args (#8151)
* Smarter prediction loop and no- -> no_ in console args

* Fix test
2020-10-29 10:56:25 -04:00
b0f1c0ee30 Document tokenizer_class in configurations (#8152) 2020-10-29 10:43:45 -04:00
969859d5f6 Fix doc errors and typos across the board (#8139)
* Fix doc errors and typos across the board

* Fix a typo

* Fix the CI

* Fix more typos

* Fix CI

* More fixes

* Fix CI

* More fixes

* More fixes
2020-10-29 10:33:33 -04:00
4731a00c3e Update widget examples. (#8149)
Co-authored-by: yantan <yantan@effyic.com>
2020-10-29 08:49:16 -04:00
238876068c Update README.md (#8090) 2020-10-29 08:31:32 -04:00
e566adc09c Add model_cards (#7969)
* add readme

* add readmes

* Add metadata
2020-10-29 08:29:54 -04:00
cc8941d881 Create README.md (#8089) 2020-10-29 08:23:43 -04:00
234a6dc388 Create README.md (#8088)
* Create README.md

* metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-29 08:23:30 -04:00
5d76859531 Create README.md (#8075)
* Create README.md

* Update model_cards/gurkan08/bert-turkish-text-classification/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-29 08:22:33 -04:00
b215090eed Add two model_cards: ethanyt/guwenbert-base and ethanyt/guwenbert-large (#8041) 2020-10-29 08:21:54 -04:00
ba2ad3a98a Model Card for Gujarati-XLM-R-Base (#8038)
* Add model card for Gujarati-XLM-R-Base

* Update README.md

Add the model card for the Gujarati-XLM-R-Base.

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-29 08:21:11 -04:00
52cea7de75 Create README.md (#8017) 2020-10-29 08:19:47 -04:00
ff82a2aa93 Create README.md (#8015) 2020-10-29 08:19:35 -04:00
0a3b9733cb Add model_cards for DynaBERT (#8012)
* Update README.md

* Add dynabert_overview.png

* Update README.md

* Create README.md

* Add dynabert_overview.png

* Update README.md

* Update README.md

* Delete dynabert_overview.png

* Update README.md

* Delete dynabert_overview.png

* Update README.md
2020-10-29 08:19:17 -04:00
afa21504b1 add tags (#8147) 2020-10-29 12:45:55 +01:00
825925dfaa [s2s test] cleanup (#8131) 2020-10-28 16:50:36 -04:00
e477eb919f Fix typo in AutoModelForMaskedLM docs (#8129) 2020-10-28 15:52:28 -04:00
5e24982e58 Upgrade PyTorch Lightning to 1.0.2 (#7852)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-28 14:59:14 -04:00
1b6c8d4811 Update CI cache (#8126) 2020-10-28 13:59:43 -04:00
378142afdf Rename add_start_docstrings_to_callable (#8120) 2020-10-28 13:42:31 -04:00
6241c873cd Document the various LM Auto models (#8118) 2020-10-28 13:41:56 -04:00
5193172f12 [DOC] Improve pipeline() docstrings for config and tokenizer (#8123)
* Improve pipeline() docstrings

* make style

* Update wording for config
2020-10-28 13:26:12 -04:00
b4cacb7a63 fix(trainer_callback]: typo (#8121) 2020-10-28 12:15:30 -04:00
5423f2a9d4 [testing] port test_trainer_distributed to distributed pytest + TestCasePlus enhancements (#8107)
* move the helper code into testing_utils

* port test_trainer_distributed to work with pytest

* improve docs

* simplify notes

* doc

* doc

* style

* doc

* further improvements

* torch might not be available

* real fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-28 11:51:32 -04:00
47dfa65b0c New run_clm script (#8105)
* New run_clm script

* Formatting

* More comments

* Remove unused imports

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address review comments

* Change link to the hub

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-28 10:38:58 -04:00
8065fea870 [gh actions] run artifacts job always (#8110) 2020-10-28 01:45:19 -04:00
1e01db3579 Remove header 2020-10-27 17:36:13 -04:00
b715e40ced Fix typo 2020-10-27 17:34:05 -04:00
41cc5f3f59 Move installation instructions to the top (#8106) 2020-10-27 17:32:20 -04:00
556709ad92 rm multiclass option from model card 2020-10-27 17:11:43 -04:00
c5f3149f95 Adjust setup so that all extras run on Windows (#8102) 2020-10-27 14:39:49 -04:00
995006eabb Add AzureML in integrations via dedicated callback (#8062)
* first attempt to add AzureML callbacks

* func arg fix

* var name fix, but still won't fix error...

* fixing as in https://discuss.huggingface.co/t/how-to-integrate-an-azuremlcallback-for-logging-in-azure/1713/2

* Avoid lint check of azureml import

* black compliance

* Make isort happy

* Fix point typo in docs

* Add AzureML to Callbacks docs

* Attempt to make sphinx happy

* Format callback docs

* Make documentation style happy

* Make docs compliant to style

Co-authored-by: Davide Fiocco <davide.fiocco@frontiersin.net>
2020-10-27 14:21:54 -04:00
a0906068cf Fully remove codecov (#8093) 2020-10-27 14:14:13 -04:00
3e58b6b7b8 infer entailment label id on zero shot pipeline (#8059)
* add entailment dim argument

* rename dim -> id

* fix last name change, style

* rm arg, auto-infer only

* typo

* rm superfluous import
2020-10-27 14:09:55 -04:00
9fefdb0751 DEP: pinned sentencepiece to 0.1.91 in setup.py (#8069)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-27 14:09:31 -04:00
edd3721cd4 update/add setup targets (#8076) 2020-10-27 13:54:57 -04:00
55bc0c599a [model_cards] Switch to a more explicit domain for the media bucket 2020-10-27 18:08:05 +01:00
7bff0af0a4 Fix a bug for CallbackHandler.callback_list (#8052)
* Fix callback_list

* Add test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>

* Fix test

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
2020-10-27 10:37:04 -04:00
8e28c327fc Fix assertion error message for MLflowCallback (#8091) 2020-10-27 10:34:51 -04:00
3220f21f14 Styling fix 2020-10-27 10:09:51 -04:00
286dc19a4f Fix IterableDataset with __len__ in Trainer (#8095) 2020-10-27 09:52:35 -04:00
d93acd6f13 Move style_doc to extra_quality_checks (#8081) 2020-10-27 09:42:07 -04:00
bfd5e370a7 [CI] generate separate report files as artifacts (#7995)
* better reports

* a whole bunch of reports in their own files

* clean up

* improvements

* github artifacts experiment

* style

* complete the report generator with multiple improvements/fixes

* fix

* save all reports under one dir for easy upload

* can remove temp failing tests

* doc fix

* some cleanup
2020-10-27 09:25:07 -04:00
33f6ef733a Fix DeBERTa docs (#8092)
* Fix DeBERTa docs

* Tokenizer and config
2020-10-27 09:07:41 -04:00
c42596bc07 Doc styling fixes (#8074)
* Fix a few docstrings

* More fixes

* Styling
2020-10-27 07:54:50 -04:00
1496931b49 Fix comet_ml import and add ensure availability (#7933)
* Fix comet_ml import and add ensure availability

* Make isort happy

* Make flake8 happy

* Don't show comet_ml warn if COMET_MODE=DISABLED

* Make isort happy
2020-10-27 07:31:07 -04:00
985bba9096 fix doc bug (#8082)
Signed-off-by: mymusise <mymusise1@gmail.com>
2020-10-27 07:29:25 -04:00
08f534d2da Doc styling (#8067)
* Important files

* Styling them all

* Revert "Styling them all"

This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.

* Styling them for realsies

* Fix syntax error

* Fix benchmark_utils

* More fixes

* Fix modeling auto and script

* Remove new line

* Fixes

* More fixes

* Fix more files

* Style

* Add FSMT

* More fixes

* More fixes

* More fixes

* More fixes

* Fixes

* More fixes

* More fixes

* Last fixes

* Make sphinx happy
2020-10-26 18:26:02 -04:00
04a17f8550 Doc fixes in preparation for the docstyle PR (#8061)
* Fixes in preparation for doc styling

* More fixes

* Better syntax

* Fixes

* Style

* More fixes

* More fixes
2020-10-26 15:01:09 -04:00
8bbb74f211 [Model Card] new cross lingual sentence model for German and English (#8026)
* mc for new cross lingual sentence model

* fat text

* url spelling fix

* more url spelling fixes

* slight thanks change

* small improvements in text

* multilingual word xchange

* change colab link

* xval fold number

* add model links

* line break in model names

* Update README.md

* Update README.md

* new examples link

* new examples link

* add evaluation dataset name

* add more about multi lingual

* typo fix

* typo

* typos

* hyperparameter typos

* hyperparameter typo

* add metadata

* add metadata

* Update README.md

* typo fix

* Small improvement
2020-10-26 14:48:26 -04:00
3a10764574 Fix TF training arguments instantiation (#8063) 2020-10-26 14:39:25 -04:00
bc9332b545 [TF] from_pt should respect authorized_unexpected_keys (#8056) 2020-10-26 13:53:27 -04:00
7ff7c4934b fixing crash (#8057) 2020-10-26 13:19:10 -04:00
cbad90d86d Fix + Test (#8049) 2020-10-26 12:32:27 -04:00
664c7ec453 [Seq2Seq Trainer] Make sure padding is implemented for models without pad_token (#8043)
* make sure padding is implemented for models without padding tokens as well

* add better error message

* add better warning

* remove results files

* Update examples/seq2seq/seq2seq_trainer.py

* remove unnecessary copy line

* correct usage of labels

* delete test files
2020-10-26 17:28:16 +01:00
098ddc2244 Update README.md (#8050)
--wwm can't be used as an argument with run_language_modeling.py and should be changed to --whole_word_mask
2020-10-26 12:00:18 -04:00
fbcddb8544 add multiclass field to default zero shot example 2020-10-26 11:07:51 -04:00
a9ac1db276 Minor error fix of 'bart-large-cnn' details in the pretrained_models doc (#8053) 2020-10-26 11:05:16 -04:00
fc2d6eac3c Minor typo fixes to the preprocessing tutorial in the docs (#8046)
* Fix minor typos

Fix minor typos in the docs.

* Update docs/source/preprocessing.rst

Clearer data structure description.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-26 10:22:29 -04:00
b0a907615a minor model card description updates (#8051) 2020-10-26 10:04:20 -04:00
c48b16b8da Mlflow integration callback (#8016)
* Add MLflow integration class

Add integration code for MLflow in integrations.py along with the code
that checks that MLflow is installed.

* Add MLflowCallback import

Add import of MLflowCallback in trainer.py

* Handle model argument

Allow the callback to handle model argument and store model config items as hyperparameters.

* Log parameters to MLflow in batches

MLflow cannot log more than a hundred parameters at once.
Code added to split the parameters into batches of 100 items and log the batches one by one.

* Fix style

* Add docs on MLflow callback

* Fix issue with unfinished runs

The "fluent" api used in MLflow integration allows only one run to be active at any given moment. If the Trainer is disposed off and a new one is created, but the training is not finished, it will refuse to log the results when the next trainer is created.

2020-10-26 09:41:58 -04:00
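A sketch of the batching described above, assuming an `mlflow` install; the chunk size mirrors the per-call limit mentioned in the commit, and the helper name is mine, not the callback's:

```
import mlflow

MAX_PARAMS_PER_BATCH = 100  # MLflow rejects log_params calls above this size

def log_params_in_batches(params):
    """Split a large parameter dict and log it 100 items at a time."""
    items = list(params.items())
    for i in range(0, len(items), MAX_PARAMS_PER_BATCH):
        mlflow.log_params(dict(items[i : i + MAX_PARAMS_PER_BATCH]))
```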
8be9cb0aef Tiny TF Bart fixes (#8023) 2020-10-26 09:29:56 -04:00
077478637d Fix label name in DataCollatorForNextSentencePrediction test (#8048) 2020-10-26 09:23:12 -04:00
8bbe8247f1 Cleanup pytorch tests (#8033) 2020-10-26 08:59:06 -04:00
20a0894d1a update version for scipy (#7998) 2020-10-26 08:56:56 -04:00
f20aec1de5 fsmt slow test uses lists (#8031) 2020-10-26 08:32:36 -04:00
101186bc1f [docs] [testing] distributed training (#7993)
* distributed training

* fix

* fix formatting

* wording
2020-10-26 08:15:05 -04:00
c153bcc5c8 Add mixed precision evaluation (#8036)
* Add mixed precision evaluation

* use original flag
2020-10-26 08:12:31 -04:00
9aa2826687 Minor typo fixes to the tokenizer summary (#8045)
Minor typo fixes to the tokenizer summary
2020-10-26 08:08:33 -04:00
829b9f8cc3 Remove codecov.yml 2020-10-26 08:05:02 -04:00
79eb391586 [tokenizers] Fixing #8001 - Adding tests on tokenizers serialization (#8006)
* fixing #8001

* make T5 tokenizer serialization more robust - style
2020-10-26 10:27:48 +01:00
7087d9b1c0 [model_cards] bert-base-danish Fixup
#8030
2020-10-26 09:38:21 +01:00
efc4a21ffa Fixup #8025
Close #8030
2020-10-26 09:32:07 +01:00
5148f43309 [Model Card] DJSammy/bert-base-danish-uncased_BotXO,ai (#8025)
* Create README.md

* Update README.md
2020-10-25 15:20:46 +08:00
38f6739cd6 [doc prepare_seq2seq_batch] fix docs (#8013) 2020-10-24 15:33:47 -04:00
00602f7840 Create model card for pre-trained NLI models. (#7864)
* Create README.md

* Update model_cards/ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Add Meta information for dataset identifier.

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-24 03:16:07 -04:00
3c682ea15c [Examples] Allow EncoderDecoderModels to be trained with Seq2Seq (#7809)
* Make Seq2Seq Trainer more similar to Trainer

* fix typo

* fix seq2seq trainer

* remove from tests

* remove lock

* remove train files

* delete test files

* correct typo

* check at init

* make sure trainer is not slowed down on TPU

* correct isort

* remove use cache

* fix use cache

* add last use cache = false
2020-10-23 23:05:51 +02:00
59b5953d89 Create model card for bert-italian-cased-finetuned-pos (#8003)
* Create README.md

* Update model_cards/sachaarbonel/bert-italian-cased-finetuned-pos/README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-23 10:58:05 -04:00
6e07c1f446 Add model cards for DynaBERT (#7999) 2020-10-23 10:53:53 -04:00
43fdafef89 Create README.md (#7997) 2020-10-23 10:53:37 -04:00
627e813734 Added model cards for Tagalog ELECTRA models (#7996)
Co-authored-by: Jan Christian Blaise Cruz <jcblaise@Blaises-MacBook-Pro.local>
2020-10-23 10:52:21 -04:00
9865e1fe52 model card for German Sentence Embeddings V2 (#7952)
* model card German Sentence Embeddings V2

- for German RoBERTa for Sentence Embeddings V2
- marked old as outdated

* small correction

* small improvement in description

* small spelling fix

* spelling fix

* add evaluation results

* spearman explanation

* add number of trials
2020-10-23 10:45:54 -04:00
d39da5a2ab Handling longformer model_type (#7990)
Updating the run_squad training script to handle the "longformer" `model_type`. The longformer is trained in the same way as RoBERTa, so I've added the "longformer" `model_type` (that's the right huggingface name for the Longformer model, right?) everywhere there was a "roberta" `model_type` reference. The longformer (like RoBERTa) doesn't use `token_type_ids` (as I understand from looking at the [longformer notebook](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb)), which is what gets updated after this change.

This fix might be related to [this issue](https://github.com/huggingface/transformers/issues/7249) with SQuAD training when using run_squad.py
2020-10-23 10:34:06 -04:00
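A sketch of the pattern the change above describes — model types whose tokenizers produce no usable `token_type_ids` share one code path, and "longformer" joins "roberta" in it. Names are illustrative; run_squad.py structures this differently:

```
MODEL_TYPES_WITHOUT_TOKEN_TYPE_IDS = {"roberta", "camembert", "longformer"}

def build_inputs(input_ids, attention_mask, token_type_ids, model_type):
    inputs = {"input_ids": input_ids, "attention_mask": attention_mask}
    if model_type not in MODEL_TYPES_WITHOUT_TOKEN_TYPE_IDS:
        inputs["token_type_ids"] = token_type_ids
    return inputs

print(build_inputs([0, 1], [1, 1], [0, 0], "longformer").keys())
# dict_keys(['input_ids', 'attention_mask'])
```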
5e323017a4 Fix BatchEncoding.word_to_tokens for removed tokens (#7939) 2020-10-23 10:29:37 -04:00
4acfd1a8dc [Reformer] remove reformer pad_token_id (#7991)
* remove reformer pad_token_id

* fix pegasus
2020-10-23 10:29:15 -04:00
3a40cdf58d [tests|tokenizers] Refactoring pipelines test backbone - Small tokenizers improvements - General tests speedups (#7970)
* WIP refactoring pipeline tests - switching to fast tokenizers

* fix dialog pipeline and fill-mask

* refactoring pipeline tests backbone

* make large tests slow

* fix tests (tf Bart inactive for now)

* fix doc...

* clean up for merge

* fixing tests - remove bart from summarization until there is TF

* fix quality and RAG

* Add new translation pipeline tests - fix JAX tests

* only slow for dialog

* Fixing the missing TF-BART imports in modeling_tf_auto

* spin out pipeline tests in separate CI job

* adding pipeline test to CI YAML

* add slow pipeline tests

* speed up tf and pt join test to avoid redoing all the standalone pt and tf tests

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/pipelines.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add require_torch and require_tf in is_pt_tf_cross_test

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-23 15:58:19 +02:00
88b3a91e61 Handle the case when title is None (#7941) 2020-10-23 15:54:45 +02:00
023f0f3708 [s2s trainer] tests to use distributed on multi-gpu machine (#7965) 2020-10-22 17:26:22 -04:00
64b24bb3c2 change zero shot widget default example (#7992) 2020-10-22 15:19:41 -06:00
0397619ac6 Move NoLayerEmbedTokens (#7945)
* Move NoLayerEmbedTokens

* TFWrappedEmbeddings

* Add comment
2020-10-22 16:13:49 -04:00
5ac07513e0 [gh ci] less output ( --durations=50) (#7989) 2020-10-22 16:10:15 -04:00
5ae935d233 Reload checkpoint (#7984)
* Fix checkpoint loading in Trainer

* Fix typo
2020-10-22 15:48:52 -04:00
467573ddde Fix documentation redirect 2020-10-22 15:37:51 -04:00
077c99bb5f add zero shot pipeline tags & examples (#7983)
* add zero shot pipeline tags

* rm default and fix yaml format

* rm DS_Store

* add bart large default

* don't add more typos

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* add multiple multilingual examples

* improve multilingual examples for single-label

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-22 13:01:23 -06:00
06fc3954a1 Only log total_flos at the end of training (#7981)
* Only log total_flos at the end of training

* Fix test
2020-10-22 14:26:55 -04:00
ff65beafa3 FillMaskPipeline: support passing top_k on __call__ (#7971)
* FillMaskPipeline: support passing top_k on __call__

Also move from topk to top_k

* migrate to new param name in tests

* Review from @sgugger
2020-10-22 12:54:25 -04:00
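Usage sketch for the renamed parameter — `top_k` can now be supplied per call rather than only at pipeline construction time:

```
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilroberta-base")
# Request the 3 best candidates for this one call only:
print(fill_mask("The capital of France is <mask>.", top_k=3))
```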
2e5052d4f1 New run glue script (#7917)
* Start simplification

* More progress

* Finished script

* Address comments and update tests instructions

* Wrong test

* Accept files as inputs and fix test

* Update src/transformers/trainer_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Fix labels and add combined score

* Add special labels

* Update TPU command

* Revert to old label strategy

* Use model labels

* Fix for STS-B

* Styling

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Code styling

* Fix review comments

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-22 11:42:22 -04:00
18ce6b8ff3 Fixing the "translation", "translation_XX_to_YY" pipelines. (#7975)
* Actually make the "translation", "translation_XX_to_YY" task behave correctly.

Background:
- Currently "translation_cn_to_ar" does not work. (only 3 pairs are
supported)
- Some models, contain in their config the correct values for the (src,
tgt) pair they can translate. It's usually just one pair, and we can
infer it automatically from the `model.config.task_specific_params`. If
it's not defined we can still probably load the TranslationPipeline
nevertheless.

Proposed fix:
- A simplified version of what could become a more general `parametrized`
task: "translation" + (src, tgt) is what we need in this instance and in
the general case. The way we go about it for now is simply parsing
"translation_XX_to_YY". If more cases of parametrized tasks arise, we should
preferably move toward something closer to what `datasets` proposes, namely
a secondary argument (`task_options`?) that is close to what the task requires.
- Should be backward compatible in all cases; for instance
`pipeline(task="translation_en_to_de")` should work out of the box.
- Should provide a warning when a specific translation pair has been
selected on behalf of the user using
`model.config.task_specific_params`.

* Update src/transformers/pipelines.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-22 17:16:21 +02:00
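A simplified sketch of the parsing described above — extracting the (src, tgt) pair from a "translation_XX_to_YY" task name, with plain "translation" falling through to `model.config.task_specific_params`:

```
import re

def parse_translation_task(task):
    """Return (src, tgt) for 'translation_XX_to_YY', or None for plain 'translation'."""
    match = re.match(r"^translation_([a-z]{2})_to_([a-z]{2})$", task)
    return (match.group(1), match.group(2)) if match else None

print(parse_translation_task("translation_en_to_de"))  # ('en', 'de')
print(parse_translation_task("translation"))           # None -> use task_specific_params
```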
901e9b8eda Remove the else branch adding 0 to the hidden state if token_type_embeds is None. (#7977)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-10-22 16:41:41 +02:00
f34372a9ff [PretrainedConfig] Fix save pretrained config for edge case (#7943)
* fix config save

* add test

* add config class variable and another test

* line break

* fix fsmt and typo

* god am I making many errors today :-/

* Update src/transformers/configuration_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-22 15:39:01 +02:00
cc2e312ca3 adding text classification with DistilBERT/tf notebook (#7964)
Looking at the current community notebooks, it seems that few are targeted for absolute beginners and even fewer are written with TensorFlow. This notebook describes absolutely everything a beginner would need to know, including how to save/load their model and use it for new predictions (this is often omitted in tutorials)

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-22 09:30:50 -04:00
a16e568f22 Add whole word mask support for lm fine-tune (#7925)
* ADD: add whole word mask proxy for both eng and chinese

* MOD: adjust format

* MOD: reformat code

* MOD: update import

* MOD: fix bug

* MOD: add import

* MOD: fix bug

* MOD: decouple code and update readme

* MOD: reformat code

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update examples/language-modeling/run_language_modeling.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change wwm to whole_word_mask

* reformat code

* reformat

* format

* Code quality

* ADD: update chinese ref readme

* MOD: small changes

* MOD: small changes2

* update readme

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-22 09:19:00 -04:00
64b4d25cf3 [fsmt test] basic config test with online model + super tiny model (#7860)
* basic config test with online model

* typo

* style

* better test
2020-10-22 09:14:54 -04:00
3479787edc Disable inference API for t5-11b (#7978) 2020-10-22 09:08:37 -04:00
a7db81c33f [model_card] t5-11b move disclaimer to top of page
cc @Narsil @patrickvonplaten
2020-10-22 14:35:31 +02:00
f774b2e8c4 support relative path for best_model_checkpoint (#7973) 2020-10-22 07:55:31 -04:00
8348105692 [testing] slow tests should be marked as slow (#7895)
* slow tests should be slow

* exception note

* style

* integrate LysandreJik's notes with some expansions

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* another slow test

* fix link, and prose

* clarify.

* note from Sam

* typo

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-22 06:34:05 -04:00
95792a948e Herbert tokenizer auto load (#7968) 2020-10-22 05:48:29 -04:00
4abb7ffc18 added qg evaluation notebook (#7958)
* added qg evaluation notebook

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-22 11:02:12 +02:00
8b38173398 [seq2seq testing] multigpu test run via subprocess (#7281)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-21 17:20:53 -04:00
f8d3695e8c [model_cards] camembert: dataset = oscar
Hat/tip @pjox
2020-10-21 14:17:56 -04:00
16da877139 fix 'encode_plus' docstring for 'special_tokens_mask' (0s and 1s were reversed) (#7949)
* fix docstring for 'special_tokens_mask'

* revert auto formatter changes

* revert another auto format

* revert another auto format
2020-10-21 13:57:44 -04:00
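A quick check of the corrected convention — in the returned `special_tokens_mask`, 1 marks an added special token ([CLS]/[SEP]) and 0 marks a regular sequence token:

```
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer.encode_plus("hello world", return_special_tokens_mask=True)
print(enc["special_tokens_mask"])  # [1, 0, 0, 1] -> [CLS] hello world [SEP]
```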
52decab371 fix test (#7947) 2020-10-21 19:06:23 +02:00
9b6610f7f6 [ProphetNet] Correct Doc string example (#7944)
* correct xlm prophetnet auto model and examples

* fix line-break docs
2020-10-21 17:27:20 +02:00
e174bfeb34 TensorBoard/Wandb/optuna/raytune integration improvements. (#7935)
Improved TensorBoard and Wandb integration, as well as optuna and ray/tune support, with minor modifications to trainer core code.
2020-10-21 17:18:52 +02:00
bf162ce8ca Add AI-SOCO models (#7867) 2020-10-21 09:24:43 -04:00
58fb25f25b Create README.md (#7857)
* Create README.md

model card for cambridgeltl/BioRedditBERT-uncased.

* Update model_cards/cambridgeltl/BioRedditBERT-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-21 08:41:41 -04:00
2b07ec7823 Model card for German BERT fine-tuned for LER/NER (#7855) 2020-10-21 08:31:41 -04:00
35d2ad5b83 Create README.md (#7819) 2020-10-21 08:30:01 -04:00
bdda4f2249 Create README.md (#7625)
* Create README.md

* Update model_cards/lanwuwei/GigaBERT-v3-Arabic-and-English/README.md

* Update model_cards/lanwuwei/GigaBERT-v3-Arabic-and-English/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-21 08:29:39 -04:00
8e23749649 Add missing comma (#7870) 2020-10-21 08:24:12 -04:00
3eaa007d78 Create README.md (#7899) 2020-10-21 08:23:55 -04:00
758572cad8 [model_cards] move hatmimoha/arabic-ner to correct location
see 16d3cc187d and https://github.com/huggingface/transformers/pull/7836
2020-10-21 14:13:17 +02:00
57516c0cc8 [multiple models] skip saving/loading deterministic state_dict keys (#7878)
* make the save_load special key tests common

* handle mbart

* cleaner solution

* fix

* move test_save_load_missing_keys back into fsmt for now

* restore

* style

* add marian

* add pegasus

* blenderbot

* revert - no static embed
2020-10-21 08:06:07 -04:00
006a16483f update model cards of Illuin models (#7930) 2020-10-21 08:05:53 -04:00
16d3cc187d model card for arabic-ner model (#7836)
* Create README.md

README file for the Arabic NER model

* Update README.md

* Update README.md

* Update hatmimoha/arabic-ner/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-21 08:02:40 -04:00
829842159e Add TFBartForConditionalGeneration (#5411)
* half done

* doc improvement

* Cp test file

* brokedn

* broken test

* undo some mess

* ckpt

* borked

* Halfway

* 6 passing

* boom boom

* Much progress but still 6

* boom boom

* merged master

* 10 passing

* boom boom

* Style

* no t5 changes

* 13 passing

* Integration test failing, but not gibberish

* Frustrated

* Merged master

* 4 fail

* 4 fail

* fix return_dict

* boom boom

* Still only 4

* prepare method

* prepare method

* before delete classif

* Skip tests to avoid adding boilerplate

* boom boom

* fast tests passing

* style

* boom boom

* Switch to supporting many input types

* remove FIXMENORM

* working

* Fixed past_key_values/decoder_cached_states confusion

* new broken test

* Fix attention mask kwarg name

* undo accidental

* Style and reviewers

* style

* Docs and common tests

* Cleaner assert messages

* copy docs

* style issues

* Sphinx fix

* Simplify caching logic

* test does not require torch

* copy _NoLayerEmbedTokens

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update tests/test_modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_bart.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Line length and dont document None

* Add pipeline test coverage

* assert msg

* At parity

* Assert messages

* mark slow

* Update compile test

* back in init

* Merge master

* Fix tests

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-21 13:10:16 +02:00
5cd9e2cba1 Update README.md 2020-10-21 12:43:42 +02:00
220b5f97ca Create README.md 2020-10-21 12:34:46 +02:00
8ffd7fb12d Update README.md 2020-10-21 12:27:09 +02:00
613ab364eb Update README.md 2020-10-21 12:23:17 +02:00
f7eb17dc47 Update README.md 2020-10-21 12:19:44 +02:00
29792864cb [ProphetNet] Add Question Generation Model + Test (#7942)
* new prophetnet model

* correct name

* make style
2020-10-21 11:49:58 +02:00
13842e413c PPL guide minor code snippet fix (#7938) 2020-10-20 16:17:39 -06:00
0e24e4c136 [s2s] create doc for pegasus/fsmt replication (#7934) 2020-10-20 15:07:52 -04:00
96f4828ace Respect the 119 line chars (#7928) 2020-10-20 11:02:47 -04:00
ef0ac063c9 Docs for v3.4.0 2020-10-20 16:29:00 +02:00
eb0e0ce2ad Release: v3.4.0 2020-10-20 16:22:26 +02:00
0264048660 Update README.md 2020-10-20 16:13:49 +02:00
ffd675b42c add summary (#7927) 2020-10-20 10:11:02 -04:00
5547b40b13 labels and decoder_input_ids to Glossary (#7906)
* labels and decoder_input_ids to Glossary

* Formatting fixes

* Update docs/source/glossary.rst

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* sam's comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-20 09:50:47 -04:00
f3312515b7 Add note for WikiSplit 2020-10-20 15:42:29 +02:00
0724c0f3a2 Fix EncoderDecoder WikiSplit Example 2020-10-20 15:13:22 +02:00
ca37db0559 [flax] fix repo_check (#7914)
* [flax] fix repo_check

Unless this is actually a problem, this adds `modeling_flax_utils` to the ignore list; otherwise it currently expects a 'tests/test_modeling_flax_utils.py' to exist for it.
for context please see: https://github.com/huggingface/transformers/pull/3722#issuecomment-712360415

* fix 2 more issues

* merge https://github.com/huggingface/transformers/pull/7919/
2020-10-20 07:55:40 -04:00
048dd6cf10 Fix bug in _sorted_checkpoints (#7880)
I'm using transformers 3.3.1 and ran a training script with `--save_total_limit 3`. I hit the exception below, and after debugging the code I found that it wrongly tries to index into the `best_model_checkpoint`'s *str* rather than the `sorted_checkpoints` array. When running without the fix I got this exception:

```
Traceback (most recent call last):
  File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 921, in _save_training
    self._rotate_checkpoints(use_mtime=True)
  File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1283, in _rotate_checkpoints
    checkpoints_sorted = self._sorted_checkpoints(use_mtime=use_mtime)
  File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1274, in _sorted_checkpoints
    checkpoints_sorted[best_model_index],
TypeError: 'str' object does not support item assignment
```
2020-10-20 07:50:47 -04:00
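A distilled sketch of the bug class (not Trainer's exact code): moving the best checkpoint to the end of the rotation list must assign into the *list*, not into the checkpoint path string:

```
checkpoints_sorted = ["ckpt-100", "ckpt-200", "ckpt-300"]
best_model_checkpoint = "ckpt-200"

best_model_index = checkpoints_sorted.index(best_model_checkpoint)
# Buggy shape: best_model_checkpoint[best_model_index] = ...
# -> TypeError: 'str' object does not support item assignment
# Fixed: swap inside the list so the best checkpoint is never rotated away.
checkpoints_sorted[best_model_index], checkpoints_sorted[-1] = (
    checkpoints_sorted[-1],
    checkpoints_sorted[best_model_index],
)
print(checkpoints_sorted)  # ['ckpt-100', 'ckpt-300', 'ckpt-200']
```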
6d4f8bd02a Add Flax dummy objects (#7918) 2020-10-20 07:45:48 -04:00
3e31e7f956 [testing] rename skip targets + docs (#7863)
* rename skip targets + docs

* fix quotes

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* small improvements

* fix

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-20 04:39:13 -04:00
c912ba5f69 [EncoderDecoder] Fix Typo (#7915)
* fix encoder decoder models

* add .gitignore
2020-10-19 22:02:42 +02:00
55bcd0cb59 Raise error when using AMP on non-CUDA device (#7869)
* Raise error when using AMP on non-CUDA device

* make style

* make style
2020-10-19 15:59:30 -04:00
e3d2bee8d0 fix t5 training docstring (#7911) 2020-10-19 21:49:47 +02:00
df1ddcedf2 decoder_config used before intialisation (#7903)
Seeing an error when passing `decoder_config` as a parameter while initializing an encoder-decoder model from pretrained.
Fixed "UnboundLocalError: local variable 'decoder_config' referenced before assignment"
2020-10-19 19:48:49 +02:00
033f29c625 Allow Custom Dataset in RAG Retriever (#7763)
* add CustomHFIndex

* typo in config

* update tests

* add custom dataset example

* clean script

* update test data

* minor in test

* docs

* docs

* style

* fix imports

* allow to pass the indexed dataset directly

* update tests

* use multiset DPR

* address thom and patrick's comments

* style

* update dpr tokenizer

* add output_dir flag in use_own_knowledge_dataset.py

* allow custom datasets in examples/rag/finetune.py

* add test for custom dataset in distributed rag retriever
2020-10-19 19:42:45 +02:00
a09fe140c1 Trainer with Iterable Dataset (#7858)
* fix 5990

* accommodate iterable dataset without predefined length
* set it as 1 use case: provide max_steps, and NO num_epochs
* Is a merge of master and PR 5995

* fix trainer test under TF

* fix only for torch
* TF trainer untouched
* trainer tests are skipped when no torch

* address comments

* fix quality checks

* remove torch.dataset from test_trainer

* unnecessary inheritance
* RegressionDataset implements all needed methods __len__ and __getitem__

* fix quality checks

* restore RegressionDataset

* was wrongly under is_torch_available()
2020-10-19 11:57:39 -04:00
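A minimal sketch of the supported use case — a dataset with no predefined length trained for a fixed number of steps. The toy dataset and argument values are illustrative, not from the PR:

```
import torch
from torch.utils.data import IterableDataset
from transformers import TrainingArguments

class StreamingDataset(IterableDataset):
    """Yields examples indefinitely; __len__ is deliberately undefined."""
    def __iter__(self):
        while True:
            yield {"input_ids": torch.randint(0, 1000, (32,)), "labels": torch.tensor(1)}

# With no length available, epochs are undefined: give the Trainer max_steps instead.
args = TrainingArguments(output_dir="out", max_steps=1000)
```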
2422cda01b ProphetNet (#7157)
* add new model prophetnet

prophetnet modified

modify codes as suggested v1

add prophetnet test files

* still bugs, because of changed output formats of encoder and decoder

* move prophetnet into the latest version

* clean integration tests

* clean tokenizers

* add xlm config to init

* correct typo in init

* further refactoring

* continue refactor

* save parallel

* add decoder_attention_mask

* fix use_cache vs. past_key_values

* fix common tests

* change decoder output logits

* fix xlm tests

* make common tests pass

* change model architecture

* add tokenizer tests

* finalize model structure

* no weight mapping

* correct n-gram stream attention mask as discussed with qweizhen

* remove unused import

* fix index.rst

* fix tests

* delete unnecessary code

* add fast integration test

* rename weights

* final weight remapping

* save intermediate

* Descriptions for Prophetnet Config File

* finish all models

* finish new model outputs

* delete unnecessary files

* refactor encoder layer

* add dummy docs

* code quality

* fix tests

* add model pages to doctree

* further refactor

* more refactor, more tests

* finish code refactor and tests

* remove unnecessary files

* further clean up

* add docstring template

* finish tokenizer doc

* finish prophetnet

* fix copies

* fix typos

* fix tf tests

* fix fp16

* fix tf test 2nd try

* fix code quality

* add test for each model

* merge new tests to branch

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update model_cards/microsoft/prophetnet-large-uncased-cnndm/README.md

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_prophetnet.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update utils/check_repo.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* apply sams and sylvains comments

* make style

* remove unnecessary code

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/configuration_prophetnet.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* implement lysandres comments

* correct docs

* fix isort

* fix tokenizers

* fix copies

Co-authored-by: weizhen <weizhen@mail.ustc.edu.cn>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-19 17:36:09 +02:00
8f8f8d99fc Integrate Bert-like model on Flax runtime. (#3722)
* WIP flax bert

* Initial commit Bert Jax/Flax implementation.

* Embeddings working and equivalent to PyTorch.

* Move embeddings in its own module BertEmbeddings

* Added jax.jit annotation on forward call

* BertEncoder on par with PyTorch ! :D

* Add BertPooler on par with PyTorch !!

* Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer.

* Fix pooled output to take only the first token of the sequence.

* Refactoring to use BertConfig from transformers.

* Renamed FXBertModel to FlaxBertModel

* Model is now initialized in FlaxBertModel constructor and reused.

* WIP JaxPreTrainedModel

* Cleaning up the code of FlaxBertModel

* Added ability to load Flax model saved through save_pretrained()

* Added ability to convert Pytorch Bert model to FlaxBert

* FlaxBert can now load every Pytorch Bert model with on-the-fly conversion

* Fix hardcoded shape values in conversion scripts.

* Improve the way we handle LayerNorm conversion from PyTorch to Flax.

* Added positional embeddings as parameter of BertModel with default to np.arange.

* Let's roll FlaxRoberta !

* Fix missing position_ids parameters on predict for Bert

* Flax backend now supports batched inputs

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make it possible to load a msgpacked model when converting from pytorch, as a last resort.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Moved save_pretrained to Jax base class along with more constructor parameters.

* Use specialized, model-dependent conversion function.

* Expose `is_flax_available` in file_utils.

* Added unittest for Flax models.

* Added run_tests_flax to the CI.

* Introduce FlaxAutoModel

* Added more unittests

* Flax model reference the _MODEL_ARCHIVE_MAP from PyTorch model.

* Addressing review comments.

* Expose seed in both Bert and Roberta

* Fix typo suggested by @stefan-it

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Attempt to make style

* Attempt to make style in tests too

* Added jax & jaxlib to the flax optional dependencies.

* Attempt to fix flake8 warnings ...

* Redo black again and again

* When black and flake8 fight each other for a space ... 💥 💥 💥

* Try removing trailing comma to make both black and flake happy!

* Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉

* Fix another invalid import in flax_roberta test

* Bump and pin flax release to 0.1.0.

* Make flake8 happy, remove unused jax import

* Change the type of the catch for msgpack.

* Remove unused import.

* Put seed as optional constructor parameter.

* trigger ci again

* Fix too much parameters in BertAttention.

* Formatting.

* Simplify Flax unittests to avoid machine crashes.

* Fix invalid number of arguments when raising issue for an unknown model.

* Address @bastings comment in PR, moving jax.jit decorated outside of __call__

* Fix incorrect path to require_flax/require_pytorch functions.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct rebasing of circle-ci dependencies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Again import sorting...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Installing missing nlp dependency for flax unittests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix loading of model for Flax implementations.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* jit the inner function call to make JAX-compatible

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format !

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Flake one more time 🎶

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrites BERT in Flax to the new Linen API (#7211)

* Rewrite Flax HuggingFace PR to Linen

* Some fixes

* Fix tests

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* Expose `is_flax_available` in file_utils.

* Added run_tests_flax to the CI.

* Attempt to make style

* trigger ci again

* Fix import sorting.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Revert "Rewrites BERT in Flax to the new Linen API (#7211)"

This reverts commit 23703a5eb3364e26a1cbc3ee34b4710d86a674b0.

* Remove jnp.lax references

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Reintroduce Linen changes ...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use jax native's gelu function.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Renaming BertModel to BertModule to highlight the fact this is the Flax Module object.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused variable in BertModule again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to have is_flax_available working again.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Introduce JAX TensorType

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve ImportError message when trying to convert to various TensorType format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Makes Flax model jittable.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure flax models are jittable in unittests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Ensure jax imports are guarded behind is_flax_available.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style again again again

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update src/transformers/file_utils.py

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump flax to it's latest version

Co-authored-by: Marc van Zee <marcvanzee@gmail.com>

* Bump jax version to at least 0.2.0

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update the unittest to use TensorType.JAX

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* isort import in tests.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Match new flax parameters name "params"

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Add flax models to transformers __init__

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Attempt to address all CI related comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct circle.yml indent (2)

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove coverage from flax tests

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing many naming suggestions from comments

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify for loop logic to iterate over layers in FlaxBertLayerCollection

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use f-string syntax for formatting logs.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use config property from FlaxPreTrainedModel.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "cls_token" instead of "first_token" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* use "hidden_state" instead of "h" variable name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct class reference in docstring to link to Flax related modules.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added HF + Google Flax team copyright.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make Roberta independent from Bert

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Move activation functions to flax_utils for bert.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added docstring for BERT

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Update import for Bert and Roberta tokenizers

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix-copies

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Correct FlaxRobertaLayer to match PyTorch.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use the same store_artifact for flax unittest

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Style.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make sure gradient are disabled only locally for flax unittest using torch equivalence.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use relative imports

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-19 09:55:41 -04:00
0193c8290d [RAG] Propagating of n_docs as parameter to all RagModel's related functions (#7891)
* Propagating n_docs as a parameter (defaulting to self.config.n_docs) to all RagModel-related functions

* Making n_docs parameter's default value to None in marginalize function

* Fixing code quality issues

* Handle the special case when the generator is an instance of T5PreTrainedModel, which does not have n_docs as a parameter

* T5PreTrainedModel does not have n_docs as a parameter

* Addressing review comment

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Correcting comment by addressing review comment

* Adding assert statement verifying that n_docs is correctly set. n_docs should be the same for both retriever and generator.

* Fixing flake8 reported issue

* Correcting test datasets for rag

* Using doc_scores instead of context_input_ids in the assert, as in RagSequenceForGeneration context_input_ids can be null

* doc_scores' second dimension holds the number of retrieved docs

* Changing assert comment

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-19 15:15:52 +02:00
7e6b6fbec9 style: fix typo in the README (#7882) 2020-10-19 08:43:25 -04:00
805a202e1a [CIs] report slow tests add --durations=0 to some pytest jobs (#7884)
* add --durations=50 to some pytest runs

* report all tests
2020-10-19 08:23:14 -04:00
4eb61f8e88 remove USE_CUDA (#7861) 2020-10-19 07:08:34 -04:00
ea1507fb45 Julibert model card (#7868)
* Julibert model card

* Fix text
2020-10-19 06:50:52 -04:00
7c44c864a5 style: fix typo (#7883) 2020-10-19 06:14:53 -04:00
776e82d2be Add support to provide initial tokens to decoder of encoder-decoder type models (#7577)
* Add support to provide initial tokens for decoding

* Add docstring

* improve code quality

* code reformat

* code reformat

* minor change

* remove appending decoder start token

Co-authored-by: Ayush Jain <a.jain@sprinklr.com>
2020-10-19 08:56:08 +02:00
406a49dfe4 Fix small type hinting error (#7820)
* Fix small type hinting error

* Update tokenization_utils_base.py

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-19 08:14:29 +02:00
b86a71ea38 [tests] fix slow bart cnn test, faster marian tests (#7888) 2020-10-18 20:18:08 -04:00
ba8c4d0ac0 [Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* spliting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉

* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update and fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast  conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split hebert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix hebert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-18 20:51:24 +02:00
c65863ce53 Remove duplicated mish activation function (#7856)
* Remove duplicated mish activation function

* Update activations.py
2020-10-17 17:31:53 -04:00
f5c45a19e6 Fix Rag example docstring (#7872)
* fix rag examples

* fix token generate example
2020-10-17 22:46:47 +02:00
9f7b2b2432 [s2s testing] turn all to unittests, use auto-delete temp dirs (#7859) 2020-10-17 14:33:21 -04:00
dc552b9b70 Fix typo in sequence model card 2020-10-16 16:05:06 +02:00
1652ddad35 [seq2seq testing] improve readability (#7845) 2020-10-16 09:05:29 -04:00
466115b279 Fix missing reference titles in retrieval evaluation of RAG (#7817) 2020-10-16 10:15:49 +02:00
464b53f5e4 [testing] disable FutureWarning in examples tests (#7842)
* [testing] disable FutureWarning in examples tests

Same as tests/conftest.py: we can't resolve those warnings, so turn the noise off.

* fix
2020-10-16 03:35:39 -04:00
eb186bc14e Small fixes to HP search (#7839) 2020-10-16 03:23:44 -04:00
d8ca57d2ce fix/hide warnings (#7837)
2020-10-16 03:19:51 -04:00
c6e865ac2b Remove masked_lm_labels from returned dictionary (#7818) 2020-10-16 03:12:10 -04:00
96e47d9229 [cleanup] assign todos, faster bart-cnn test (#7835)
* 2 beam output

* unassign/remove TODOs

* remove one more
2020-10-16 03:11:18 -04:00
7b13bd01df Herbert polish model (#7798)
* HerBERT transformer model for Polish language understanding.

* HerbertTokenizerFast generated with HerbertConverter

* Herbert base and large model cards

* Herbert model cards with tags

* Herbert tensorflow models

* Herbert model tests based on Bert test suite

* src/transformers/tokenization_herbert.py edited online with Bitbucket

* src/transformers/tokenization_herbert.py edited online with Bitbucket

* docs/source/model_doc/herbert.rst edited online with Bitbucket

* Herbert tokenizer tests and bug fixes

* src/transformers/configuration_herbert.py edited online with Bitbucket

* Copyrights and tests for TFHerbertModel

* model_cards/allegro/herbert-base-cased/README.md edited online with Bitbucket

* model_cards/allegro/herbert-large-cased/README.md edited online with Bitbucket

* Bug fixes after testing

* Reformat modified_only_fixup

* Proper order of configuration

* Herbert proper documentation formatting

* Formatting with make modified_only_fixup

* Dummies fixed

* Adding missing models to documentation

* Removing HerBERT model as it is a simple extension of BERT

* Update model_cards/allegro/herbert-base-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update model_cards/allegro/herbert-large-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* HerbertTokenizer deprecated configuration removed

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-16 03:06:51 -04:00
99898dcd27 [Pipelines] Fix links to model lists (#7826) 2020-10-16 02:57:02 -04:00
52c9e84285 Fix DeBERTa integration tests (#7729) 2020-10-16 02:49:13 -04:00
2255c2c7a0 [seq2seq] get_git_info fails gracefully (#7843)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-10-16 00:22:43 -04:00
dfa4c26bc0 Typo and fix the input of labels to cross_entropy (#7841)
The current version caused some errors. The changes fixed it for me. Hope this is helpful!
2020-10-15 19:36:31 -04:00
a5a8eeb772 fix DeprecationWarning (#7834)
in `tests/test_utils_check_copies.py` I was getting intermittently:
```
utils/check_copies.py:52
  /mnt/nvme1/code/transformers-comet/utils/check_copies.py:52: DeprecationWarning: invalid escape sequence \s
    while line_index < len(lines) and re.search(f"^{indent}(class|def)\s+{name}", lines[line_index]) is None:
```
So this should fix it.
2020-10-15 16:21:09 -04:00
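A sketch of the standard fix for this warning — `\s` inside a plain (non-raw) string is an invalid escape sequence, so the pattern becomes a raw f-string:

```
import re

indent, name = "    ", "forward"
# Before: f"^{indent}(class|def)\s+{name}"   <- DeprecationWarning: invalid escape sequence \s
pattern = rf"^{indent}(class|def)\s+{name}"  # raw f-string: no warning
print(bool(re.search(pattern, "    def forward(self):")))  # True
```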
9c71cca316 model card for bert-base-NER (#7799)
* model card for bert-base-NER

* add meta data up top

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-15 21:55:00 +02:00
4dbca50022 fix wandb/comet problems (#7830)
* fix wandb/comet problems

* simplify

* Update src/transformers/integrations.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-15 15:23:24 -04:00
e7aa64838c [model_cards] facebook/bart-large-mnli: register ZSC for the inference API
cc @Narsil @mfuntowicz @joeddav
2020-10-15 19:02:10 +02:00
2ce3ddab2d Small fixes to NotebookProgressCallback (#7813) 2020-10-15 10:30:34 -04:00
6f45dd2fac [model_cards] Fix yaml for Facebook/wmt19-*
see d99ed7ad618037ae878f0758157ed0764bd7f935
2020-10-15 16:14:08 +02:00
d99ed7ad61 [model_cards] Facebook: add thumbnail 2020-10-15 12:53:29 +02:00
2485b8b0ac Set XLA example time to 500s 2020-10-15 12:34:29 +02:00
2dba7d5702 Notebook catch all errors 2020-10-15 12:21:32 +02:00
9ade8e7499 Upgrading TFAutoModelWithLMHead to (#7730)
- TFAutoModelForCausalLM
- TFAutoModelForMaskedLM
- TFAutoModelForSeq2SeqLM

as per the deprecation warning. No tests, as it simply removes current
warnings from tests.
2020-10-15 05:26:08 -04:00
62b5622e6b Add specific notebook ProgressCalback (#7793) 2020-10-15 05:05:08 -04:00
0911b6bd86 Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable. (#7728)
* Improving Pipelines by defaulting to framework='tf' when pytorch seems unavailable.

* Actually changing the default resolution order to account for model
defaults

Adding new tests for each pipeline to check that pipeline(task) works
without manually specifying the framework.
2020-10-15 09:42:07 +02:00
3a134f7c67 Fix TF savedmodel in Roberta (#7795)
* Remove wrong parameter.

* Same in Longformer
2020-10-14 23:48:50 +02:00
3032de9369 Model Card (#7752)
* Create README.md

* Update model_cards/sentence-transformers/LaBSE/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-14 13:30:58 -04:00
3fdbeba83c [model_cards] sarahlintang/IndoBERT (#7748)
* Create README.md

* Update model_cards/sarahlintang/IndoBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-14 13:10:31 -04:00
ba654270b3 [model_cards] rename to correct model name 2020-10-14 19:02:48 +02:00
08978487e7 Create README.md (#7722) 2020-10-14 12:56:12 -04:00
3557509127 added evaluation results for classification task (#7790) 2020-10-14 12:50:43 -04:00
bb9559a7f9 Don't use store_xxx on optional bools (#7786)
* Don't use `store_xxx` on optional bools

* Refine test

* Refine test
2020-10-14 12:05:02 -04:00
a1d1b332d0 Add predict step accumulation (#7767)
* Add eval_accumulation_step and clean distributed eval

* Add TPU test

* Add TPU stuff

* Fix arg name

* Fix Seq2SeqTrainer

* Fix total_size

* Update src/transformers/trainer_pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc and add test to TPU

* Add unit test

* Adapt name

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-14 11:41:45 -04:00
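Usage sketch for the new argument — with `eval_accumulation_steps` set, prediction outputs are moved off the device every N steps instead of accumulating there for the whole evaluation:

```
from transformers import TrainingArguments

# Offload predictions to the CPU every 8 eval steps to keep device memory bounded.
args = TrainingArguments(output_dir="out", eval_accumulation_steps=8)
```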
8feb0cc967 fix examples/rag imports, tests (#7712) 2020-10-14 11:35:00 -04:00
890e790e16 [model_cards] TinyBERT (HUAWEI Noah's Ark Lab) (#7775) 2020-10-14 09:31:01 -04:00
121dd4332b Add batch inferencing support for GPT2LMHeadModel (#7552)
* Add support for gpt2 batch inferencing

* add test

* remove typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-10-14 13:40:24 +02:00
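A sketch of the batched generation this enables, with the usual setup such a feature requires — left padding and the EOS token doubling as the pad token:

```
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"            # causal LMs must be padded on the left
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token of its own

model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

batch = tokenizer(["Hello, my name is", "The weather today"], return_tensors="pt", padding=True)
out = model.generate(batch["input_ids"], attention_mask=batch["attention_mask"], max_length=20)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```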
0c64b18840 Fix bert position ids in DPR convert script (#7776)
* fix bert position ids in DPR convert script

* style
2020-10-14 05:30:02 -04:00
7968051aba Fix typo 2020-10-13 17:30:46 -04:00
2977bd528f Faster pegasus tokenization test with reduced data size (#7762) 2020-10-13 16:22:29 -04:00
2d6e2ad4fa Adding optional trial argument to model_init (#7759)
* Adding optional trial argument to model_init

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-13 17:07:02 +02:00
7e73c12805 fixed lots of typos. (#7758) 2020-10-13 10:00:20 -04:00
8cb4ecca25 Avoid unnecessary DDP synchronization when gradient_accumulation_steps > 1 (#7742)
* use DDP no_sync when possible

* fix is_nlp_available addition mistake

* reformat trainer.py

* reformat trainer.py

* drop support for pytorch < 1.2

* return support for pytorch < 1.2
2020-10-13 09:46:44 -04:00
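A minimal sketch of the optimization, assuming a DDP-wrapped model: gradient all-reduce is skipped via `no_sync()` on accumulation steps and happens only on the step that updates the weights:

```
import contextlib

def accumulate_backward(model, loss, step, accumulation_steps):
    """Backward pass that syncs gradients only on the final accumulation step."""
    is_update_step = (step + 1) % accumulation_steps == 0
    skip_sync = hasattr(model, "no_sync") and not is_update_step
    with model.no_sync() if skip_sync else contextlib.nullcontext():
        loss.backward()
```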
52f7d74398 Do not softmax when num_labels==1 (#7726)
* Do not softmax when num_labels==1

* Update src/transformers/pipelines.py

Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>

Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
2020-10-13 09:42:27 -04:00
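A sketch of the post-processing change — with a single regression-style output there is nothing to normalize, so the raw score is returned instead of a softmax (the pipeline's actual code differs):

```
import numpy as np

def postprocess(logits, num_labels):
    if num_labels == 1:
        return logits  # regression head: softmax over one value is always 1.0
    exp = np.exp(logits - logits.max(-1, keepdims=True))
    return exp / exp.sum(-1, keepdims=True)

print(postprocess(np.array([2.5]), 1))       # [2.5] -- raw score preserved
print(postprocess(np.array([1.0, 3.0]), 2))  # softmaxed probabilities
```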
82b09a8481 [Rag] Fix loading of pretrained Rag Tokenizer (#7756)
* fix rag

* Update tokenizer save_pretrained

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-10-13 14:34:22 +02:00
2d4e928d97 Update PULL_REQUEST_TEMPLATE.md
Putting my name on a couple more issues to directly redirect them to me
2020-10-13 12:18:31 +02:00
dcba9ee03b Gpt1 for sequence classification (#7683)
* Add Documentation for GPT-1 Classification

* Add GPT-1 with Classification head

* Add tests for GPT-1 Classification

* Add GPT-1 For Classification to auto models

* Remove authorized missing keys, change checkpoint to openai-gpt
2020-10-13 05:06:15 -04:00
f34b4cd1bd ElectraTokenizerFast (#7754) 2020-10-13 04:50:41 -04:00
9c2b2db2cd [marian] Automate Tatoeba-Challenge conversion (#7709) 2020-10-12 12:24:25 -04:00
aacac8f708 Add license info to nlptown/bert-base-multilingual-uncased-sentiment (#7738) 2020-10-12 11:56:10 -04:00
1f1d950b28 Fix #7331 (#7732) 2020-10-12 09:10:52 -04:00
d9ffb87efb Fix tf text class (#7724)
* Fix test

* fix generic text classification

* fix test

* Fix tests
2020-10-12 08:45:15 -04:00
d6175a4268 Fix code quality 2020-10-12 08:22:27 -04:00
1d5ea34f6a Fix trainer callback (#7720)
Fix a bug that happens when subclassing Trainer and
overwriting evaluate() without calling prediction_loop()
2020-10-12 07:45:12 -04:00
f176e70723 The input training data files (multiple files in glob format). (#7717)
Very often, splitting large files into smaller ones can prevent the tokenizer from going out of memory in environments like Colab that do not have swap memory
2020-10-12 07:44:02 -04:00
34fcfb44e3 Update tokenization_utils_base.py (#7696)
Minor spelling corrections in docstrings. "information" is uncountable in English and has no plural.
2020-10-12 06:09:20 -04:00
2f34bcf3e7 check for tpu availability in save_pretrained (#7699)
Added is_torch_tpu_available() to the condition
for saving a model as an XLA model. The "xla_device"
property of the config can also be True on a non-XLA
device, when loading a checkpoint that was trained
on XLA before.

Resolves #7695
2020-10-12 04:10:17 -04:00
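A sketch of the guarded save described above; it assumes `is_torch_tpu_available` lives in `transformers.file_utils` (its home in this era) and that `torch_xla` provides `xm.save`:
```
import torch
from transformers.file_utils import is_torch_tpu_available

def save_state_dict(model, path):
    # config.xla_device can be True merely because the checkpoint was trained
    # on TPU; only route through the XLA saver when a TPU is actually present.
    if getattr(model.config, "xla_device", False) and is_torch_tpu_available():
        import torch_xla.core.xla_model as xm
        xm.save(model.state_dict(), path)
    else:
        torch.save(model.state_dict(), path)
```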
13c1857718 Fix typo in all model docs (#7714) 2020-10-12 04:06:59 -04:00
83086858f8 fixed typo in warning line 207. (#7718)
replace 'men_len' with 'mem_len' to match parameter name
2020-10-12 03:58:58 -04:00
03ec02a667 Corrected typo: maked → masked (#7703) 2020-10-11 16:45:00 -04:00
827c519494 [examples] bump pl=0.9.0 (#7053) 2020-10-11 16:39:38 -04:00
ba4bbd92bc Fix docstring in AutoModel class (#7694) 2020-10-10 21:08:08 -04:00
26d5475d4b Added license information for default and distilbert models (#7688) 2020-10-10 03:55:11 -04:00
c6e18de9f8 Fix flaky test in test_trainer (#7689) 2020-10-09 20:01:15 -04:00
2c9e83f7b8 Fix title level in Blenderbot doc (#7687) 2020-10-09 19:24:10 -04:00
9618cd6964 Import integration libraries first (#7650)
* Import integration libraries first

* isort and black happiness

* flake8 happiness

* Add a test

* Black reformat

* Ignore import order in tests

* A heavy-handed method of disabling comet for tests

* Remove comet_ml tests

* Run black on setup.py
2020-10-09 12:13:22 -04:00
4dcc424de3 Complete release instruction 2020-10-09 12:12:03 -04:00
a3cea6a8cc Better links for models in README and doc index (#7680) 2020-10-09 11:17:16 -04:00
0af53b1ef9 Delete extra test file (#7681) 2020-10-09 11:16:35 -04:00
b0f05e0c4c [pegasus] Faster tokenizer tests (#7672) 2020-10-09 11:10:32 -04:00
bc00b37a0d Revert "Better model links in the README and index"
This reverts commit 76e05518bb11e29c8532d6ac529e72ce9a105495.
2020-10-09 10:56:13 -04:00
76e05518bb Better model links in the README and index 2020-10-09 10:54:40 -04:00
9ad830596d Fix dataset cardinality (#7678)
* Fix test

* Fix cardinality issue

* Fix test
2020-10-09 10:38:25 -04:00
a1ac082879 add license to xlm-roberta-large-xnli card 2020-10-09 09:16:06 -04:00
21ed3a6b99 Reintroduce clean_text on BertTokenizer call which was removed by mistake in #4723 (#5749)
* Reintroduce clean_text call which was removed by mistake in #4723

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added unittest for clean_text parameter on Bert tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Better unittest name.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Adapt unittest to use untrained tokenizer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Code quality + update test

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-10-09 08:07:28 -04:00
5668fdb09e Update XLM-RoBERTa details (#7669) 2020-10-09 05:16:58 -04:00
0578a91300 fix nn.DataParallel compatibility with PyTorch 1.5 (#7671)
The same type of errors as in https://github.com/huggingface/transformers/pull/4300
2020-10-09 05:15:08 -04:00
297233fa92 [s2s] Switch README urls to cdn (#7670) 2020-10-08 21:22:22 -04:00
a1ecc90d6b [pseudo] Switch URLS to CDN (#7661) 2020-10-08 14:12:39 -04:00
06a973fd2a [s2s] configure lr_scheduler from command line (#7641) 2020-10-08 13:06:35 -04:00
4a00613c24 Fix RobertaForCausalLM docs (#7642)
* Fix RobertaForCausalLM docs

* Apply review suggestion

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-10-08 08:36:00 -04:00
55cb2ee62e Green tests: update torch-hub test dependencies (add protobuf and pin tokenizer 0.9.0-RC2) (#7658)
* pin torch-hub test

* add protobuf dep
2020-10-08 13:21:15 +02:00
9aeacb58ba Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)
* [WIP] SP tokenizers

* fixing tests for T5

* WIP tokenizers

* serialization

* update T5

* WIP T5 tokenization

* slow to fast conversion script

* Refactoring to move tokenizer implementations inside transformers

* Adding gpt - refactoring - quality

* WIP adding several tokenizers to the fast world

* WIP Roberta - moving implementations

* update to dev4; switch file loading to in-memory loading

* Updating and fixing

* advancing on the tokenizers - updating do_lower_case

* style and quality

* moving forward with tokenizers conversion and tests

* MBart, T5

* dumping the fast version of transformer XL

* Adding to autotokenizers + style/quality

* update init and space_between_special_tokens

* style and quality

* bump up tokenizers version

* add protobuf

* fix pickle Bert JP with Mecab

* fix newly added tokenizers

* style and quality

* fix bert japanese

* fix funnel

* limit tokenizer warning to one occurrence

* clean up file

* fix new tokenizers

* fast tokenizers deep tests

* WIP adding all the special fast tests on the new fast tokenizers

* quick fix

* adding more fast tokenizers in the fast tests

* all tokenizers in fast version tested

* Adding BertGenerationFast

* bump up setup.py for CI

* remove BertGenerationFast (too early)

* bump up tokenizers version

* Clean old docstrings

* Typo

* Update following Lysandre comments

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-10-08 11:32:16 +02:00
4d04120c6d Replaced torch.load with pickle.load for loading the pretrained vocab of the TransformerXL tokenizer (#6935)
* Replaced torch.load with pickle.load for loading the pretrained vocab of TransformerXL

* Replaced torch.save with pickle.dump when saving the vocabulary

* updating transformer-xl

* uploaded on S3 - compatibility

* fix tests

* style

* Address review comments

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-10-08 10:16:10 +02:00
aba4e22944 [pseudolabels] cleanup markdown table (#7653) 2020-10-07 23:04:18 -04:00
e3e6517355 Fix 3 failing slow bart/blender tests (#7652) 2020-10-07 22:05:03 -04:00
960faaaf28 Blenderbot (#7418)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-07 19:09:23 -04:00
aee7967fc4 Added model cards for Tagalog BERT models (#7603) 2020-10-07 16:49:20 -04:00
b1c06140f4 Create README.md for IsRoBERTa language model (#7640)
* Create README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-07 16:46:03 -04:00
e10d389561 [Model card] SinhalaBERTo model. (#7558)
* [Model card] SinhalaBERTo model.

This is the model card for keshan/SinhalaBERTo model.

* Update model_cards/keshan/SinhalaBERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-07 16:40:52 -04:00
167bce56f2 [model_card] bert-base-5lang-cased (#7573)
Co-authored-by: Amin <amin.geotrend@gmail.com>
2020-10-07 16:38:14 -04:00
923dd4e5ef Create README.md (#7581) 2020-10-07 16:37:40 -04:00
85ead0fec4 Update README.md (#7590) 2020-10-07 16:37:10 -04:00
c6b9c72eac Update README.md (#7629)
Minor changes: Add arxiv link + Layout improvement + fix typos
2020-10-07 16:36:08 -04:00
048b4bd2c6 Create Model Card For "abhilash1910/french-roberta" Model (#7544) 2020-10-07 16:35:28 -04:00
c2e0d8ac52 [model_card] nikokons/gpt2-greek
by @nikkon3
2020-10-07 16:28:47 -04:00
e2bb9abb6a [s2s] release pseudolabel links and instructions (#7639) 2020-10-07 11:20:44 -04:00
08ba4b4902 Trainer callbacks (#7596)
* Initial callback proposal

* Finish various callbacks

* Post-rebase conflicts

* Fix tests

* Don't use something that's not set

* Documentation

* Remove unwanted print.

* Document all models can work

* Add tests + small fixes

* Update docs/source/internal/trainer_utils.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Fix TF tests

* Real fix this time

* This one should work

* Fix typo

* Really fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-07 10:50:21 -04:00
8fa0c956b3 Add GPT2 to sequence classification auto model (#7630) 2020-10-07 05:20:05 -04:00
e084089eb9 Fix tokenizer UnboundLocalError when padding is set to PaddingStrategy.MAX_LENGTH (#7610)
* Fix UnboundLocalError when PaddingStrategy is MAX_LENGTH

* Fix UnboundLocalError for TruncationStrategy
2020-10-06 18:16:00 -04:00
adfe6ace88 Fix wrong reference name/filename in docstring (#7616)
Resolves: #7613
2020-10-06 18:02:29 -04:00
f0d20ad328 Fix-copies 2020-10-06 23:44:03 +02:00
5982431814 Add GPT2ForSequenceClassification based on DialogRPT (#7501)
* Add GPT2ForSequenceClassification based on DialogRPT

* Better documentation

* Code quality
2020-10-06 17:31:21 -04:00
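A usage sketch for the new head; the DialogRPT checkpoint name follows the PR's premise, and the tuple indexing matches the era's default `return_dict=False`:
```
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer

model = GPT2ForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown")
tokenizer = GPT2Tokenizer.from_pretrained("microsoft/DialogRPT-updown")

inputs = tokenizer("I love this!", return_tensors="pt")
outputs = model(**inputs)
logits = outputs[0]  # (batch, num_labels); DialogRPT predicts a single score
```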
500be01c5d [s2s] save first batch to json for debugging purposes (#6810) 2020-10-06 16:11:56 -04:00
2b574e7c60 [bart] fix config.classif_dropout (#7593) 2020-10-06 11:33:51 -04:00
aa6c3c14b4 typo fix (#7611)
It should be T5-3B not T5-3M.
2020-10-06 15:32:52 +02:00
98fb718577 Docker GPU Images: Add NVIDIA/apex to the cuda images with pytorch (#7598)
- Use cuda:10.2 image instead of 10.1 (to address version mismatch
  warning with pytorch)
- Use devel version that is built on the runtime and includes headers
  and development tools (was otherwise failing to build apex)
2020-10-06 15:23:32 +02:00
4d541f516f fix return dicitonary labels from masked_lm_labels to labels (#7595) 2020-10-06 09:12:04 -04:00
8d2c248df7 Update README.md (#7612) 2020-10-06 08:46:55 -04:00
1c80b2c604 Create README.md (LEGAL-BERT Model card) (#7607)
* Create README.md

Model description for all LEGAL-BERT models, published as part of "LEGAL-BERT: The Muppets straight out of Law School" (Chalkidis et al., in Findings of EMNLP 2020)

* Update model_cards/nlpaueb/legal-bert-base-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-06 08:46:17 -04:00
eda27f4494 [TF generation] Fix typo (#7582)
* Fixing top_k and min_length assertions, and a typo fix

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-06 12:47:16 +02:00
0257992e4a Fix squeezebert docs (#7587)
* Configuration

* Modeling

* Tokenization

* Obliterate the trailing spaces

* From underlines to long underlines
2020-10-06 06:22:04 -04:00
66c72082d0 Add ProtT5-XL-BFD model card (#7606)
* Add ProtT5-XL-BFD model card

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-10-06 12:19:21 +02:00
b21a30bdd8 [makefile] check only .py files (#7588)
* check only .py files

* better choice of words
2020-10-06 05:25:21 -04:00
d5d2744aa7 Support T5 Distillation w/hidden state supervision (#7599) 2020-10-05 21:31:48 -04:00
818c294fdd The toggle actually sticks (#7586) 2020-10-05 11:23:57 -04:00
03835af700 Documentation fixes (#7585) 2020-10-05 11:01:03 -04:00
9cf7b23b9b Custom TF weights loading (#7422)
* First try

* Fix TF utils

* Handle authorized unexpected keys when loading weights

* Add several more authorized unexpected keys

* Apply style

* Fix test

* Address Patrick's comments.

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply style

* Make return_dict the default behavior and display a warning message

* Revert

* Replace wrong keyword

* Revert code

* Add forgot key

* Fix bug in loading PT models from a TF one.

* Fix sort

* Add a test for custom load weights in BERT

* Apply style

* Remove unused import

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-05 09:58:45 -04:00
d3adb985d1 Expand test to locate flakiness (#7580) 2020-10-05 09:45:47 -04:00
b2b7fc7814 Check and update model list in index.rst automatically (#7527)
* Check and update model list in index.rst automatically

* Check and update model list in index.rst automatically

* Adapt template
2020-10-05 09:40:45 -04:00
ca05c2a47d Fix post_init of some TrainingArguments (#7525) 2020-10-05 09:19:16 -04:00
3bd3d8b549 Add new dummy PT objects 2020-10-05 09:13:47 -04:00
28d183c90c Allow soft dependencies in the namespace with ImportErrors at use (#7537)
* PoC on RAG

* Format class name/obj name

* Better name in message

* PoC on one TF model

* Add PyTorch and TF dummy objects + script

* Treat scikit-learn

* Bad copy pastes

* Typo
2020-10-05 09:12:04 -04:00
1a00f46c74 Update Code example according to deprecation of AutoModeWithLMHead (#7555)
'The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.'
I don't know how to change the 'How to use this model directly from the 🤗/transformers library:' part, since it is not part of the model paper
2020-10-05 08:21:21 -04:00
0d79de7322 docs(pretrained_models): fix num parameters (#7575)
* docs(pretrained_models): fix num parameters

* fix(pretrained_models): correct typo

Co-authored-by: Amin <amin.geotrend@gmail.com>
2020-10-05 07:50:56 -04:00
ba5ea66e30 Fix tokenization in SQuAD for RoBERTa, Longformer, BART (#7387)
* fix squad tokenization for roberta & co

* change to pure type based check

* sort imports
2020-10-05 06:34:13 -04:00
0270256b27 Allow nested tensors in predicted logits (#7542) 2020-10-05 06:33:15 -04:00
60de910e60 Add power argument for TF PolynomialDecay (#5732)
* 🚩 Add `power` argument for TF PolynomialDecay

* 🚩 Create default optimizer with power

* 🚩 Add argument to training args

* 🚨 Clean code format

* 🚨 Fix black warning

* 🚨 Fix code format
2020-10-05 05:16:29 -04:00
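The commit threads a `power` argument through the TF optimizer setup; the underlying Keras schedule it configures looks like this (a quadratic decay instead of the linear `power=1.0` default):
```
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=5e-5,
    decay_steps=10_000,
    end_learning_rate=0.0,
    power=2.0,  # previously the helpers were locked to 1.0 (linear decay)
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```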
41c3a3b98e Add Electra unexpected keys (#7569) 2020-10-05 04:49:39 -04:00
071970feb8 [Model card] Java Code Summarizer model (#7568)
* Create README.md

* Update model_cards/ncoop57/bart-base-code-summarizer-java-v0/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-05 04:49:17 -04:00
02ef825be2 SqueezeBERT architecture (#7083)
* configuration_squeezebert.py

thin wrapper around bert tokenizer

fix typos

wip sb model code

wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working

set up squeezebert to use BertModelOutput when returning results.

squeezebert documentation

formatting

allow head mask that is an array of [None, ..., None]

docs

docs cont'd

path to vocab

docs and pointers to cloud files (WIP)

line length and indentation

squeezebert model cards

formatting of model cards

untrack modeling_squeezebert_scratchpad.py

update aws paths to vocab and config files

get rid of stub of NSP code, and advise users to pretrain with mlm only

fix rebase issues

redo rebase of modeling_auto.py

fix issues with code formatting

more code format auto-fixes

move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert

tests for squeezebert modeling and tokenization

fix typo

move squeezebert before bert in modeling_auto.py to fix inheritance problem

disable test_head_masking, since squeezebert doesn't yet implement head masking

fix issues exposed by the test_modeling_squeezebert.py

fix an issue exposed by test_tokenization_squeezebert.py

fix issue exposed by test_modeling_squeezebert.py

auto generated code style improvement

issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()

update copyright

resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask

docs

add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli

autogenerated formatting tweaks

integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings

* tiny change to order of imports
2020-10-05 04:25:43 -04:00
e2c935f561 Cleanup documentation for BART, Marian, MBART and Pegasus (#7523)
* Cleanup documentation for BART, Marian, MBART and Pegasus

* Cleanup documentation for BART, Marian, MBART and Pegasus
2020-10-05 04:22:12 -04:00
5e941bece2 LayoutLM: add exception handling for bbox values (#7452)
* LayoutLM: add exception handling for bbox values

To replicate the unhandled error:

- In `test_modeling_layoutlm.py` set `range_bbox=1025`, i.e. greater than 1024
- Run `pytest tests/test_modeling_layoutlm.py`

The requirement for bbox values to be within the range 0-1000 is documented,
but if it is violated then it is not clear from the error message what the
issue is.

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-10-05 04:17:14 -04:00
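A sketch of the kind of guard this adds (exact wording and placement differ): fail with a readable message instead of an opaque embedding-lookup error when a coordinate leaves the documented 0-1000 range.
```
def check_bbox_range(bbox):
    # LayoutLM's 2D position embeddings are sized for coordinates in [0, 1000].
    for coord in bbox:
        if not 0 <= coord <= 1000:
            raise ValueError(
                f"Bbox coordinate {coord} is outside the 0-1000 range; "
                "normalize boxes to a 0-1000 scale before calling LayoutLM."
            )
```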
2ca0fae9a6 added script for fine-tuning roberta for sentiment analysis task (#7505) 2020-10-05 03:57:15 -04:00
95f792afb0 Remove labels from the RagModel example (#7560) 2020-10-04 17:39:23 -04:00
99cb924bfb [s2s] add config params like Dropout in Seq2SeqTrainingArguments (#7532) 2020-10-04 12:42:30 -04:00
9bdce3a4f9 [s2s] fix lockfile and peg distillation constants (#7545) 2020-10-02 15:58:14 -04:00
de4d7b004a [s2s] Adafactor support for builtin trainer (#7522) 2020-10-01 17:27:45 -04:00
d3a9601a11 [s2s] trainer scripts: Remove --run_name, thanks sylvain! (#7521) 2020-10-01 17:18:47 -04:00
bdcc4b78a2 Fix seq2seq example test (#7518)
* Fix seq2seq example test

* Fix bad copy-paste

* Also save the state
2020-10-01 14:13:29 -04:00
29baa8fabe Clean the Trainer state (#7490)
* Trainer should not modify its TrainingArguments

* Trainer should not modify its TrainingArguments

* Trainer should not modify its TrainingArguments

* Add test of resumed training

* Fixes

* Non multiGPU test

* Clean Trainer state

* Add more to the state

* Documentation

* One last test

* Make resume training test more complete

* Unwanted changes
2020-10-01 13:07:04 -04:00
2a358f45ef [s2s] fix nltk pytest race condition with FileLock (#7515) 2020-10-01 12:51:09 -04:00
72d363d979 [examples/s2s] clean up finetune_trainer (#7509) 2020-10-01 12:19:29 -04:00
bd2621583b fix data type (#7513) 2020-10-01 18:15:41 +02:00
62f5ae68ec [Seq2Seq] Fix a couple of bugs and clean examples (#7474)
* clean T5

* fix t5 tests

* fix index typo

* fix tf common test

* fix examples

* change positional ordering for Bart and FSTM

* add signature test

* clean docs and add tests

* add docs to encoder decoder

* clean docs

* correct two doc strings

* remove sig test for TF Elektra & Funnel

* fix tf t5 slow tests

* fix input_ids to inputs in tf

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_bart.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* implement lysandre results

* make style

* fix encoder decoder typo

* fix tf slow tests

* fix slow tests

* renaming

* remove unused input

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-01 17:38:50 +02:00
a42f62d34f Train T5 in Tensorflow 2 Community Notebook (#7428)
* t5 t5 community notebook added

* author link updated

* t5 t5 community notebook added

* author link updated

* new colab link updated

Co-authored-by: harris <muhammad.harris@visionx.io>
2020-10-01 16:54:29 +02:00
5fc3b5cba4 Fix Tune progress_reporter kwarg (#7508) 2020-10-01 10:34:31 -04:00
dabc85d1ba Report Tune metrics in final evaluation (#7507) 2020-10-01 09:52:36 -04:00
9a92afb6d0 Update LayoutLM doc (#7388)
Co-authored-by: Alexandr Maslov <avmaslov3@gmail.com>
2020-10-01 09:11:42 -04:00
e32390931d [model_card] distilbert-base-german-cased 2020-10-01 09:08:49 -04:00
9a4e163b58 [model_card] Fix metadata, adalbertojunior/PTT5-SMALL-SUM 2020-10-01 08:54:06 -04:00
8435e10e24 Create README.md (#7299)
* Create README.md

* language metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-01 08:52:28 -04:00
d727432072 Update README.md (#7459) 2020-10-01 08:51:26 -04:00
664da5b077 Create README.md (#7468) 2020-10-01 08:50:26 -04:00
f745f61c99 Update README.md (#7491)
Model now fine-tuned on Transformers 3.1.0, previous out-of-date model was fine-tuned on Transformers 2.3.0.
2020-10-01 08:50:07 -04:00
6ef7658c0a Create README.md (#7349)
Model card for akhooli/personachat-arabic
2020-10-01 08:48:51 -04:00
15ab3f049b Creating readme for bert-base-mongolian-cased (#7439)
* Creating readme for bert-base-mongolian-cased

* Update model_cards/bayartsogt/bert-base-mongolian-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-10-01 08:46:27 -04:00
0c2b9fa831 creating readme for bert-base-mongolian-uncased (#7440) 2020-10-01 08:45:22 -04:00
381443c096 Update README.md (#7498)
Making transformers readme more robust.
2020-10-01 07:42:07 -04:00
85d2d8c920 Fix local_files_only for TF (#6091) 2020-10-01 05:06:02 -04:00
9e80f972fb Enable pegasus fp16 by clamping large activations (#7243)
* Clean clamp

* boom boom

* Take some other changes

* boom boom

* boom boom

* boom boom

* one chg

* fix test

* Use finfo

* style
2020-10-01 04:48:37 -04:00
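A sketch of the clamping trick (mirroring the shape of the fix, not its exact code): squash infinities back inside fp16's representable range so later layers keep seeing finite activations.
```
import torch

def clamp_if_overflow(hidden_states: torch.Tensor) -> torch.Tensor:
    # In fp16, large activations overflow to inf; clamp to just inside the
    # dtype's maximum so downstream layers remain numerically stable.
    if hidden_states.dtype == torch.float16 and torch.isinf(hidden_states).any():
        clamp_value = torch.finfo(hidden_states.dtype).max - 1000
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
    return hidden_states
```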
be51c1039d Add forgotten return_dict argument in the docs (#7483) 2020-10-01 04:41:29 -04:00
48f23f92a8 [s2sTrainer] test + code cleanup (#7467) 2020-10-01 00:33:01 -04:00
097049b81b Distributed Trainer: 2 little fixes (#7461)
* reset model.config

* Update src/transformers/trainer.py

* use lower case tensor

* Just tensor change
2020-09-30 22:14:14 -04:00
0acd1ffa09 [doc] rm Azure buttons as not implemented yet 2020-09-30 17:31:08 -04:00
03e46c1de3 [s2s] fix kwargs style (#7488) 2020-09-30 17:00:06 -04:00
6fe8a693eb [s2s] Fix t5 warning for distributed eval (#7487) 2020-09-30 16:58:03 -04:00
4c6728460a Bump isort version. (#7484) 2020-09-30 13:44:58 -04:00
c031d01023 Seq2SeqDataset: avoid passing src_lang everywhere (#7470)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-30 13:27:48 -04:00
08939cfdf7 [s2strainer] fix eval dataset loading (#7477) 2020-09-30 12:39:13 -04:00
a97a73e0ee Small QOL improvements to TrainingArguments (#7475)
* Small QOL improvements to TrainingArguments

* With the self.
2020-09-30 12:12:03 -04:00
dc7d2daa4c Alphabetize model lists (#7478) 2020-09-30 10:43:58 -04:00
fdccf82e28 Remove config assumption in Trainer (#7464)
* Remove config assumption in Trainer

* Initialize for eval
2020-09-30 09:03:25 -04:00
cc4eff8087 Make transformers install check positive (#7473)
When transformers is correctly installed, you should get a positive message ^_^
2020-09-30 07:44:40 -04:00
7a0cf0ec93 Add DeBERTa model (#5929)
* Add DeBERTa model

* Remove dependency of deberta

* Address comments

* Patch DeBERTa
Documentation
Style

* Add final tests

* Style

* Enable tests + nitpicks

* position IDs

* BERT -> DeBERTa

* Quality

* Style

* Tokenization

* Last updates.

* @patrickvonplaten's comments

* Not everything can be a copy

* Apply most of @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last reviews

* DeBERTa -> Deberta

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-30 07:07:30 -04:00
44a93c981f Number of GPUs for multi-gpu (#7472) 2020-09-30 06:53:20 -04:00
886ef35ce6 Fix LXMERT with DataParallel (#7471) 2020-09-30 06:41:24 -04:00
35e94c68df Number of GPUs 2020-09-30 12:29:26 +02:00
056723ad1d Multi-GPU setup (#7453) 2020-09-30 05:53:34 -04:00
4ba248748f Get a better error when check_copies fails (#7457)
* Get a better error when check_copies fails

* Fix tests
2020-09-30 10:05:14 +02:00
bef0175168 remove codecov PR comments (#7400) 2020-09-29 15:16:43 -04:00
a1c2ef7bd0 Add documentation for v3.3.1 2020-09-29 14:31:43 -04:00
1ba08dc221 Release: v3.3.1 2020-09-29 14:17:34 -04:00
8546dc55c2 Fix Trainer tests in a multiGPU env (#7458) 2020-09-29 14:06:41 -04:00
d0fd7154c5 Catch import datasets common errors (#7456) 2020-09-29 13:42:09 -04:00
f1220c5fe2 Add a code of conduct (#7433) 2020-09-29 13:38:47 -04:00
9e9a1fb8c7 Adding gradient checkpointing to GPT2 (#7446)
* GPT2 gradient checkpointing

* find_unused_parameters removed if checkpointing

* find_unused_parameters removed if checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Added a test for generation with checkpointing

* Update src/transformers/configuration_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-29 12:26:26 -04:00
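A usage sketch, assuming the feature landed as a `gradient_checkpointing` flag on `GPT2Config`; the cache is disabled because cached key/values defeat activation recomputation:
```
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config.from_pretrained("gpt2", gradient_checkpointing=True, use_cache=False)
model = GPT2LMHeadModel.from_pretrained("gpt2", config=config)
# Each block's forward activations are recomputed during backward,
# trading extra compute for a much smaller memory footprint.
```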
52e8392b7e Add automatic best model loading to Trainer (#7431)
* Add automatic best model loading to Trainer

* Some small fixes

* Formatting
2020-09-29 10:41:18 -04:00
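A sketch of the new arguments in use (names as introduced around this PR; treat the exact spellings as assumptions):
```
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",
    load_best_model_at_end=True,        # reload the best checkpoint after training
    metric_for_best_model="eval_loss",  # which logged metric defines "best"
    greater_is_better=False,            # lower loss is better
)
```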
1fc4de69ed Document new features of make fixup (#7434) 2020-09-29 03:56:57 -04:00
205bf0b7ea Update README.md (#7444)
Hi, just corrected the example code, added 2 links and fixed some typos
2020-09-29 03:18:01 -04:00
74d8d69bd4 [s2s] consistent output format across eval scripts (#7435) 2020-09-28 23:20:03 -04:00
671b278e25 Create README.md (#7436)
* Create README.md

MagBERT-NER : Added widget (Text)

* Rename model_cards/README.md to model_cards/TypicaAI/magbert-ner/README.md
2020-09-28 18:25:25 -04:00
a1a8ffa512 Update README.md (#7429)
Add links to models fine-tuned on a downstream task
2020-09-28 13:40:09 -04:00
f62f2ffdcc [makefile] 10x speed up checking/fixing (#7403)
* [makefile] check/fix only modified since branching files

* fix phonies

* parametrize dirs

* have only one source for dirs to check

* look ma, no autoformatters here
2020-09-28 10:45:42 -04:00
16c213820e Update docs to version v3.3.0 2020-09-28 16:32:00 +02:00
0613f05226 Release: v3.3.0 2020-09-28 16:24:43 +02:00
ca3fc36de3 Reorganize documentation navbar (#7423)
* Reorganize documentation navbar

* Update css to have clear sections
2020-09-28 16:22:58 +02:00
7f4115c099 Pull request template (#7392)
co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-09-28 09:51:49 -04:00
0611eab5e3 Document RAG again (#7377)
Do not merge before Monday
2020-09-28 08:31:46 -04:00
7563d5a3cf Catch PyTorch warning when saving/loading scheduler (#7401) 2020-09-28 08:20:10 -04:00
1749ca317e docs: fix model sharing file names (#5855)
* docs: fix model sharing file names

* Update docs/source/model_sharing.rst

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* docs(model_sharing.rst): fix new line

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-28 08:17:30 -04:00
8279471506 correct RAG model cards (#7420) 2020-09-28 11:08:39 +02:00
4083a55ab0 Flos fix (#7384) 2020-09-28 04:09:26 -04:00
ae3e84f3ba [RAG] Clean Rag readme in examples (#7413)
* Improve README + consolidation script

* Reformat README

* Reformat README

Co-authored-by: Your Name <you@example.com>
2020-09-28 10:06:39 +02:00
748425d47d [T5] allow config.decoder_layers to control decoder size (#7409)
* Working asymmetrical T5

* rename decoder_layers -> num_decoder_layers

* Fix docstring

* Allow creation of asymmetric t5 students
2020-09-28 03:08:04 -04:00
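A sketch of building an asymmetric student with the renamed knob, `num_decoder_layers` (which defaults to `num_layers` when unset):
```
from transformers import T5Config, T5ForConditionalGeneration

# Full-depth encoder, shallow 3-layer decoder.
config = T5Config.from_pretrained("t5-base", num_decoder_layers=3)
student = T5ForConditionalGeneration(config)  # fresh weights; distill from a teacher
```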
7296fea1d6 [s2s] rougeLSum expects \n between sentences (#7410)
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
2020-09-27 16:27:19 -04:00
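A sketch of the pre-processing this implies: rougeLsum scores at the sentence level and treats "\n" as the sentence separator, so predictions and references need one sentence per line (assumes NLTK with the `punkt` data installed):
```
from nltk import sent_tokenize  # requires: nltk.download("punkt")

def prepare_for_rouge_lsum(text: str) -> str:
    # One sentence per line, as rougeLsum expects.
    return "\n".join(sent_tokenize(text))
```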
eab5f59682 [s2s] add create student script (#7290)
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-27 15:10:46 -04:00
e50a931c11 [Longformer, Bert, Roberta, ...] Fix multi gpu training (#7272)
* fix multi-gpu

* fix longformer

* force to delete unnecessary layers

* fix notifications

* fix warning

* fix roberta

* fix tests

* remove hasattr

* fix tests

* fix roberta

* merge and clean authorized keys
2020-09-25 20:33:21 +02:00
2c8ecdf8a8 fix rag retriever save pretrained (#7399) 2020-09-25 19:47:12 +02:00
1a14687e6f Update README.md 2020-09-25 19:43:48 +02:00
3327c2b0f6 Update README.md 2020-09-25 19:43:36 +02:00
fe326bd5cf Remove dependency on examples/seq2seq from rag (#7395)
Co-authored-by: Your Name <you@example.com>
2020-09-25 18:20:49 +02:00
ad39271ae8 Fix FP16 and attention masks in FunnelTransformer (#7374)
* Fix #7371

* Fix training

* Fix test values

* Apply the fix to TF as well
2020-09-25 12:20:39 -04:00
4e5b036bdd Update README.md 2020-09-25 18:16:46 +02:00
55eccfbb49 Update README.md 2020-09-25 18:16:44 +02:00
e2e77f02c2 Fix BartModel output documentation (#7390) 2020-09-25 11:48:13 -04:00
bbb07830ff Speedup check_copies script (#7394) 2020-09-25 11:47:22 -04:00
8859c4f841 [code quality] new make target that combines style and quality targets (#7310)
* [code quality] merge style and quality targets

Any reason why we don't run `flake8` in `make style`? I find myself needing to run `make style` and `make quality` all the time, but I need the latter just for the last 2 checks. Since we have no control over the source code, why bother with separating checking and fixing - let's just have one target that fixes and then performs the remaining checks, as we know the first two have been done already.

This PR suggests to merge the 2 targets into one efficient target.

I will edit the docs if this change resonates with the team.

* move checks into style, re-use target

* better name

* add fixup target

* document new target
2020-09-25 11:37:40 -04:00
38a1b03f4d Remove unhelpful bart warning (#7391) 2020-09-25 11:01:07 -04:00
5ff0d6d7d0 Update README.md 2020-09-25 16:58:29 +02:00
cf1c88e092 [RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372)
* Fix retrieval offset in RAG's HfIndex

* update slow tests

* style

* fix new test

* style

* add better tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-25 16:12:46 +02:00
571c7a11c1 [Rag] Fix wrong usage of num_beams and bos_token_id in Rag Sequence generation (#7386)
* fix_rag_sequence

* add second bug fix
2020-09-25 14:35:49 +02:00
415071b4c2 doc changes (#7385) 2020-09-25 08:00:36 -04:00
2dd652d757 [RAG] Add missing doc and attention_mask to rag (#7382)
* add docs

* add missing docs and attention_mask in fine-tune
2020-09-25 11:23:55 +02:00
7cdd9da5bf Check config type using type instead of isinstance (#7363)
* Check config type instead of instance


Bad merge

* Remove for loops

* Style
2020-09-25 05:09:09 -04:00
3c6bf8998f modeling_bart: 3 small cleanups that dont change outputs (#7381)
* Mbart passing

* boom boom

* cleaner assert

* add assert

* Fix tests
2020-09-25 04:24:14 -04:00
9e68d075a4 Seq2SeqTrainer (#6769)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 18:46:58 -04:00
d9d0f1140b [s2s] distributed eval allows num_return_sequences > 1 (#7254) 2020-09-24 17:30:09 -04:00
0804d077c6 correct attention mask (#7373) 2020-09-24 23:22:04 +02:00
a8cbc4269c [fsmt] build/test scripts (#7257)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-24 17:10:26 -04:00
a8e7982f84 Remove mentions of RAG from the docs (#7376)
* Remove mentions of  RAG from the docs

* Deactivate check
2020-09-24 17:07:14 -04:00
eadd870b2f [seq2seq] make it easier to run the scripts (#7274) 2020-09-24 15:23:48 -04:00
8d3bb781ee Formatter (#7368)
* Formatter

* Docs
2020-09-24 10:59:21 -04:00
7dfdf793bb Fixing case in which Trainer hung while saving model in distributed training (#7365)
* remote debugging

* remote debugging

* moved _store_flos call

* moved _store_flos call

* moved _store_flos call

* removed debugging artefacts
2020-09-24 09:56:40 -04:00
0ccb6f5c6d Clean RAG docs and template docs (#7348)
* Clean RAG docs and template docs

* Fix typo

* Better doc
2020-09-24 09:24:41 -04:00
27174bd4fe Make PyTorch model files independent from each other (#7352) 2020-09-24 08:53:54 -04:00
d161ed1682 Update the TF models to remove their interdependencies (#7238)
* Refacto the models to remove their interdependencies

* Fix Flaubert model

* Fix Flaubert

* Fix XLM

* Fix Albert

* Fix Roberta

* Fix Albert

* Fix Flaubert

* Apply style + remove unused imports

* Fix Distilbert

* remove unused import

* fix Distilbert

* Fix Flaubert

* Apply style

* Fix Flaubert

* Add the copy comments for the check_copies script

* Fix MobileBert model name

* Address Morgan's comments

* Fix typo

* Oops typo
2020-09-24 08:30:59 -04:00
0cffa424f8 Update tokenization_auto.py (#6870)
Update tokenization_auto.py to handle inherited tokenizers
2020-09-24 06:52:10 -04:00
03fb8e79c6 Update modeling_tf_longformer.py (#7359)
correct a very small mistake
2020-09-24 11:37:29 +02:00
1ff5bd38a3 Check decorator order (#7326)
* Check decorator order

* Adapt for parametrized decorators

* Fix typos
2020-09-24 04:54:37 -04:00
0be5f4a00c Expand a bit the documentation doc (#7350) 2020-09-24 04:34:18 -04:00
38f1703795 wip: Code to add lang tags to marian model cards (#6586) 2020-09-23 18:11:06 -04:00
129fdae040 Remove reference to args in XLA check (#7344)
Previously, the TFTrainingArguments object did a check to see if XLA was enabled, but did this by referencing `self.args.xla`, when it should be `self.xla`, because it is the args object. This can be verified a few lines above, where the XLA field is set.
2020-09-23 13:56:21 -04:00
d266613635 [Benchmarks] Change all args to from no_... to their positive form (#7075)
* Changed name to all no_... arguments and all references to them, inverting the boolean condition

* Change benchmark tests to use new Benchmark Args

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/benchmark/benchmark.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix Style. Add --no options in help

* fix some part of tests

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* fix all tests

* make style

* add backwards compability

* make backwards compatible

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>
2020-09-23 13:25:24 -04:00
8c697d58ef Ensure that integrations are imported before transformers or ml libs (#7330)
* Ensure that integrations are imported before transformers or ml libs

* Black reformatter wanted a newline

* isort requests

* black requests

* flake8 requests
2020-09-23 13:23:45 -04:00
3323146e90 Models doc (#7345)
* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FlauBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-23 13:20:45 -04:00
58405a527b Fixed evaluation_strategy on epoch end bug (#7340)
* Fixed evaluation_strategy on epoch end bug

move the evaluation script outside the iteration loop

* black formatting
2020-09-23 13:17:00 -04:00
28cf873036 [testing] skip decorators: docs, tests, bugs (#7334)
* skip decorators: docs, tests, bugs

* another important note

* style

* bloody style

* add @pytest.mark.parametrize

* add note

* no idea what it wants :(
2020-09-23 05:16:19 -04:00
df53643807 [code quality] fix confused flake8 (#7309)
* fix confused flake

We run `black  --target-version py35 ...` but flake8 doesn't know that, so currently with py38 flake8 fails suggesting that black should have reformatted 63 files. Indeed if I run:

```
black --line-length 119 --target-version py38 examples templates tests src utils
```
it indeed reformats 63 files.

The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds.

Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version.

* adjust the other files that will now rely on black's config file
2020-09-22 22:12:36 -04:00
78387cc63e [s2s] only save metrics.json from rank zero (#7331) 2020-09-22 18:27:28 -04:00
e53138a1b9 [s2s] add src_lang kwarg for distributed eval (#7300) 2020-09-22 18:26:37 -04:00
a9c7849cfa [model_cards] blinoff/roberta-base-russian-v0 (#7317) 2020-09-22 18:26:13 -04:00
f5518e5631 Formatting 2020-09-22 14:55:12 -04:00
17099ebd58 Add num workers cli arg (#7322)
* Add dataloader_num_workers to TrainingArguments

This argument is meant to be used to set the
number of workers for the PyTorch DataLoader.

* Pass num_workers argument on DataLoader init
2020-09-22 14:44:42 -04:00
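A sketch of the new argument, which is forwarded to `torch.utils.data.DataLoader(num_workers=...)`:
```
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    dataloader_num_workers=4,  # 0 (the default) keeps data loading in the main process
)
```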
25b0463d0b [s2s] add supported architecures to MD (#7252) 2020-09-22 13:09:35 -04:00
d6bc72c469 Fixed results of SQuAD-FR evaluation (#7313)
The score for the F1 metric was reported as the Exact Match and vice-versa.
2020-09-22 12:39:07 -04:00
6303b5a718 [Bug Fix] The actual batch_size is inconsistent with the settings. (#7235)
* [bug fix] fixed the bug where the actual batch_size was inconsistent with the parameter settings

* reformat

* reformat

* reformat

* add support for dict and BatchEncoding

* add support for dict and BatchEncoding

* add documentation for DataCollatorForNextSentencePrediction

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename variables

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-22 12:31:21 -04:00
c754c41c61 RAG (#6813)
* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval with HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867

* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencies for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriver api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement thoms suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other models

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fixe finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate sylvains suggestions

* sams comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name

Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
2020-09-22 18:29:58 +02:00
1ee2194fb6 Mark big downloads slow (#7325)
* Make big downloads as slow

* Add import

* Right order for slow decorator

* More slow tests
2020-09-22 12:21:52 -04:00
585217c87f Add generic text classification example in TF (#5716)
* Add new example with nlp

* Update README

* replace nlp by datasets

* Update examples/text-classification/README.md

Add Lysandre's suggestion.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 12:05:05 -04:00
6e21f24220 Documentation version 2020-09-22 18:04:39 +02:00
3ebb1b3a2b Release: v3.2.0 2020-09-22 17:36:51 +02:00
01f0fd0bab Fixes for LayoutLM (#7318) 2020-09-22 10:37:11 -04:00
702a76ff92 Create an XLA parameter and fix the mixed precision (#7311)
* Create an XLA parameter and fix mixed precision creation

* Fix issue brought by intellisense

* Complete docstring
2020-09-22 10:19:34 -04:00
596342c2b9 Support for Windows in check_copies (#7316) 2020-09-22 10:17:48 -04:00
89edf504bf Add possibility to evaluate every epoch (#7302)
* Add possibility to evaluate every epoch

* Remove multitype arg

* Remove needless import

* Use a proper enum

* Apply suggestions from @LysandreJik

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* One else and formatting

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 09:52:29 -04:00
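A sketch of the resulting enum-backed option (values per this PR: "no", "steps", or "epoch"):
```
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",  # also accepts "no" (default) or "steps"
)
```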
21ca148090 is_pretokenized -> is_split_into_words (#7236)
* is_pretokenized -> is_split_into_words

* Fix tests
2020-09-22 09:34:35 -04:00
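A usage sketch of the renamed argument: the input is already split into words (as in token classification), and the tokenizer still applies subword tokenization on top of each word:
```
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer(["Hello", "world", "!"], is_split_into_words=True)
print(encoding.tokens())  # wordpieces aligned to the original words
```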
324f361e91 Fix saving TF custom models (#7291)
* Fix #7277

* Apply style

* Add a full training pipeline test

* Apply style
2020-09-22 09:31:13 -04:00
cd9a0585ea Add LayoutLM Model (#7064)
* first version

* finish test docs readme model/config/tokenization class

* apply make style and make quality

* fix layoutlm GitHub link

* fix conflict in index.rst and add layoutlm to pretrained_models.rst

* fix bug in test_parents_and_children_in_mappings

* reformat modeling_auto.py and tokenization_auto.py

* fix bug in test_modeling_layoutlm.py

* Update docs/source/model_doc/layoutlm.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/layoutlm.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove inh, add tokenizer fast, and update some doc

* copy and rename necessary class from modeling_bert to modeling_layoutlm

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* add mish to activations.py, import ACT2FN and import logging from utils

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 09:28:02 -04:00
244e1b5ba3 Fix #7304 (#7305) 2020-09-22 09:20:03 -04:00
e46108817e Adds FSMT to LM head AutoModel (#7312) 2020-09-22 06:35:51 -04:00
e2964b8a19 [fsmt] no need to pass device (#7292) 2020-09-22 05:39:06 -04:00
e4b94d8e58 Copy code from Bert to Roberta and add safeguard script (#7219)
* Copy code from Bert to Roberta and add safeguard script

* Fix docstring

* Comment code

* Formatting

* Update src/transformers/modeling_roberta.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add test and fix bugs

* Fix style and make new comand

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-22 05:02:27 -04:00
656c27c3a3 [s2s] save hostname with repo info (#7301)
* save hostname
2020-09-21 17:26:24 -04:00
34a1b75f01 Added RobBERT-v2 model card (#7286)
* Added RobBERT-v2 model card

* minor Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-21 16:17:28 -04:00
6513d16a48 IXAmBERT model card (#7283)
This PR includes the model card for the IXAmBERT model, which was recently uploaded to the Hugging Face repository.
2020-09-21 16:15:31 -04:00
af4b98ed97 [s2s] adjust finetune + test to work with fsmt (#7263) 2020-09-21 15:13:19 -04:00
8d562a2d1a [s2s] s/alpha_loss_encoder/alpha_encoder_loss/ (#7298)
fix to match `distillation.py:        self.alpha_encoder_loss`
2020-09-21 14:14:26 -04:00
cbb2f75a16 [s2s tests] fix test_run_eval_search (#7297) 2020-09-21 14:00:40 -04:00
7a88ed6c2a [model card] distlbart-mnli model cards (#7278) 2020-09-21 12:26:18 -04:00
63276b76d4 Fix #7284 (#7289) 2020-09-21 10:31:26 -04:00
8d464374ba Disable missing weight warning (#7282) 2020-09-21 09:14:48 -04:00
8ff88d25e9 [fsmt] rewrite SinusoidalPositionalEmbedding + USE_CUDA test fixes + new TranslationPipeline test (#7224)
* fix USE_CUDA, add pipeline

* USE_CUDA fix

* recode SinusoidalPositionalEmbedding into nn.Embedding subclass

was needed for torchscript to work - this is now part of the state_dict, so will have to remove these keys during save_pretrained

* back out (ci debug)

* restore

* slow last?

* facilitate not saving certain keys and test

* remove no longer used keys

* style

* fix logging import

* cleanup

* Update src/transformers/modeling_utils.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* fix bug in max_positional_embeddings

* rename keys to keys_to_never_save per suggestion, improve the setup

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-21 09:13:35 -04:00
67c4b0c517 Add model cards for new pre-trained BERTweet-COVID19 models (#7269)
Two new pre-trained models, "vinai/bertweet-covid19-base-cased" and "vinai/bertweet-covid19-base-uncased", result from further pre-training the pre-trained model "vinai/bertweet-base" on a corpus of 23M COVID-19 English Tweets for 40 epochs.
2020-09-21 06:12:51 -04:00
0cbe1139b1 Update README.md 2020-09-21 11:53:08 +02:00
aae4edb5f0 Addressing review comment 2020-09-21 11:37:00 +02:00
43b9d93875 [example/glue] fix compute_metrics_fn for bart like models (#7248)
* fix compute_metrics_fn

* p.predictions -> preds

* apply suggestions
2020-09-21 05:34:20 -04:00
39062d05f0 Fixed target_mapping preparation for XLNet when batch size > 1 (incl. beam search) (#7267) 2020-09-21 04:53:52 -04:00
4b3e55bdcc Add "Fine-tune ALBERT for sentence-pair classification" notebook to the community notebooks (#7255) 2020-09-21 04:25:22 -04:00
7cbf0f722d examples/seq2seq/__init__.py mutates sys.path (#7194) 2020-09-20 16:54:42 -04:00
a4faeceaed Fix typo in model name (#7268) 2020-09-20 19:12:30 +02:00
47ab3e8262 @slow has to be last (#7251)
Found an issue where `@slow` isn't the last decorator (it gets ignored!), so documenting its significance.
2020-09-20 09:17:29 -04:00
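A sketch of the rule being documented: skip decorators like `@slow` must sit closest to the test function, i.e. be listed last, or they can be silently ignored (decorator names are from `transformers.testing_utils` of this era):
```
from transformers.testing_utils import require_torch, slow

@require_torch
@slow  # correct: innermost / listed last
def test_whole_pipeline():
    ...
```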
4f6e525742 model card improvements (#7221) 2020-09-19 17:02:05 -04:00
eb074af75e fsmt tiny model card + script (#7244) 2020-09-19 14:37:12 -04:00
1d90d0f386 Add title to model card (#7240) 2020-09-19 02:10:45 -04:00
c9b7ef042f Create README.md (#7239) 2020-09-19 02:09:29 -04:00
83dba10b8f [s2s] distributed_eval.py saves better speed info (#7242) 2020-09-18 15:46:01 -04:00
af2322c7a0 Add new pre-trained models BERTweet and PhoBERT (#6129)
* Add BERTweet and PhoBERT models

* Update modeling_auto.py

Re-add `bart` to LM_MAPPING

* Update tokenization_auto.py

Re-add `from .configuration_mobilebert import MobileBertConfig`
not sure why it's replaced by `from transformers.configuration_mobilebert import MobileBertConfig`

* Add BERTweet and PhoBERT to pretrained_models.rst

* Update tokenization_auto.py

Remove BertweetTokenizer and PhobertTokenizer out of tokenization_auto.py (they are currently not supported by AutoTokenizer.

* Update BertweetTokenizer - without nltk

* Update model card for BERTweet

* PhoBERT - with Auto mode - without import fastBPE

* PhoBERT - with Auto mode - without import fastBPE

* BERTweet - with Auto mode - without import fastBPE

* Add PhoBERT and BERTweet to TF modeling auto

* Improve Docstrings for PhobertTokenizer and BertweetTokenizer

* Update PhoBERT and BERTweet model cards

* Fixed a merge conflict in tokenization_auto

* Used black to reformat BERTweet- and PhoBERT-related files

* Used isort to reformat BERTweet- and PhoBERT-related files

* Reformatted BERTweet- and PhoBERT-related files based on flake8

* Updated test files

* Updated test files

* Updated tf test files

* Updated tf test files

* Updated tf test files

* Updated tf test files

* Update commits from huggingface

* Delete unnecessary files

* Add tokenizers to auto and init files

* Add test files for tokenizers

* Revised model cards

* Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files

* Revised test files

* Update orders of Phobert and Bertweet tokenizers in auto tokenization file
2020-09-18 13:16:43 -04:00
9397436ea5 Create README.md 2020-09-18 16:52:00 +02:00
7eeca4d399 Create README.md 2020-09-18 16:44:02 +02:00
31516c776a Update README.md 2020-09-18 16:37:14 +02:00
4c14669a78 Update README.md 2020-09-18 16:35:11 +02:00
3a03bab9db Fix a few countings (steps / epochs) in trainer_tf.py (#7175) 2020-09-18 09:28:56 -04:00
ee9eae4e06 token-classification: update url of GermEval 2014 dataset (#6571) 2020-09-18 06:18:06 -04:00
eef8d94d19 [model_cards]
We use ISO 639-1 cc @gentaiscool
2020-09-18 12:09:24 +02:00
afd6a9f827 Create README.md 2020-09-18 11:41:12 +02:00
9f1544b9e0 Create README.md 2020-09-18 11:37:20 +02:00
5c1d5ea667 Fixed typo in README (#7233) 2020-09-18 04:52:43 -04:00
7719ecd19f Fix a typo (#7225) 2020-09-18 04:23:33 -04:00
4a26e8ac5f Create README.md (#7205) 2020-09-18 03:24:30 -04:00
94320c5b81 Add customized text to widget (#7204) 2020-09-18 03:24:23 -04:00
3aefb24b20 Create README.md (#7209) 2020-09-18 03:24:10 -04:00
a22e7a8dd4 Create README.md (#7210) 2020-09-18 03:23:58 -04:00
c028b26481 Create README.md (#7212) 2020-09-18 03:23:49 -04:00
c7cdd7b4fd Create README.md for indobert-lite-base-p1 (#7182) 2020-09-18 03:22:32 -04:00
bfb9150b8f Create README.md for indobert-lite-large-p1 (#7184)
* Create README.md

* Update README.md
2020-09-18 03:22:11 -04:00
d193593403 Create README.md (#7183) 2020-09-18 03:21:54 -04:00
e65d846674 Create README.md (#7185) 2020-09-18 03:21:39 -04:00
e27d86d48d Create README.md for indobert-large-p2 model card (#7181) 2020-09-18 03:21:28 -04:00
881c0783e9 Create README.md for indobert-large-p1 model card (#7180) 2020-09-18 03:21:16 -04:00
e0d58a5c87 Create README.md (#7179) 2020-09-18 03:20:59 -04:00
1313a1d2a8 Create README.md for indobert-base-p2 (#7178) 2020-09-18 03:20:29 -04:00
cf24f43e76 Create README.md (#7095)
Create model card for Pegasus QA
2020-09-18 03:19:45 -04:00
67d9fc50d9 [s2s] remove double assert (#7223) 2020-09-17 18:32:31 -04:00
edbaad2c5c [model cards] fix metadata - 3rd attempt (#7218) 2020-09-17 16:57:06 -04:00
999a1c957a skip failing FSMT CUDA tests until investigated (#7220) 2020-09-17 16:53:14 -04:00
51c4adf54c [model cards] fix dataset yaml (#7216) 2020-09-17 15:29:39 -04:00
a5638b2b3a [s2s] dynamic batch size with --max_tokens_per_batch (#7030) 2020-09-17 15:19:34 -04:00
efeab6a3f1 [s2s] run_eval/run_eval_search tweaks (#7192)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-17 14:26:38 -04:00
9c5bcab5b0 [model cards] fix yaml in cards (#7207) 2020-09-17 14:11:17 -04:00
e643a29722 Change to use relative imports in some files & Add python prompt symbols to example codes (#7202)
* Move 'from transformers' statements to relative imports in some files

* Add python prompt symbols in front of the example codes

* Reformat the code

* Add one missing space

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-17 12:30:45 -04:00
0fe6e435b6 [model cards] ported allenai Deep Encoder, Shallow Decoder models (#7153)
* [model cards] ported allenai Deep Encoder, Shallow Decoder models

* typo

* fix references

* add allenai/wmt19-de-en-6-6 model cards

* fill-in the missing info for the build script as provided by the searcher.
2020-09-17 17:58:49 +02:00
1eeb206bef [ported model] FSMT (FairSeq MachineTranslation) (#6940)
* ready for PR

* cleanup

* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST

* fix

* perfectionism

* revert change from another PR

* odd, already committed this one

* non-interactive upload workaround

* backup the failed experiment

* store langs in config

* workaround for localizing model path

* doc clean up as in https://github.com/huggingface/transformers/pull/6956

* style

* back out debug mode

* document: run_eval.py --num_beams 10

* remove unneeded constant

* typo

* re-use bart's Attention

* re-use EncoderLayer, DecoderLayer from bart

* refactor

* send to cuda and fp16

* cleanup

* revert (moved to another PR)

* better error message

* document run_eval --num_beams

* solve the problem of tokenizer finding the right files when model is local

* polish, remove hardcoded config

* add a note that the file is autogenerated to avoid losing changes

* prep for org change, remove unneeded code

* switch to model4.pt, update scores

* s/python/bash/

* missing init (but doesn't impact the finetuned model)

* cleanup

* major refactor (reuse-bart)

* new model, new expected weights

* cleanup

* cleanup

* full link

* fix model type

* merge porting notes

* style

* cleanup

* have to create a DecoderConfig object to handle vocab_size properly

* doc fix

* add note (not a public class)

* parametrize

* - add bleu scores integration tests

* skip test if sacrebleu is not installed

* cache heavy models/tokenizers

* some tweaks

* remove tokens that aren't used

* more purging

* simplify code

* switch to using decoder_start_token_id

* add doc

* Revert "major refactor (reuse-bart)"

This reverts commit 226dad15ca6a9ef4e26178526e878e8fc5c85874.

* decouple from bart

* remove unused code #1

* remove unused code #2

* remove unused code #3

* update instructions

* clean up

* move bleu eval to examples

* check import only once

* move data+gen script into files

* reuse via import

* take less space

* add prepare_seq2seq_batch (auto-tested)

* cleanup

* recode test to use json instead of yaml

* ignore keys not needed

* use the new -y in transformers-cli upload -y

* [xlm tok] config dict: fix str into int to match definition (#7034)

* [s2s] --eval_max_generate_length (#7018)

* Fix CI with change of name of nlp (#7054)

* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last

* extending to support allen_nlp wmt models

- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models

* sync with changes

* s/fsmt-wmt/wmt/ in model names

* s/fsmt-wmt/wmt/ in model names (p2)

* s/fsmt-wmt/wmt/ in model names (p3)

* switch to a better checkpoint

* typo

* make non-optional args truly non-optional - adjust tests where possible or skip when there is no other choice

* consistency

* style

* adjust header

* cards moved (model rename)

* use best custom hparams

* update info

* remove old cards

* cleanup

* s/stas/facebook/

* update scores

* s/allen_nlp/allenai/

* url maps aren't needed

* typo

* move all the doc / build /eval generators to their own scripts

* cleanup

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix indent

* duplicated line

* style

* use the correct add_start_docstrings

* oops

* resizing can't be done with the core approach, due to 2 dicts

* check that the arg is a list

* style

* style

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-17 11:31:29 -04:00
492bb6aa48 Trainer multi label (#7191)
* Trainer accepts multiple labels

* Missing import

* Fix docstrings
2020-09-17 08:15:37 -04:00
709745927b Transformer-XL: Remove unused parameters (#7087)
* Removed 'tgt_len' and 'ext_len' from Transformer-XL

 * Some changes are still to be done

* Removed 'tgt_len' and 'ext_len' from Transformer-XL (2)

 * Removed comments
 * Fixed quality

* Changed warning to info
2020-09-17 06:10:34 -04:00
c183d81e27 added multilabel text classification notebook using distilbert to community notebooks (#7201)
* added multilabel classification using distilbert notebook to community notebooks

* added multilabel classification using distilbert notebook to community notebooks
2020-09-17 05:58:57 -04:00
79111b77d2 remove deprecated flag (#7171)
```
/home/circleci/.local/lib/python3.6/site-packages/isort/main.py:915: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --recursive!
  "W0501: The following deprecated CLI flags were used and ignored: "
```
2020-09-17 05:52:12 -04:00
0cdafbf7ec remove duplicated code (#7173) 2020-09-17 05:51:40 -04:00
45b0b1ff2f [s2s] fix kwarg typo (#7196) 2020-09-16 21:58:57 -04:00
0203ad43bc [s2s] distributed eval cleanup (#7186) 2020-09-16 15:38:37 -04:00
3babef815c Formatting 2020-09-16 14:57:09 -04:00
42049b8e12 use the correct add_start_docstrings (#7174) 2020-09-16 14:40:35 -04:00
fdaf8ab349 [s2s run_eval] new features (#7109)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-16 13:59:57 -04:00
df165065c3 [model_cards] antoiloui/belgpt2 🇧🇪 (#7166)
* Create README.md

* Update README.md
2020-09-16 12:16:01 -04:00
108c9aefcc Update README (#7133)
* Rewrite and update README

* Typo and migration guide

* Apply suggestions from code review

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Clem's comments

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-09-16 12:12:12 -04:00
9e376e156a Add condition (#7161) 2020-09-16 09:15:10 -04:00
f8590c56e6 [doc] improve/expand the Parametrization section (#7156) 2020-09-16 08:45:50 -04:00
d3391c87fe build/eval/gen-card scripts for fsmt (#7155)
* build/eval/gen-card scripts for fsmt

* adjust for model renames
2020-09-16 08:41:26 -04:00
08bfc1718a fix the warning message of overflowed sequence (#7151) 2020-09-16 07:40:57 -04:00
af8425b749 Refactoring the TF activations functions (#7150)
* Refactoring the activations functions into a common file

* Apply style

* remove unused import

* fix tests

* Fix tests.
2020-09-16 07:03:47 -04:00
b00cafbde5 [docs] add testing documentation (#7101)
* [docs] add testing documentation

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* tweaks as suggested

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* tweaks

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/testing.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more tweaks

* suggestions from @LysandreJik

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-15 19:25:25 -04:00
85ffda96fc fix encoder decoder kwargs (#7131) 2020-09-15 21:10:07 +02:00
4c62c6021a fix ZeroDivisionError and epoch counting (#7125)
* fix ZeroDivisionError and epoch counting

* Add test for num_train_epochs calculation in trainer.py

* Remove @require_non_multigpu for test_num_train_epochs_in_training
2020-09-15 11:51:50 -04:00
7af2791d77 Create README.md 2020-09-15 16:47:36 +02:00
153ec2f154 Funnel model cards (#7147) 2020-09-15 10:40:57 -04:00
7186ca6240 Multi predictions trainer (#7126)
* Allow multiple outputs

* Formatting

* Move the unwrapping before metrics

* Fix typo

* Add test for non-supported config options
2020-09-15 10:27:24 -04:00
52d250f6aa [model_cards] pvl/labse_bert model card
From **Language-Agnostic BERT Sentence Embedding**

https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html
2020-09-15 08:54:12 -04:00
84d64805b0 Create README.md (#7097)
Model card for PEGASUS finetuned for paraphrasing task
2020-09-15 08:48:25 -04:00
52bb7ccce5 German electra model card v3 update (#7089)
* changed eval table model order

* Update install

* update mc
2020-09-15 08:48:13 -04:00
1a85299a5e Tiny typo fix (#7143) 2020-09-15 08:18:42 -04:00
e29c3f1b11 Add quotes to paths in MeCab arguments (#7142)
Without quotes directories with spaces in them will fail to be processed
correctly.
2020-09-15 19:04:50 +08:00
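For illustration, a minimal sketch of the quoting fix, assuming the `mecab-python3` option-string interface (the paths are hypothetical):
```
import MeCab  # mecab-python3

dic_dir = "/Users/Jane Doe/mecab/dic/ipadic"   # hypothetical path containing a space
mecabrc = "/Users/Jane Doe/mecab/etc/mecabrc"  # hypothetical path containing a space

# Broken: MeCab splits its option string on whitespace, so the space in the
# path turns one argument into two.
# tagger = MeCab.Tagger(f"-d {dic_dir} -r {mecabrc}")

# Fixed: wrap each path in double quotes, as this patch does.
tagger = MeCab.Tagger(f'-d "{dic_dir}" -r "{mecabrc}"')
print(tagger.parse("すもももももももものうち"))
```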
cb061e78e1 Fix TF Trainer loss calculation (#6998)
* create branch for issue #6968

* First attempt to fix incorrect tf trainer loss calculation

* Fix training loss in metric

* fix tf trainer evaluation loss

* apply count_instances_in_batch() for eval and test datasets

* prototype of using a new argument in trainer_tf.py to fix loss issue

* some renaming and fix, in particular for evaluation methods

* fix bugs to have a running version

* change to @staticmethod

* apply style
2020-09-15 05:41:00 -04:00
b0cbcdb05b [logging] remove no longer needed verbosity override (#7100) 2020-09-15 04:01:14 -04:00
2bf70e2150 Fix reproducible tests in Trainer (#7119)
* Fix reproducible tests in Trainer

* Deal with multiple GPUs
2020-09-15 03:32:44 -04:00
9e89390ce1 [QOL] add signature for prepare_seq2seq_batch (#7108) 2020-09-14 20:33:08 -04:00
33d479d2b2 [s2s] distributed eval in one command (#7124) 2020-09-14 15:57:56 -04:00
206b78d485 Pin version of TF and torch 2020-09-14 14:08:51 -04:00
90cde2e938 Add Mirror Option for Downloads (#6679)
* Add Tuna Mirror for Downloads from China

* format fix

* Use preset instead of hardcoding URL

* Fix

* make style

* update the mirror option doc

* update the mirror
2020-09-14 23:50:22 +08:00
e0e0675ac7 Demoing LXMERT with raw images by incorporating the FRCNN model for RoI-pooled extraction and bounding-box prediction on the GQA answer set. (#6986)
* adding demo

* Update examples/lxmert/requirements.txt

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update examples/lxmert/checkpoint.sh

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* added user input for .py demo

* updated model loading, data extraction, checkpoints, and lots of other automation

* adding normalizing for bounding boxes

* Update requirements.txt

* some optimizations for extracting data

* added data extracting file

* added data extraction file

* minor fixes to reqs and readme

* Style

* remove options

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-14 10:07:04 -04:00
5636cbb25d Extra ) 2020-09-14 09:37:55 -04:00
ccc8e30c8a Clean up autoclass doc (#7081) 2020-09-14 09:26:41 -04:00
3ca1874ca4 [examples testing] restore code (#7099)
For some reason https://github.com/huggingface/transformers/pull/5512 re-added temp dir creation code that was removed by
https://github.com/huggingface/transformers/pull/6494, defeating the purpose of that PR for those tests.
2020-09-14 08:54:23 -04:00
4d39148419 fix deprecation warnings (#7033)
* fix deprecation warnings

* remove tests/test_tokenization_common.py's test_padding_to_max_length

* revert test_padding_to_max_length
2020-09-14 07:51:19 -04:00
576eec98e0 ignore FutureWarning in tests (#7079) 2020-09-14 07:50:51 -04:00
15d18e0307 fix link to paper (#7116) 2020-09-14 07:43:40 -04:00
bb3106f741 Temporarily skip failing tests due to dependency change (#7118)
* Temporarily skip failing tests due to dependency change

* Remove trace
2020-09-14 07:42:13 -04:00
0fab39695a [s2s distill] allow pegasus-12-12 (#7104) 2020-09-14 00:03:59 -04:00
de9e297964 [s2s] distributed eval cleanup (#7110) 2020-09-13 23:40:38 -04:00
54395d87a6 Update xsum length penalty to better values (#7107) 2020-09-13 20:48:47 -04:00
e7f8d2ab64 [s2s] two stage run_distributed_eval.py (#7105) 2020-09-13 17:28:18 -04:00
0ec63afec2 fix bug in pegasus converter (#7094) 2020-09-13 15:11:47 -04:00
b76cb1c3df [s2s] run_eval supports --prefix clarg. (#6953) 2020-09-12 01:08:21 -04:00
563ffb3dc3 Create README.md (#7066) 2020-09-11 15:21:05 -04:00
1ad49cde3a Create README.md (#7067) 2020-09-11 15:20:54 -04:00
4753816e39 added bangla-bert-base model card and also modified other model cards (#7071)
* added bangla-bert-base

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-11 15:17:25 -04:00
0a8c17d53c [T5Tokenizer] remove prefix_tokens (#7078) 2020-09-11 14:18:45 -04:00
4cbd50e611 Compute loss method (#7074) 2020-09-11 12:06:31 -04:00
ae736163d0 Add tests and fix various bugs in ModelOutput (#7073)
* Add tests and fix various bugs in ModelOutput

* Update tests/test_model_output.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-09-11 12:01:33 -04:00
e841b75dec Automate the lists in auto-xxx docs (#7061)
* More readable dict

* More nlp -> datasets

* Revert "More nlp -> datasets"

This reverts commit 3cd1883d226c63c4a686fc1fed35f2cd586ebe45.

* Automate the lists in auto-xxx docs

* More readable dict

* Revert "More nlp -> datasets"

This reverts commit 3cd1883d226c63c4a686fc1fed35f2cd586ebe45.

* Automate the lists in auto-xxx docs

* nlp -> datasets

* Fix new key
2020-09-11 10:42:09 -04:00
0054a48cdd Add dep on datasets (#7058) 2020-09-11 04:43:19 -04:00
221d4c63a3 clean naming (#7068) 2020-09-11 09:57:53 +02:00
8fcbe486e1 these tests require non-multigpu env (#7059)
* these tests require non-multigpu env

* cleanup

* clarify
2020-09-10 18:52:55 -04:00
77950c485a [wip/s2s] DistributedSortishSampler (#7056) 2020-09-10 15:23:44 -04:00
514486739c Fix CI with change of name of nlp (#7054)
* nlp -> datasets

* More nlp -> datasets

* Woopsie

* More nlp -> datasets

* One last
2020-09-10 14:51:08 -04:00
e9a2f772bc [s2s] --eval_max_generate_length (#7018) 2020-09-10 14:11:34 -04:00
df4594a9da [xlm tok] config dict: fix str into int to match definition (#7034) 2020-09-10 19:31:01 +02:00
d6c08b07a0 [AutoTokenizer] Correct error message 2020-09-10 17:19:01 +02:00
db38f7ce29 [BertGeneration, Docs] Fix another old name in docs (#7050)
* correct docs for bert generation

* upload
2020-09-10 17:12:33 +02:00
3bd95b0faf correct docs for bert generation (#7048) 2020-09-10 17:08:40 +02:00
eb2feb5d90 Create README.md 2020-09-10 17:05:50 +02:00
66a5a6fda8 fix to ensure that returned tensors after the tokenization are Long (#7039)
* fix to ensure that returned tensors after the tokenization are Long

* fix to ensure that returned tensors after the tokenization are Long

Co-authored-by: Ashwin Geet Dsa <adsa@grvingt-6.nancy.grid5000.fr>
2020-09-10 11:04:03 -04:00
9ccdb1d517 Update README.md 2020-09-10 17:01:19 +02:00
60698936fc Create README.md 2020-09-10 17:00:10 +02:00
e0c3bc8ee0 Create README.md 2020-09-10 16:51:15 +02:00
c356b9878d Create README.md 2020-09-10 16:45:44 +02:00
5afd3f6196 Create README.md 2020-09-10 16:44:47 +02:00
15a189049e Add TF Funnel Transformer (#7029)
* Add TF Funnel Transformer

* Proper dummy input

* Formatting

* Update src/transformers/modeling_tf_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* One review comment forgotten

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-10 10:41:56 -04:00
7fd1febf38 Add "Leveraging Pretrained Checkpoints for Generation" Seq2Seq models. (#6594)
* add conversion script

* improve conversion script

* make style

* add tryout files

* fix

* update

* add causal bert

* better names

* add tokenizer file as well

* finish causal_bert

* fix small bugs

* improve generate

* change naming

* renaming

* renaming

* renaming

* remove leftover files

* clean files

* add fix tokenizer

* finalize

* correct slow test

* update docs

* small fixes

* fix link

* adapt check repo

* apply sams and sylvains recommendations

* fix import

* implement Lysandres recommendations

* fix logger warn
2020-09-10 16:40:51 +02:00
d1691d90e5 Small fixes in TF template (#7044) 2020-09-10 10:36:02 -04:00
63e539459d Update README.md 2020-09-10 16:34:28 +02:00
054db06b1b Create README.md 2020-09-10 16:30:46 +02:00
b482ad474a Fix template (#7040) 2020-09-10 08:45:52 -04:00
762cba3bda Albert pretrain datasets/ datacollator (#6168)
* add dataset for albert pretrain

* datacollator for albert pretrain

* naming, comprehension, file reading change

* data cleaning is not needed after this modification

* delete prints

* fix a bug

* file structure change

* add tests for albert datacollator

* remove random seed

* add back len and get item function

* sample file for testing and test code added

* format change for black

* more format change

* Style

* var assignment issue resolve

* add back wrongly deleted DataCollatorWithPadding in init file

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-09-10 07:56:29 -04:00
49e9be0639 Fix confusing warnings during TF2 import from PyTorch (#6623)
1. Swapped missing_keys and unexpected_keys.

2. Copy&paste error caused these warnings to say "from TF 2.0" when it's actually "from PyTorch".
2020-09-10 05:31:59 -04:00
4ee1053dcf add -y to bypass prompt for transformers-cli upload (#7035) 2020-09-10 04:58:29 -04:00
76818cc4c6 Create README.md 2020-09-09 16:26:35 +02:00
15478c1287 Batch encode plus and overflowing tokens fail when there are no overflowing tokens for a sequence (#6677)
* Patch and test

* Fix tests
2020-09-09 06:55:17 -04:00
9fd11bf1a8 replace torch.triu with onnx compatible code (#6929) 2020-09-09 04:56:40 -04:00
ed71c21d6a [from_pretrained] Allow tokenizer_type ≠ model_type (#6995) 2020-09-09 04:22:59 -04:00
03e363f9ae [generation] consistently add eos tokens (#6982)
Currently beam search returns inconsistent outputs - if hypotheses have different lengths we get eos; if they are the same, we don't.

This PR makes the output consistent.

Also, why not replace:

```
            if sent_lengths[i] < max_length:
                decoded[i, sent_lengths[i]] = eos_token_id
```
with:
```
            decoded[i, sent_lengths[i]] = eos_token_id
```
Shouldn't eos always be there? If the data gets truncated, the caller needs to use a larger `max_length`.

Please correct me if my logic is flawed.
2020-09-09 04:08:36 -04:00
d0963486c1 adding TRANSFORMERS_VERBOSITY env var (#6961)
* introduce TRANSFORMERS_VERBOSITY env var + test + test helpers

* cleanup

* remove helper function
2020-09-09 04:08:01 -04:00
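A short usage sketch of the new variable; it accepts the usual level names (debug, info, warning, error, critical) and must be set before `transformers` is imported:
```
import os

# Equivalent to running: TRANSFORMERS_VERBOSITY=error python my_script.py
os.environ["TRANSFORMERS_VERBOSITY"] = "error"

import transformers  # the level is picked up when the library's logger initializes
```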
f0fc0aea6b pegasus.rst: fix expected output (#7017) 2020-09-08 13:29:16 -04:00
120176ea29 [Longformer] Fix longformer documentation (#7016)
* fix longformer

* allow position ids to not be initialized
2020-09-08 18:51:28 +02:00
5c4eb4b1ac Fixing FLOPS merge by checking if torch is available (#7013)
* Should check if `torch` is available

* fixed samples_count error, distributed_concat arguments

* style

* Import torch at beginning of file

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-09-08 10:51:58 -04:00
01d340adfa Floating-point operations logging in trainer (#6768)
* neFLOs calculation, logging, and reloading (#1)

* testing distributed consecutive batches

* fixed AttributeError from DataParallel

* removed verbosity

* rotate with use_mtime=True

* removed print

* fixed interaction with gradient accumulation

* indent formatting

* distributed neflo counting

* fixed typo

* fixed typo

* mean distributed losses

* exporting log history

* moved a few functions

* floating_point_ops clarification for transformers with parameter-reuse

* code quality

* double import

* made flo estimation more task-agnostic

* only logging flos if computed

* code quality

* unused import

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain review

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* black

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-08 10:00:56 -04:00
d155b38d6e Funnel transformer (#6908)
* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Initial model

* Fix upsampling

* Add special cls token id and test

* Formatting

* Test and fist FunnelTokenizerFast

* Common tests

* Fix the check_repo script and document Funnel

* Doc fixes

* Add all models

* Write doc

* Fix test

* Fix copyright

* Forgot some layers can be repeated

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/modeling_funnel.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

* Update src/transformers/modeling_funnel.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Slow integration test

* Make small integration test

* Formatting

* Add checkpoint and separate classification head

* Formatting

* Expand list, fix link and add in pretrained models

* Styling

* Add the model in all summaries

* Typo fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-09-08 08:08:08 -04:00
25afb4ea50 fixed trainer tr_loss memory leak (#6999)
* fixed trainer tr_loss memory leak

* detached returned training loss from computation graph in the Trainer class' training_step() method

* Revert "fixed trainer tr_loss memory leak"

This reverts commit 47226e4e
2020-09-08 08:07:33 -04:00
1b76936d1a Fix typo (#6994) 2020-09-08 04:22:57 -04:00
8235426ee8 New Community NB "Fine tune GPT-2 with Trainer class" (#7005) 2020-09-08 03:42:20 -04:00
c18f5916a0 typo (#7001)
apologies for the tiny PRs, just sending those as I find them.
2020-09-08 01:22:20 -04:00
60fc03290b README for HooshvareLab/bert-fa-base-uncased (#6990)
ParsBERT v2.0 is a fine-tuned and vocab-reconstructed version of ParsBERT, and it can be used in other domains!

It includes these features:
- We added some unused vocab entries for use in summarization and other domains.
- We fine-tuned the model on a wide range of Persian writing styles.
2020-09-07 16:43:50 -04:00
90ec78b514 Add missing arguments for BertWordPieceTokenizer (#5810) 2020-09-07 08:35:41 -04:00
77cd0e13d2 Conversion scripts shouldn't have relative imports (#6991) 2020-09-07 08:31:06 -04:00
1650130b0f Remove misleading docstring 2020-09-07 14:16:59 +02:00
159ef07e4c match CI's version of flake8 (#6941)
my flake8 wasn't up-to-date enough, so `make quality` wasn't reporting the same things CI did - this PR adds the actual required version.

Thinking more about some of these minimal versions - CI will always install afresh and thus will always run the latest version. Is there a way to tell pip to always install the latest versions of certain dependencies on `pip install -e ".[dev]"`, rather than hardcoding the minimals, which quickly become outdated?
2020-09-07 08:12:25 -04:00
e9d0d4c75c Create README.md (#6974) 2020-09-07 07:31:22 -04:00
848fbe1e35 [gen utils] missing else case (#6980)
* [gen utils] missing else case

1. `else` is missing - I hit that case while porting a model. Probably needs an assert there?
2. Also, the comment on top seems to be outdated (only vocab_size is being set there)

* typo
2020-09-07 07:28:06 -04:00
f7e80721eb Fixed the default number of attention heads in Reformer Configuration (#6973) 2020-09-07 12:12:22 +02:00
e20d8895bd Create README.md model card (#6964)
* Create README.md

* Add some custom prompts

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-07 06:01:40 -04:00
b4a9c95f1b [testing] add dependency: parametrize (#6958)
unittest doesn't support pytest's super-handy `@pytest.mark.parametrize`. I researched, and there are many proposed workarounds, most tedious at best. If we include https://pypi.org/project/parameterized/ in the dev dependencies, it will provide a very easy way to write parameterized tests - same as pytest's fixture, plus quite a few other styles.

Example:
```
import math

from parameterized import parameterized

@parameterized([
    (2, 2, 4),
    (2, 3, 8),
    (1, 9, 1),
    (0, 9, 0),
])
def test_pow(base, exponent, expected):
    assert math.pow(base, exponent) == expected
```
(extra `self` var if inside a test class)

As a reminder, the pytest style is slightly different:
```
import pytest

@pytest.mark.parametrize("test_input,expected", [("3+5", 8), ("2+4", 6), ("6*9", 42)])
def test_eval(test_input, expected):
    assert eval(test_input) == expected
```
More examples here: https://pypi.org/project/parameterized

May I suggest that it will make it much easier to write some types of tests?
2020-09-07 05:50:18 -04:00
acfaad74ab [docstring] missing arg (#6933)
* [docstring] missing arg

add the missing `tie_word_embeddings` entry

* cleanup

* Update src/transformers/configuration_reformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 05:36:16 -04:00
c3317e1f80 typo (#6959)
there is no var `decoder_input_ids`, but there is `input_ids` for decoder :)
2020-09-07 05:16:24 -04:00
10c6f94adc [model_card] register jplu/tf-xlm-r-ner-40-lang as multilingual 2020-09-07 05:03:40 -04:00
9ef9c39728 Cannot index None (#6984) 2020-09-07 04:56:08 -04:00
08de989a0a Trainer with grad accum (#6930)
* Add warning for gradient accumulation

* Formatting
2020-09-07 04:54:00 -04:00
d4aa7284c8 [model_card] jplu/tf-xlm-r-ner-40-lang: Fix link
cc @jplu
2020-09-07 04:33:15 -04:00
995a958dd1 feat: allow prefix for any generative model (#5885)
* feat: allow padding_text for any generative model

* docs(pipelines.py): correct typo

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* feat: rename padding_text to prefix

* fix: cannot tokenize empty text

* fix: pass prefix arg to pipeline

* test: add prefix to text-generetation pipeline

* style: fix style

* style: clean code and variable name more explicit

* set arg docstring to optional

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-07 03:03:45 -04:00
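A hedged usage sketch of the renamed argument, assuming `prefix` is accepted at call time as the PR's tests suggest:
```
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
# `prefix` (formerly the XLNet-only padding_text) is prepended before generation.
print(generator("Hello,", prefix="The following is a short story. ", max_length=30))
```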
ce37be9d94 [s2s] warn if --fp16 for torch 1.6 (#6977) 2020-09-06 20:41:29 -04:00
f72fe1f31a Correct wrong spacing in README 2020-09-06 13:26:56 +02:00
d31031f603 create model card for astroGPT (#6960)
* create model card for astroGPT

* Hotlink to actual image file

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-05 12:50:19 -04:00
56742e9f61 Create Readme.MD for KanBERTo (#6942)
* Create Readme.MD for KanBERTo

KanBERTo language model readme for Kannada language.

* Update model_cards/Naveen-k/KanBERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-04 18:24:32 -04:00
48ff6d5109 [doc] remove the implied defaults to :obj:`None`, s/True/:obj:`True`/, etc. (#6956)
* remove the implied defaults to :obj:`None`

* fix bug in the original

* replace to :obj:`True`, :obj:`False`
2020-09-04 18:22:25 -04:00
eff274d629 typo (#6952) 2020-09-04 16:14:37 -04:00
a4fc0c80b1 [s2s] run_eval.py parses generate_kwargs (#6948) 2020-09-04 14:19:31 -04:00
6078b12098 [s2s] distill: --normalize_hidden --supervise_forward (#6834) 2020-09-04 14:05:56 -04:00
c5d43a872f [docstring] misc arg doc corrections (#6932)
* correct bool types

fix docstring s/int/bool/

* fix description

* fix num_labels to match reality
2020-09-04 10:09:42 -04:00
e3990d137a fix (#6946) 2020-09-04 16:08:54 +02:00
a75e319819 Fix mixed precision issue in TF DistilBert (#6915)
* Remove hard-coded uses of float32 to fix mixed precision use in TF Distilbert

* fix style

* fix gelu dtype issue in TF Distilbert

* fix numeric overflow while using half precision
2020-09-04 14:29:57 +02:00
e95d262f25 [s2s] support early stopping based on loss, rather than rouge (#6927) 2020-09-03 17:31:35 -04:00
207ed8cb78 [s2s] use --eval_beams command line arg (#6926) 2020-09-03 12:42:09 -04:00
0f360d3d1c move wandb/comet logger init to train() to allow parallel logging (#6850)
* move wandb/comet logger init to train() to allow parallel logging

* Setup wandb/comet loggers on first call to log()
2020-09-03 11:49:14 -04:00
39ed68d597 [s2s] allow task_specific_params=summarization_xsum (#6923) 2020-09-03 11:11:40 -04:00
5a318f075a [s2s]: script to convert pl checkpoints to hf checkpoints (#6911)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 09:47:00 -04:00
b8e4906c97 tweak tar command in readme (#6919) 2020-09-03 09:29:01 -04:00
a66db7d828 Corrected link to paper (#6905) 2020-09-03 09:23:42 -04:00
55d61ce8d6 Added a link to the thesis. (#6906) 2020-09-03 09:20:03 -04:00
653a79ccad Loodos model cards had errors in the "Usage" section; fixed. Also, the "electra-base-turkish-uncased" model was removed from S3 and re-uploaded as "electra-base-turkish-uncased-discriminator", and its README was added. (#6921)
Co-authored-by: Abdullah Oluk <abdullaholuk123@gmail.com>
2020-09-03 09:13:43 -04:00
5a3aec90a9 [model_card] link to correctly cased piaf dataset
cc @psorianom @rachelker
2020-09-03 08:57:32 -04:00
722b5807d8 Template updates (#6914) 2020-09-03 04:14:58 -04:00
ea2c6f1afc Adding the LXMERT pretraining model (MultiModal languageXvision) to HuggingFace's suite of models (#5793)
* added template files for LXMERT and competed the configuration_lxmert.py

* added modeling, tokization, testing, and finishing touched for lxmert [yet to be tested]

* added model card for lxmert

* cleaning up lxmert code

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_lxmert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* tested torch lxmert, changed documentation, updated outputs, and other small fixes

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/convert_pytorch_checkpoint_to_tf2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* renaming, other small issues, did not change TF code in this commit

* added lxmert question answering model in pytorch

* added capability to edit number of qa labels for lxmert

* made answer optional for lxmert question answering

* add option to return hidden_states for lxmert

* changed default qa labels for lxmert

* changed config archive path

* squashing 3 commits: merged UI + testing improvements + more UI and testing

* changed some variable names for lxmert

* TF LXMERT

* Various fixes to LXMERT

* Final touches to LXMERT

* AutoTokenizer order

* Add LXMERT to index.rst and README.md

* Merge commit test fixes + Style update

* TensorFlow 2.3.0 sequential model changes variable names

Remove inherited test

* Update src/transformers/modeling_tf_pytorch_utils.py

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/lxmert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_lxmert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added suggestions

* Fixes

* Final fixes for TF model

* Fix docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-09-03 04:02:25 -04:00
4ebb52afdb test_tf_common: remove unused mixin class parameters (#6866) 2020-09-02 10:54:40 -04:00
e71f32c0ef [testing] fix ambiguous test (#6898)
Since `generate()` does:
```
        num_beams = num_beams if num_beams is not None else self.config.num_beams
```
This test fails if `model.config.num_beams > 1` (which is the case in the model I'm porting).

This fix makes the test setup unambiguous by passing an explicit `num_beams=1` to `generate()`.

Thanks.
2020-09-02 16:18:17 +02:00
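A minimal sketch of the disambiguation; the tiny test checkpoint used here is an assumption for illustration:
```
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sshleifer/tiny-gpt2")
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
model.config.num_beams = 4  # a config like this silently turned a sampling test into beam search

inputs = tok("Hello", return_tensors="pt")
# An explicit num_beams=1 makes the test setup unambiguous, as in the fix above.
out = model.generate(**inputs, num_beams=1, do_sample=True, max_length=12)
print(tok.decode(out[0]))
```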
8f2723caf0 Output attention takes an s (#6903)
* Fix output_attention -> output_attentions

* Formatting

* One unsaved file
2020-09-02 08:11:45 -04:00
485da7222f fix error class instantiation (#6634) 2020-09-02 07:36:32 -04:00
4230d30f77 [pipelines] Text2TextGenerationPipeline (#6744)
* add Text2TextGenerationPipeline

* remove max length warning

* remove comments

* remove input_length

* fix typo

* add tests

* use TFAutoModelForSeq2SeqLM

* doc

* typo

* add the doc below TextGenerationPipeline

* doc nit

* style

* delete comment
2020-09-02 07:34:35 -04:00
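A usage sketch for the new pipeline (the sample output is illustrative):
```
from transformers import pipeline

t2t = pipeline("text2text-generation", model="t5-small")
print(t2t("translate English to German: How old are you?"))
# e.g. [{'generated_text': 'Wie alt sind Sie?'}]
```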
6b24281229 fix typo in comments (#6838) 2020-09-02 06:55:37 -04:00
7351ef83c1 [doc] typos (#6867)
* [doc] typos

fixed typos

* Update README.md
2020-09-02 06:51:51 -04:00
ee1bff06f8 minor docs grammar fixes (#6889) 2020-09-02 06:45:19 -04:00
8abd7f69fc fix warning for position ids (#6884) 2020-09-02 06:44:51 -04:00
7cb0572c64 Update modeling_bert.py (#6897)
outptus -> outputs in example of BertForPreTraining
2020-09-02 06:39:01 -04:00
e3c55ceb8d Model card for huBERT (#6893)
* Create README.md

Model card for huBERT.

* Update README.md

lowercase h

* Update model_cards/SZTAKI-HLT/hubert-base-cc/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-02 04:50:10 -04:00
1889e96c8c fix QA example for PT (#6890) 2020-09-02 09:53:09 +02:00
d822ab636b [model_cards] Fix file path for flexudy/t5-base-multi-sentence-doctor 2020-09-02 00:02:40 +02:00
ad5fb33c9a Create README.md (#6598) 2020-09-01 17:59:15 -04:00
f9dadcd85b Create README.md (#6602) 2020-09-01 17:58:43 -04:00
f5d69c75f7 Update multilingual passage rereanking model card (#6788)
Fix range of possible scores, add inference.
2020-09-01 17:56:19 -04:00
5d820f3ca6 Model card for primer/BART-Squad2 (#6801) 2020-09-01 17:52:32 -04:00
8b884dadc6 added model card for flexudys t5 model (#6759)
Co-authored-by: zolekode <pascal.zoleko@fau.de>
2020-09-01 17:38:55 -04:00
bff6d517cd loodos turkish model cards added (#6840) 2020-09-01 17:35:24 -04:00
502d194b95 Create README.md (#6887)
Add language meta attribute
2020-09-01 17:09:10 -04:00
d082edf216 Create README.md (#6888)
Add language meta attribute
2020-09-01 17:09:02 -04:00
dacbee9a50 Create README.md (#6886)
* Create README.md

model card for akhooli/xlm-r-large-arabic-sent

* Update model_cards/akhooli/xlm-r-large-arabic-sent/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-09-01 17:06:15 -04:00
e2971e61bd Create README.md (#6885) 2020-09-01 16:57:48 -04:00
4d1a3ffde8 [EncoderDecoder] Add xlm-roberta to encoder decoder (#6878)
* finish xlm-roberta

* finish docs

* expose XLMRobertaForCausalLM
2020-09-01 21:56:39 +02:00
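A minimal sketch of what this enables: warm-starting an encoder-decoder from two XLM-R checkpoints (fine-tuning is still required):
```
from transformers import EncoderDecoderModel

# The decoder half is loaded as the newly exposed XLMRobertaForCausalLM.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "xlm-roberta-base"
)
```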
311992630c Create README.md (#6883)
* Create README.md

* Update README.md
2020-09-01 19:24:45 +02:00
21d719238c Add cache_dir to save features TextDataset (#6879)
* Add cache_dir to save features TextDataset

This is for the case where the dataset is on a read-only (RO) filesystem, which is the case
in tests (GKE TPU tests).

* style
2020-09-01 11:42:17 -04:00
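A sketch of the new argument; `train.txt` and the cache path are placeholders:
```
from transformers import AutoTokenizer, TextDataset

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# Features are cached under cache_dir instead of next to file_path, which is
# what makes read-only dataset filesystems workable.
dataset = TextDataset(tokenizer=tok, file_path="train.txt", block_size=128, cache_dir="/tmp/hf_cache")
```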
1461aac8d7 Update docs stable version 2020-09-01 11:02:24 -04:00
3726754a6c v3.1.0 documentation 2020-09-01 14:39:07 +02:00
4b3ee9cbc5 Release: v3.1.0 2020-09-01 14:27:52 +02:00
afc4ece462 [Generate] Facilitate PyTorch generate using ModelOutputs (#6735)
* fix generate for GPT2 Double Head

* fix gpt2 double head model

* fix  bart / t5

* also add for no beam search

* fix no beam search

* fix encoder decoder

* simplify t5

* simplify t5

* fix t5 tests

* fix BART

* fix transfo-xl

* fix conflict

* integrating sylvains and sams comments

* fix tf past_decoder_key_values

* fix enc dec test
2020-09-01 12:38:25 +02:00
397f819615 Restore PaddingStrategy.MAX_LENGTH on QAPipeline while no v2. (#6875)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-09-01 05:35:35 -04:00
a32d85f0d4 delete reinit (#6862) 2020-09-01 03:43:27 -04:00
d5f1ffa0d8 Logging doc (#6852)
* Add logging doc

* Formatting

* Update docs/source/main_classes/logging.rst

* Update src/transformers/utils/logging.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-01 03:16:34 -04:00
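A minimal sketch of the logging API the new doc covers:
```
from transformers.utils import logging

logging.set_verbosity_info()  # or set_verbosity_warning / set_verbosity_error
logger = logging.get_logger(__name__)
logger.info("visible at INFO verbosity")
```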
59a6a32a61 add a final report to all pytest jobs (#6861)
we had it added for one job; please add it to all pytest jobs - we need the output of which tests were run to debug the codecov issue. thank you!
2020-08-31 22:47:23 -04:00
431ab19d7a [fix] typo in available in helper function (#6859) 2020-08-31 17:59:34 -04:00
367235ee52 Bart can make decoder_input_ids from labels (#6758) 2020-08-31 16:16:47 -04:00
b9772897ec [s2s] command line args for faster val steps (#6833) 2020-08-31 16:16:10 -04:00
8af1970e45 Fix marian slow test (#6854) 2020-08-31 16:10:43 -04:00
bbdba0a76d Update ONNX notebook to include section on quantization. (#6831)
* Update ONNX notebook to include section on quantization.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing ONNX team comments
2020-08-31 21:28:00 +02:00
a59bcefbb1 Split hp search methods (#6857)
* Split the run_hp_search by backend

* Unused import
2020-08-31 15:16:39 -04:00
23f9611c16 Add checkpointing to Ray Tune HPO (#6747)
* Introduce HPO checkpointing for PBT

* Moved checkpoint saving

* Fixed checkpoint subdir pass

* Fixed style

* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT

* Adjust number of GPUs to number of jobs

* Avoid mode pickling in ray

* Move hp search to integrations
2020-08-31 14:38:46 -04:00
61b7ba93f5 Marian distill scripts + integration test (#6799) 2020-08-31 13:48:26 -04:00
02d09c8fcc Only access loss tensor every logging_steps (#6802)
* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors, as it's terrible for performance, causing TPU<>CPU
communication at each step. On RoBERTa MLM, for example, it reduces step
time by 30%; the saving should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user used the
`per_gpu_train_batch_size` flag
* Avg-reduce loss across eval shards

* Fix style (#6803)

* t5 model should make decoder_attention_mask (#6800)

* [s2s] Test hub configs in self-scheduled CI (#6809)

* [s2s] round runtime in run_eval (#6798)

* Pegasus finetune script: add --adafactor (#6811)

* [bart] rename self-attention -> attention (#6708)

* [tests] fix typos in inputs (#6818)

* Fixed open in colab link (#6825)

* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)

* BR_BERTo model card (#6793)

* clearly indicate shuffle=False (#6312)

* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* [s2s README] Add more dataset download instructions (#6737)

* Style

* Patch logging issue

* Set default logging level to `WARNING` instead of `INFO`

* TF Flaubert w/ pre-norm (#6841)

* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)

* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

* Fix in Adafactor docstrings (#6845)

* Fix resuming training for Windows (#6847)

* Only access loss tensor every logging_steps

* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors, as it's terrible for performance, causing TPU<>CPU
communication at each step. On RoBERTa MLM, for example, it reduces step
time by 30%; the saving should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user used the
`per_gpu_train_batch_size` flag
* Avg-reduce loss across eval shards

* comments

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-31 11:35:51 -04:00
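A minimal sketch of the pattern described above, not the Trainer's exact code (`compute_loss` is a hypothetical stand-in): keep the running loss as a tensor and call `.item()` only every `logging_steps`, since `.item()` forces a device-to-host transfer on XLA/TPU:
```
import torch

def compute_loss(step):  # hypothetical stand-in for one training step's loss
    return torch.rand(())

logging_steps = 100
tr_loss = torch.zeros(())
for step in range(1, 1001):
    tr_loss = tr_loss + compute_loss(step).detach()  # stays on device; no host sync
    if step % logging_steps == 0:
        print(f"step {step}: avg loss {tr_loss.item() / logging_steps:.4f}")  # one sync here
        tr_loss = torch.zeros(())
```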
c48546c7f7 Fix resuming training for Windows (#6847) 2020-08-31 11:02:30 -04:00
d2f9cb838e Fix in Adafactor docstrings (#6845) 2020-08-31 10:52:47 -04:00
2de7ee0385 Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task

* bug fix (numbers of special tokens & truncate sequences)

* bug fix (+ dict inputs support for data collator)

* add padding for nsp data collator; renamed cached files to avoid conflict.

* add test for nsp data collator

* Style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-31 08:25:00 -04:00
895d394669 TF Flaubert w/ pre-norm (#6841) 2020-08-31 04:53:20 -04:00
4561f05c5f Set default logging level to WARNING instead of INFO 2020-08-31 09:56:25 +02:00
05c3214153 Patch logging issue 2020-08-31 09:37:08 +02:00
dfa10a41ba [s2s README] Add more dataset download instructions (#6737) 2020-08-30 16:29:24 -04:00
32fe44086c clearly indicate shuffle=False (#6312)
* Clarify shuffle

* clarify shuffle

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-30 19:26:10 +08:00
0eecaceac7 BR_BERTo model card (#6793) 2020-08-30 19:02:46 +08:00
d176aaad7f Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827) 2020-08-30 18:21:49 +08:00
a5847619e3 Fixed open in colab link (#6825) 2020-08-30 18:21:00 +08:00
563485bf95 [tests] fix typos in inputs (#6818) 2020-08-30 18:19:57 +08:00
22933e661f [bart] rename self-attention -> attention (#6708) 2020-08-29 18:03:08 -04:00
0f58903bb6 Pegasus finetune script: add --adafactor (#6811) 2020-08-29 17:43:32 -04:00
ac47458a02 [s2s] round runtime in run_eval (#6798) 2020-08-29 17:36:31 -04:00
5ab21b072f [s2s] Test hub configs in self-scheduled CI (#6809) 2020-08-28 17:05:52 -04:00
3cac867fac t5 model should make decoder_attention_mask (#6800) 2020-08-28 15:22:33 -04:00
20f7786453 Fix style (#6803) 2020-08-28 15:02:25 -04:00
9336086ab5 prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654)
* broken test

* batch parity

* tests pass

* boom boom

* boom boom

* split out bart tokenizer tests

* fix tests

* boom boom

* Fixed dataset bug

* Fix marian

* Undo extra

* Get marian working

* Fix t5 tok tests

* Test passing

* Cleanup

* better assert msg

* require torch

* Fix mbart tests

* undo extra decoder_attn_mask change

* Fix import

* pegasus tokenizer can ignore src_lang kwargs

* unused kwarg test cov

* boom boom

* add todo for pegasus issue

* cover one word translation edge case

* Cleanup

* doc
2020-08-28 11:15:17 -04:00
cb276b41de Transformer-XL: Improved tokenization with sacremoses (#6322)
* Improved tokenization with sacremoses

 * The TransfoXLTokenizer is now using sacremoses for tokenization
 * Added tokenization of comma-separated and floating point numbers.
 * Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
 * Added corresponding tests
 * Removed test comparing TransfoXLTokenizer and TransfoXLTokenizerFast
 * Added deprecation warning to TransfoXLTokenizerFast

* isort change

Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-28 09:56:17 -04:00
930153e7d2 Add ProtBert model card (#6764) 2020-08-28 12:12:28 +08:00
743d131d76 [style] set the minimal required version for black (#6784)
`make style` with `black` < 20.8b1 is a no-go (in case some other package forced a lower version) - so make it explicit to avoid confusion
2020-08-28 11:38:09 +08:00
fb78a90d6a PL: --adafactor option (#6776) 2020-08-27 22:19:46 -04:00
92ac2fa7d1 [transformers-cli] fix logger getter (#6777) 2020-08-27 20:01:17 -04:00
42fddacd1c Format 2020-08-27 18:31:51 +02:00
70fccc5cf3 new Makefile target: docs (#6510)
* [doc] multiple corrections to "Summary of the tasks"

* add a new "docs" target to validate docs and document it

* fix mixup
2020-08-27 12:25:16 -04:00
dbfe34f2f5 [test schedulers] adjust to test the first step's reading (#6429)
* [test schedulers] small improvement

* cleanup
2020-08-27 12:23:28 -04:00
e6b811f0a7 [testing] replace hardcoded paths to allow running tests from anywhere (#6523)
* [testing] replace hardcoded paths to allow running tests from anywhere

* fix the merge conflict
2020-08-27 12:22:18 -04:00
9d1b4db2aa add nlp install (#6767) 2020-08-27 11:08:14 -04:00
c225e872ed Fix it to work with BART (#6756) 2020-08-27 09:04:50 -04:00
0d2c111a0c Format 2020-08-27 14:56:47 +02:00
6f289dc97a Fix the TF Trainer gradient accumulation and the TF NER example (#6713)
* Align TF NER example over the PT one

* Fix Dataset call

* Fix gradient accumulation training

* Apply style

* Address Sylvain's comments

* Address Sylvain's comments

* Apply style
2020-08-27 08:45:34 -04:00
41aa2b4ef1 Adafactor docs (#6765) 2020-08-27 05:16:50 -04:00
971d1802d0 Add AdaFactor optimizer from fairseq (#6722)
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to ADAM.

* update PR fixes, add basic test

* bug -- incorrect params in test

* bugfix -- import Adafactor into test

* bugfix -- removed accidental T5 include

* resetting T5 to master

* bugfix -- include Adafactor in __init__

* longer loop for adafactor test

* remove double error class declare

* lint

* black

* isort

* Update src/transformers/optimization.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* single docstring

* Cleanup docstring

Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-27 04:58:13 -04:00
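A hedged usage sketch with a placeholder model; the relative-step settings follow the option names of the ported optimizer:
```
import torch
from transformers.optimization import Adafactor

model = torch.nn.Linear(10, 2)  # placeholder model
# With relative_step=True, Adafactor computes its own learning rate, hence lr=None.
optimizer = Adafactor(
    model.parameters(), lr=None, scale_parameter=True, relative_step=True, warmup_init=True
)
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
```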
4bd7be9a42 s2s distillation uses AutoModelForSeq2SeqLM (#6761) 2020-08-26 23:25:11 -04:00
05e7150a53 create ProtBert-BFD model card. (#6724) 2020-08-27 02:19:19 +02:00
61518e2df3 [s2s] run_eval.py QOL improvements and cleanup (#6746) 2020-08-26 18:59:20 -04:00
434936f34a Model Card for Multilingual Passage Reranking BERT (#6755) 2020-08-26 18:00:27 -04:00
10a34501f1 add __init__.py to utils (#6754) 2020-08-26 23:51:10 +02:00
61b9ed8074 Model card for kuisailab/albert-large-arabic (#6730)
* Create README.md

* Update README.md
2020-08-26 17:27:56 -04:00
8e0d51e4f2 Model card for kuisailab/albert-xlarge-arabic (#6731)
* Create README.md

* Update README.md
2020-08-26 17:27:42 -04:00
70c96a10e9 Model card for kuisailab/albert-base-arabic (#6729)
* Create README.md

* Update README.md
2020-08-26 17:27:34 -04:00
cc4ba79f68 added model card for codeswitch-spaeng-sentiment-analysis-lince (#6727)
* added model card for the codeswitch-spaeng-sentiment-analysis-lince model; also updated other model cards

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* fixed typo

* Update README.md
2020-08-26 17:26:32 -04:00
e10fb9cbe6 Create model card for lordtt13/COVID-SciBERT (#6718) 2020-08-26 17:22:25 -04:00
baeba53e88 Adding model cards for 5 models (#6703)
* Added model cards for 4 models

Added model cards for:
- roberta-base-bulgarian
- roberta-base-bulgarian-pos
- roberta-small-bulgarian
- roberta-small-bulgarian-pos

* fixed link text

* Update README.md

* Create README.md

* removed trailing bracket

* Add language metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-26 17:20:55 -04:00
3242e4d942 [model_cards] Fix tiny typos 2020-08-26 23:16:06 +02:00
99407f9d1e add xlm-roberta-large-xnli model card (#6723)
* add xlm-roberta-large-xnli model card

* update pt example

* typo
2020-08-26 16:05:59 -04:00
858b7d5873 [TF Longformer] Improve Speed for TF Longformer (#6447)
* add tf graph compile tests

* fix conflict

* remove more tf transpose statements

* fix conflicts

* fix comment typos

* move function to class function

* fix black

* fix black

* make style
2020-08-26 14:55:41 -04:00
a75c64d80c Black 20 release 2020-08-26 17:20:22 +02:00
e78c110338 isort 5 2020-08-26 17:13:49 +02:00
02e8cd5584 Fix optimizer (#6717) 2020-08-26 11:12:44 -04:00
77abd1e79f Centralize logging (#6434)
* Logging

* Style

* hf_logging > utils.logging

* Address @thomwolf's comments

* Update test

* Update src/transformers/benchmark/benchmark_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Revert bad change

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-26 11:10:36 -04:00
461ae86812 Fix tf boolean mask in graph mode (#6741) 2020-08-26 05:15:35 -04:00
925f34bbbd Add "tie_word_embeddings" config param (#6692)
* add tie_word_embeddings

* correct word embeddings in modeling utils

* make style

* make config param only relevant for torch

* make style

* correct typo

* delete deprecated arg in transo-xl
2020-08-26 04:58:21 -04:00
fa8ee8e855 fix torchscript docs (#6740) 2020-08-26 04:51:56 -04:00
64c7c2bc15 Install nlp for github actions test (#6728) 2020-08-25 14:58:38 -04:00
624495706c T5Tokenizer adds EOS token if not already added (#5866)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-25 14:56:08 -04:00
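A sketch of the new behavior:
```
from transformers import T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
ids = tok("translate English to German: Hello")["input_ids"]
assert ids[-1] == tok.eos_token_id  # </s> is now appended unless already present
```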
e11d923bfc Fix pegasus-xsum integration test (#6726) 2020-08-25 14:06:28 -04:00
7e6397a7d8 [squad] make examples and dataset accessible from SquadDataset object (#6710)
* [squad] make examples and dataset accessible from SquadDataset object

* [squad] add support for legacy cache files
2020-08-25 13:32:56 -04:00
ac9702c284 Fix ONNX test_quantize unittest (#6716) 2020-08-25 13:24:40 -04:00
074340339a Create README.md (#6721)
add model card for singbert large
2020-08-26 00:11:24 +08:00
d17cce2270 add missing keys (#6719) 2020-08-25 11:38:51 -04:00
a25c9fc8e1 Selected typo fix (#6687) 2020-08-25 15:39:02 +02:00
625318f525 tensor.nonzero() is deprecated in PyTorch 1.6 (#6715)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-25 08:12:54 -04:00
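For context, the deprecation this works around: PyTorch 1.6 warns when `nonzero()` is called without `as_tuple`, so call sites move to the explicit form:
```
import torch

mask = torch.tensor([0, 1, 0, 1])
idx = torch.nonzero(mask, as_tuple=False)  # warning-free replacement for mask.nonzero()
assert torch.equal(idx, torch.tensor([[1], [3]]))
```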
124c3d6adc Add tokenizer to Trainer (#6689) 2020-08-25 07:47:09 -04:00
abc0202194 More tests to Trainer (#6699)
* More tests to Trainer

* Add warning in the doc
2020-08-25 07:07:36 -04:00
f5bad031bc Use generators tqdm progressbars (#6696) 2020-08-25 07:06:58 -04:00
a99d09c6f9 add new line to make examples run (#6706) 2020-08-25 06:26:29 -04:00
4db2fa77d7 Allow tests in examples to use cuda or fp16,if they are available (#5512)
* Allow tests in examples to use cuda or fp16,if they are available

The tests in examples didn't use cuda or fp16 even when they were available.
- The text classification example (`run_glue.py`) didn't use fp16 even if it was available;
  the device was chosen based on availability (cuda/cpu).
- The language-modeling example (`run_language_modeling.py`) had a `--no_cuda` argument
  which made the test run without cuda. fp16 is not enabled for this example because it
  currently fails with it (an assertion error on perplexity due to its higher value).
- cuda and fp16 are not enabled for the question-answering example (`run_squad.py`) as they cause a
  difference in the f1 score.
- The text-generation example (`run_generation.py`) will use cuda or fp16 whenever available.

Resolves some of: #5057

* Unwanted import of is_apex_available was removed

* Made changes to the test examples file to pass --fp16 only if cuda and apex are available
- run_glue.py: Removed the check for cuda and fp16.
- run_generation.py: Removed the check for cuda and fp16 also removed unwanted flag creation.

* Incorrectly sorted imports fixed

* The model needs to be converted to half precision

* Formatted single line if condition statement to multiline

* The torch_device also needed to be checked before running the tests on examples
- The tests in examples which use cuda should also depend on the USE_CUDA flag,
  similar to the rest of the test suite. Even if we decide to set USE_CUDA to
  True by default, setting USE_CUDA to False should result in the examples not using CUDA

* Format some of the code in test_examples file

* The improper import of is_apex_available was sorted

* Formatted the code to keep the style standards

* The comma at the end of list giving a flake8 issue was fixed

* Import sort was fixed

* Removed the clean_test_dir function as it's not used right now
2020-08-25 06:02:07 -04:00
841f071569 Add typing.overload for convert_ids_tokens (#6637)
* add overload for type checker

* black
2020-08-25 04:57:08 -04:00
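A simplified sketch of the `typing.overload` pattern this applies; the real method also takes `skip_special_tokens`, and the toy vocab is purely illustrative:
```
from typing import List, Union, overload

@overload
def convert_ids_to_tokens(ids: int) -> str: ...
@overload
def convert_ids_to_tokens(ids: List[int]) -> List[str]: ...
def convert_ids_to_tokens(ids: Union[int, List[int]]) -> Union[str, List[str]]:
    vocab = {0: "[PAD]", 1: "hello"}  # toy vocab
    if isinstance(ids, int):
        return vocab[ids]
    return [vocab[i] for i in ids]

print(convert_ids_to_tokens(1), convert_ids_to_tokens([0, 1]))
```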
0f16dd0ac2 Add DPR to models summary (#6690)
* add dpr to models summary

* minor

* minor

* Update docs/source/model_summary.rst

qa -> question answering

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_summary.rst

qa -> question ansering (cont'd)

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-25 09:57:28 +02:00
4fca874ea9 Remove hard-coded uses of float32 to fix mixed precision use (#6648) 2020-08-25 15:42:32 +08:00
0344428f79 [s2s] round bleu, rouge to 4 digits (#6704) 2020-08-25 00:33:11 -04:00
b6512d2357 Add model card for singbert. (#6674)
* Add model card for singbert.

Adding a model card for singbert- bert for singlish and manglish.

* Update README.md

Add additional tags and model name.

* Update README.md

Fix tag for malay.

* Update model_cards/zanelim/singbert/README.md

Fix language

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* Add examples and custom widget input.

Add examples and custom widget input.

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-25 10:09:13 +08:00
d20cbb886b Fix hyperparameter_search doc (#6695) 2020-08-24 21:04:08 -04:00
0ebc9699fa [fixdoc] Add import to pegasus usage doc (#6698) 2020-08-24 15:54:57 -04:00
6b4c617666 Move unused args to kwargs (#6694) 2020-08-24 13:20:03 -04:00
912a21ec78 remove BartForConditionalGeneration.generate (#6659)
As suggested here: https://github.com/huggingface/transformers/issues/6651#issuecomment-678594233
this removes the generic `generate` doc with examples not relevant to BART.
2020-08-25 00:42:34 +08:00
a8d6716ecb Create PULL_REQUEST_TEMPLATE.md (#6660)
* Create PULL_REQUEST_TEMPLATE.md

Proposing to copy this neat feature from pytorch. This is a small template that lets a PR submitter say which issue the PR closes.

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-25 00:30:38 +08:00
8f98faf934 Lat fix for Ray HP search (#6691) 2020-08-24 12:15:00 -04:00
3a7fdd3f52 Add hyperparameter search to Trainer (#6576)
* Add optuna hyperparameter search to Trainer

* @julien-c suggestions

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Make compute_objective an arg function

* Formatting

* Rework to make it easier to add ray

* Formatting

* Initial support for Ray

* Formatting

* Polish and finalize

* Add trial id to checkpoint with Ray

* Smaller default

* Use GPU in ray if available

* Formatting

* Fix test

* Update install instruction

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

* Address review comments

* Formatting post-merge

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
2020-08-24 11:48:45 -04:00
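
A rough usage sketch of the resulting API, assuming the optuna backend is installed; the model name, datasets, and trial count are placeholders, not values from the PR:

    from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

    def model_init():
        # A fresh model must be created for every trial.
        return AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    trainer = Trainer(
        model_init=model_init,
        args=TrainingArguments(output_dir="hp_search"),
        train_dataset=train_dataset,  # assumed to be defined elsewhere
        eval_dataset=eval_dataset,    # assumed to be defined elsewhere
    )
    best_run = trainer.hyperparameter_search(backend="optuna", n_trials=10, direction="minimize")
    print(best_run.hyperparameters)
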
dd522da004 Fix PL token classification examples (#6682) 2020-08-24 11:30:06 -04:00
a573777901 Update repo to isort v5 (#6686)
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
2020-08-24 11:03:01 -04:00
d329c9b05d Fixed DataCollatorForLanguageModeling not accepting lists of lists (#6685)
* Fixed DataCollatorForLanguageModeling + PermutationLanguageModeling not accepting lists of lists

* Update data_collator.py

* black was grumpy
2020-08-24 15:31:44 +02:00
0a850d210e Missing commit 2020-08-24 09:23:06 -04:00
b30879fe0c Don't reset the dataset type + plug for rm unused columns (#6683)
* Don't reset the type of the dataset

* Formatting

* Update trainer.py

Co-authored-by: Teven <teven.lescao@gmail.com>
2020-08-24 09:22:03 -04:00
1a779ad7ec Specify config filename (#6626) 2020-08-24 07:27:58 -04:00
a622705ef3 added multiple model_cards for below models (#6666)
* Create README.md

* Update README.md

* Create README.md

* Update README.md

* added multiple codeswitch models
2020-08-24 05:08:32 -04:00
16e38940bd Add Roberta2Roberta shared 2020-08-23 17:02:22 +02:00
f230a64094 new paper bibtex (#6656) 2020-08-23 10:03:41 -04:00
f235ee2164 Add Roberta2Roberta model card 2020-08-23 10:01:58 +02:00
068df740bd added model_card for model codeswitch-hineng-lid-lince and codeswitch-spaeng-lid-lince (#6663)
* Create README.md

* Update README.md

* Create README.md

* Update README.md
2020-08-22 12:13:21 -04:00
97bb2497ab Correct bug in bert2bert-cnn_dailymail
Model was trained with the wrong tokenizer. Retrained with correct tokenizer - thanks for spotting @lhoestq !
2020-08-22 13:44:20 +02:00
0f94151dc7 Add model card for electricidad-base-generator (#6650)
It works like a charm!
Look at the output of the example code!
2020-08-21 14:18:15 -04:00
cbda72932c [Doc model summary] add MBart model summary (#6649) 2020-08-21 13:42:59 -04:00
9e8c494da7 Add T5-11B disclaimer
@julien-c
2020-08-21 18:11:18 +02:00
a4db4e3032 [Docs model summaries] Add pegasus to docs (#6640)
* add pegasus to docs

* Update docs/source/model_summary.rst
2020-08-21 16:22:10 +02:00
d0e42a7bed CamembertForCausalLM (#6577)
* added CamembertForCausalLM

* add in __init__ and auto model

* style

* doc
2020-08-21 13:52:54 +02:00
bdf7e5de92 Remove accidental comment (#6629) 2020-08-21 05:07:32 -04:00
efc7460553 model card for Spanish electra base (#6633) 2020-08-21 05:04:29 -04:00
b105f2c6b3 Update ONNX doc to match the removal of --optimize argument.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-21 10:37:09 +02:00
e5f452275b Trainer automatically drops unused columns in nlp datasets (#6449)
* Add a classmethod to easily build a Trainer from nlp dataset and metric

* Fix docstrings

* Split train/eval

* Formatting

* Log dropped columns + docs

* Authorize callable activations

* Poc for auto activation

* Be framework-agnostic

* Formatting

* Remove class method

* Remove unnecessary code
2020-08-20 16:29:14 -04:00
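
If memory serves, the behavior is exposed through a TrainingArguments flag; a minimal sketch (the flag name remove_unused_columns is the one I recall, and the default of True is an assumption):

    from transformers import TrainingArguments

    # Dataset columns the model's forward() does not accept (e.g. a raw "text"
    # column) are dropped and logged; set the flag to False to keep them all.
    args = TrainingArguments(output_dir="out", remove_unused_columns=True)
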
5bf4465e6c Regression test for pegasus bugfix (#6606) 2020-08-20 15:34:43 -04:00
86c07e634f One last threshold to raise 2020-08-20 14:23:09 -04:00
e8af90c052 Move threshold up for flaky test with Electra (#6622)
* Move threshold up for flaky test with Electra

* Update above as well
2020-08-20 13:59:40 -04:00
953958372a XLNet Bug when training with apex 16-bit precision (#6567)
* xlnet fp16 bug fix

* comment cast added

* Update modeling_xlnet.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-21 01:34:23 +08:00
505f2d749e [Tests] fix attention masks in Tests (#6621)
* fix distilbert

* fix typo
2020-08-20 13:23:47 -04:00
c9454507cf Add tests for Reformer tokenizer (#6485) 2020-08-20 18:58:44 +02:00
f9d280a959 TFTrainer dataset doc & fix evaluation bug (#6618)
* TFTrainer dataset doc & fix evaluation bug

discussed in #6551

* add docstring to test/eval datasets
2020-08-20 12:11:36 -04:00
573bdb0a5d Add tests to Trainer (#6605)
* Add tests to Trainer

* Test if removing long breaks everything

* Remove ugly hack

* Fix distributed test

* Use float for number of epochs
2020-08-20 11:13:50 -04:00
039d8d65fc add intro to nlp lib & dataset links to custom datasets tutorial (#6583)
* add intro to nlp lib + links

* unique links...
2020-08-20 10:32:51 -04:00
b3e54698dd Fix CI 2020-08-20 08:34:02 -04:00
33bf426498 removed redundant arg in prepare_inputs (#6614)
* removed redundant arg in prepare_inputs

* made same change in prediction_loop
2020-08-20 08:23:35 -04:00
cabfdfafc0 Docs copy button misses ... prefixed code (#6518)
Tested in a local build of the docs.

e.g. Just above https://huggingface.co/transformers/task_summary.html#causal-language-modeling

Copy will copy the full code, e.g.

for token in top_5_tokens:
     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

Instead of currently only:

for token in top_5_tokens:

The docs source in question, with its output:

>>> for token in top_5_tokens:
...     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help increase our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help decrease our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help offset our carbon footprint.
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help improve our carbon footprint.

Docs for the option fix:
https://sphinx-copybutton.readthedocs.io/en/latest/
2020-08-20 17:35:06 +08:00
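
For reference, a sketch of the conf.py settings sphinx-copybutton documents for this, so both the ">>> " and "... " prompts are stripped and continuation lines get copied (exact values assumed, not taken from the repo):

    # docs/source/conf.py (sketch)
    extensions = ["sphinx_copybutton"]

    # Strip ">>> " and "... " prompts so the copy button grabs whole snippets.
    copybutton_prompt_text = r">>> |\.\.\. "
    copybutton_prompt_is_regexp = True
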
61b5ee11e3 lighter 'make test' (#6512) 2020-08-20 17:24:25 +08:00
3c3c46f563 Typo fix in 04-onnx-export (#6595) 2020-08-20 16:17:16 +08:00
93c5c9a528 [cleanup] remove confusing newline (#6603) 2020-08-20 00:33:36 -04:00
18ca0e9140 Fix #6575 (#6596) 2020-08-19 13:04:33 -04:00
7581884dee [BartTokenizerFast] add prepare_seq2seq_batch (#6543) 2020-08-19 10:37:48 -04:00
8bcceaceff fix model outputs test (#6593) 2020-08-19 16:18:51 +02:00
9a86321b11 tf generation utils: remove unused kwargs (#6591) 2020-08-19 09:37:45 -04:00
2a7402cbd3 Feed forward chunking others (#6365)
* Feed forward chunking for Distilbert & Albert

* Added ff chunking for many other models

* Change model signature

* Added chunking for XLM

* Cleaned up by removing some variables.

* remove test_chunking flag

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-19 14:31:10 +02:00
fe0b85e77a [EncoderDecoder] Add functionality to tie encoder decoder weights (#6538)
* start adding tie encoder to decoder functionality

* finish model tying

* make style

* Apply suggestions from code review

* fix t5 list including cross attention

* apply sams suggestions

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add max depth break point

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-19 14:23:45 +02:00
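
A sketch of how the tying is triggered, as I understand the PR; the checkpoint names are placeholders and the kwarg name is an assumption:

    from transformers import EncoderDecoderModel

    # Where encoder and decoder architectures line up, share their weights,
    # roughly halving the parameter count of the resulting seq2seq model.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-uncased", "bert-base-uncased", tie_encoder_decoder=True
    )
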
ab42d74850 Fix bart base test (#6587) 2020-08-18 21:28:10 -04:00
1529bf9680 add BartConfig.force_bos_token_to_be_generated (#6526)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-18 19:15:50 -04:00
974bb4af26 [Model card] Bert2GPT2 EncoderDecoder model (#6569)
* Bert2GPT2 EncoderDecoder model

* Update README.md
2020-08-18 19:28:17 +02:00
6f972e1423 update xnli-mt url (#6580) 2020-08-18 13:10:47 -04:00
fb6844aff5 [Pegasus Doc] minor typo (#6579)
Minor typo correction
@sshleifer
2020-08-18 12:47:47 -04:00
aaab9ab187 Create README.md (#6556) 2020-08-18 12:43:20 -04:00
1dfce0f08a Create README.md (#6557) 2020-08-18 12:42:14 -04:00
7516bcf273 [docs] Fix number of 'ug' occurrences in tokenizer_summary (#6574) 2020-08-18 10:23:25 -04:00
5a5af22ed5 [docs] Fix wrong newline in the middle of a paragraph (#6573) 2020-08-18 10:22:43 -04:00
7659a8eb37 fix incorrect codecov reports (#6553)
As discussed at https://github.com/huggingface/transformers/issues/6317 codecov currently sends an invalid report when it fails to find a code coverage report for the base it checks against, so this gets fixed by:

-  require_base: yes        # don't report if there is no base coverage report

Let's add this for clarity; this is supposedly already the default.

-  require_head: yes        # don't report if there is no head coverage report 

And there is perhaps no point reporting on doc changes, as they don't make any difference and only generate noise:

-  require_changes: true    # only comment if there was change in coverage
2020-08-18 10:21:13 -04:00
cfa26d2b41 github: add @stefan-it to bug-report template for all token-classification related bugs (#6489) 2020-08-18 08:38:54 -04:00
1fdf372f8c Small typo fixes for model card: electra-base-german-uncased (#6555)
* Update README.md

* Update model_cards/german-nlp-group/electra-base-german-uncased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-18 08:21:52 -04:00
5a81195ea9 Fixed label datatype for STS-B (#6492)
* fixed label datatype for sts-b

* naming update

* make style

* make style
2020-08-18 08:09:39 -04:00
12d7624199 [marian] converter supports models from new Tatoeba project (#6342) 2020-08-17 23:55:42 -04:00
fb7330b30e update with #s of sentences/tokens (#6546) 2020-08-17 16:48:05 -04:00
63144701ed Added first model card (#6530)
* Added first model card

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-17 16:24:10 -04:00
98ee802023 [model_cards] Add model cards for Urduhack model (roberta-urdu-small) (#6536)
* [model_cards] roberta-urdu-small added.

* [model_cards] typo fixed.

* Tweak license format (yaml expects a simple string)

Co-authored-by: Ikram Ali <mrikram1989>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-17 16:04:29 -04:00
3a302904cb [model_cards] Add a new model for Irish (#6544) 2020-08-17 15:56:56 -04:00
07971d8b18 [model_cards] Fix yaml for cedpsam/chatbot_fr 2020-08-17 21:33:32 +02:00
407da12ef1 [T5Tokenizer] add prepare_seq2seq_batch method (#6122)
* tests
2020-08-17 13:57:19 -04:00
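
A usage sketch under the assumption that the method mirrors the other tokenizers' prepare_seq2seq_batch; the returned key names are an assumption (early per-tokenizer versions differed), and the method was later deprecated in favor of calling the tokenizer directly:

    from transformers import T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    batch = tokenizer.prepare_seq2seq_batch(
        src_texts=["translate English to German: How are you?"],
        tgt_texts=["Wie geht es dir?"],
        return_tensors="pt",
    )
    # Assumed keys: input_ids and attention_mask for the source, plus a target field.
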
c9564f5343 [Doc] add more MBart and other doc (#6490)
* add mbart example

* add Pegasus and MBart in readme

* typo

* add MBart in Pretrained models

* add pre-proc doc

* add DPR in readme

* fix indent

* doc fix
2020-08-17 12:30:26 -04:00
f68c873100 replace _ with __ rst links (#6541) 2020-08-17 12:27:02 -04:00
7ca6ab67fc Fix CI 2020-08-17 12:20:40 -04:00
b732e7e111 [doc] multiple corrections to "Summary of the tasks" (#6509)
* [doc] multiple corrections to "Summary of the tasks"

* fix indentation

* correction

* fix links, add links to examples/seq2seq/README.md instead of non-existing script
2020-08-17 11:49:16 -04:00
2a77813d53 [BartTokenizer] add prepare s2s batch (#6212)
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-08-17 11:44:46 -04:00
84d33317ae [doc] make the text more readable, fix some typos, add some disambiguation (#6508)
* [doc] make the text more readable, fix some typos, add some disambiguation

* Update docs/source/glossary.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-17 11:07:58 -04:00
d0c2389f48 add custom datasets tutorial (#6466)
* add custom datasets tutorial

* python -> bash code blocks

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor review feedback changes

* add working native QA snippet

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-17 09:15:34 -04:00
d2da2cb232 allow spaces in bash args with "$@" (#6521) 2020-08-17 09:06:35 -04:00
b41cc0b86a Fix flaky ONNX tests (#6531) 2020-08-17 09:04:35 -04:00
39c3b1d9de [sched] polynomial_decay_schedule use default power=1.0 (#6473) 2020-08-17 08:33:12 -04:00
9dbe4094f2 [testing] a new TestCasePlus subclass + get_auto_remove_tmp_dir() (#6494)
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs

* respect after=True for tempfile, simplify code

* comments

* comment fix

* put `before` last in args, so can make debug even faster
2020-08-17 08:12:19 -04:00
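
A sketch of the intended test style; the before/after semantics are paraphrased from the PR description above:

    from transformers.testing_utils import TestCasePlus

    class ExampleTest(TestCasePlus):
        def test_writes_output(self):
            # Created fresh for this test and deleted automatically afterwards;
            # per the PR, `after=False` would keep it around for debugging.
            tmp_dir = self.get_auto_remove_tmp_dir()
            with open(f"{tmp_dir}/out.txt", "w") as f:
                f.write("hello")
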
36010cb1e2 fix pegasus doc (#6533) 2020-08-17 12:24:43 +02:00
37709b5909 Remove deprecated assertEquals (#6532)
`assertEquals` is deprecated: https://stackoverflow.com/questions/930995/assertequals-vs-assertequal-in-python/931011
This PR replaces these deprecated methods.
2020-08-17 17:13:58 +08:00
49d8076fa2 [doc] Summary of the models fixes (#6511)
* [doc] Summary of the models fixes

* correction
2020-08-17 16:04:53 +08:00
72911c893a Create model cards for indonesian models (#6522)
* added model cards for indonesian gpt2-small, bert-base and roberta-base models

* removed bibtex entries
2020-08-17 15:42:25 +08:00
48c6c6139f Support additional dictionaries for BERT Japanese tokenizers (#6515)
* Update BERT Japanese tokenizers

* Update CircleCI config to download unidic

* Specify to use the latest dictionary packages
2020-08-17 12:00:23 +08:00
423eb5b1d7 [doc] fix invalid env vars (#6504)
- remove invalid `ENV_` prefix.
- add a few ':' while at it
2020-08-17 11:11:40 +08:00
3c72f5584b Add Model Card for electra-base-german-uncased (#6496)
* Add Model Card for electra-base-german-uncased

* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-17 11:02:32 +08:00
df15c7c226 typos (#6505) 2020-08-17 10:57:36 +08:00
6d38ab1cc3 Update bert-base-portuguese-cased and bert-large-portuguese-cased model cards (#6527)
Co-authored-by: Fabio Souza <fabiosouza@neuralmind.ai>
2020-08-17 10:49:49 +08:00
84c265ffcc [lightning_base] fix s2s logging, only make train_loader once (#6404) 2020-08-16 22:49:41 -04:00
72add6c98f [s2s] docs, document desired filenames nicely (#6525) 2020-08-16 20:31:22 -04:00
2060181126 Fixes paths with spaces in seq2seq example (#6493) 2020-08-16 13:36:38 -04:00
fe61c05b85 Add examples/bert-loses-patience who can help (#6499) 2020-08-16 16:30:16 +08:00
24107c2c83 Fix TPU Convergence bug introduced by PR#6151 (#6488)
With the bug introduced, we're currently taking two optimizer steps per
batch: one global step, where `xm.optimizer_step` injects a cross-replica sum (CRS)
between all cores in training, and one local step without it. This has been affecting
training accuracy (for example, XLNet GLUE on MNLI is not converging, etc.).
2020-08-14 12:47:37 -04:00
895ed8f451 Generation doc (#6470)
* Generation doc

* MBartForConditionalGeneration (#6441)

* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions

* Use hash to clean the test dirs (#6475)

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix

* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)

* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sort unique_no_split_tokens to make it deterministic (#6461)

* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style

* Import accuracy_score (#6480)

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling

* Generation doc

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comments

* Styling

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-14 09:46:39 -04:00
b5ba758ba9 Import accuracy_score (#6480) 2020-08-14 08:16:16 -04:00
9a8c168f56 Sort unique_no_split_tokens to make it deterministic (#6461)
* change unique_no_split_tokens's type to set

* use sorted list instead of set

* style
2020-08-14 10:36:58 +02:00
1d6e71e116 [EncoderDecoder] Add Cross Attention for GPT2 (#6415)
* add cross attention layers for gpt2

* make gpt2 cross attention work

* finish bert2gpt2

* add explicit comments

* remove attention mask since not yet supported

* revert attn mask in pipeline

* Update src/transformers/modeling_gpt2.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_encoder_decoder.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-14 09:43:29 +02:00
eb613b566a Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs

* Use hash to clean the test dirs

* Use hash to clean the test dirs

* fix
2020-08-14 15:34:39 +08:00
680f1337c3 MBartForConditionalGeneration (#6441)
* add MBartForConditionalGeneration

* style

* rebase and fixes

* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS

* fix docs

* don't ignore mbart

* doc

* fix mbart fairseq link

* put mbart before bart

* apply doc suggestions
2020-08-14 03:21:16 -04:00
05810cd80a Fix typo (#6469) 2020-08-13 15:01:08 -04:00
7bc00569df Clean directory after script testing (#6453)
* Clean Dir after testing

* remove pabee ignore
2020-08-14 00:34:03 +08:00
e92efcf728 Mult rouge by 100: standard units (#6359) 2020-08-13 12:15:54 -04:00
eda07efaa5 Add POS tagging and Phrase chunking token classification examples (#6457)
* Add more token classification examples

* POS tagging example

* Phrase chunking example

* PR review fixes

* Add conllu to third party list (used in token classification examples)
2020-08-13 12:09:51 -04:00
f51161e230 add BartTokenizerFast in AutoTokenizer (#6464)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-13 12:08:11 -04:00
a442f87adc add LongformerTokenizerFast in AutoTokenizer (#6463) 2020-08-13 12:06:43 -04:00
f7cbc13db7 Test model outputs equivalence (#6445)
* Test model outputs equivalence

* Fix failing tests

* From dict to kwargs

* DistilBERT

* Addressing @sgugger and @patrickvonplaten's comments
2020-08-13 11:59:35 -04:00
54c687e97c typo fix (#6462) 2020-08-13 09:36:48 -04:00
9d94aecd51 Fix docs and bad word tokens generation_utils.py (#6387)
* fix

* fix2

* fix3
2020-08-13 13:12:16 +02:00
0ed7c00ba6 Update README.md (#6435)
* Update README.md

* Update README.md

* Update README.md
2020-08-13 11:01:17 +02:00
e983da0e7d cleanup tf unittests: part 2 (#6260)
* cleanup torch unittests: part 2

* remove trailing comma added by isort, and which breaks flake

* one more comma

* revert odd balls

* part 3: odd cases

* more ["key"] -> .key refactoring

* .numpy() is not needed

* more unncessary .numpy() removed

* more simplification
2020-08-13 04:29:06 -04:00
bc820476a5 add targets arg to fill-mask pipeline (#6239)
* add targets arg to fill-mask pipeline

* add tests and more error handling

* quality

* update docstring
2020-08-12 12:48:29 -04:00
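
A usage sketch of the new argument; the model and the candidate targets are placeholders:

    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")
    # Scores are computed only over the supplied candidates instead of the full vocabulary.
    print(fill_mask("Paris is the [MASK] of France.", targets=["capital", "center"]))
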
0735def8e1 [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer (#6411)
* add encoder-decoder for roberta

* fix headmask

* apply Sylvains suggestions

* fix typo

* Apply suggestions from code review
2020-08-12 18:23:30 +02:00
fd3de2000f Get GKE logs via kubectl logs instead of gcloud logging read. (#6446) 2020-08-12 11:46:24 -04:00
f94a52cd79 [s2s] add BartTranslationDistiller for distilling mBART (#6363) 2020-08-12 11:41:04 -04:00
d2370e1bd8 Adding PaddingDataCollator (#6442)
* Data collator with padding

* Add type annotation

* Support tensors as well

* Add comment

* Fix for labels wrong shape

* Data collator with padding

* Add type annotation

* Support tensors as well

* Add comment

* Fix for labels wrong shape

* Remove changes rendered unnecessary
2020-08-12 11:32:27 -04:00
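
The class that landed is, to my understanding, DataCollatorWithPadding; a self-contained sketch of plugging it into a DataLoader:

    from torch.utils.data import DataLoader
    from transformers import AutoTokenizer, DataCollatorWithPadding

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    collator = DataCollatorWithPadding(tokenizer)  # pads each batch to its longest member

    texts = ["short", "a somewhat longer example sentence"]
    features = [tokenizer(t) for t in texts]  # un-padded, variable-length encodings

    loader = DataLoader(features, batch_size=2, collate_fn=collator)
    batch = next(iter(loader))  # input_ids now padded to a common length
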
96c3329f19 Fix #6428 (#6437) 2020-08-12 08:47:30 -04:00
a8db954cda Activate check on the CI (#6427)
* Activate check on the CI

* Fix repo inconsistencies

* Don't document too much
2020-08-12 08:42:14 -04:00
34fabe1697 Move prediction_loss_only to TrainingArguments (#6426) 2020-08-12 08:03:45 -04:00
e9c3031463 Fixes to make life easier with the nlp library (#6423)
* allow using tokenizer.pad as a collate_fn in pytorch

* allow using tokenizer.pad as a collate_fn in pytorch

* Add documentation and tests

* Make attention mask the right shape

* Better test

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-08-12 08:00:56 -04:00
87b359439f [test] replace capsys with the more refined CaptureStderr/CaptureStdout (#6422)
* replace capsys with the more refined CaptureStderr/CaptureStdout

* Update examples/seq2seq/test_seq2seq_examples.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-12 07:54:28 -04:00
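
A sketch of the capture helpers in use; the .out/.err attribute names are as I recall them, not verified against the PR:

    import sys

    from transformers.testing_utils import CaptureStderr, CaptureStdout

    with CaptureStdout() as cs:
        print("logged to stdout")
    assert "logged to stdout" in cs.out

    with CaptureStderr() as ce:
        print("warning", file=sys.stderr)
    assert "warning" in ce.err
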
ac5bcf236e Fix FFN dropout in TFAlbertLayer, and split dropout in TFAlbertAttent… (#4323)
* Fix FFN dropout in TFAlbertLayer, and split dropout in TFAlbertAttention into two separate dropout layers.

* Same dropout fixes for PyTorch.
2020-08-12 07:52:42 -04:00
4ffea5ce2f Disabled pabee test (#6431) 2020-08-12 02:52:50 -04:00
155288f04b [model_card] rohanrajpal/bert-base-codemixed-uncased-sentiment (#6324)
* Create README.md

* Update model_cards/rohanrajpal/bert-base-codemixed-uncased-sentiment/README.md

* Update model_cards/rohanrajpal/bert-base-codemixed-uncased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 18:38:18 -04:00
4e6245fc7e Create model card T5-base fine-tuned on event2Mind for Intent Prediction (#6412) 2020-08-11 18:35:27 -04:00
46e3a0a6ec Create README.md (#6381) 2020-08-11 18:34:11 -04:00
31dfde7429 Create README.md (#6378) 2020-08-11 18:32:37 -04:00
25e29150a2 Add metadata to be indexed properly (#6380) 2020-08-11 18:32:29 -04:00
471be5f279 Change metadata to be indexed correctly (#6379) 2020-08-11 18:32:18 -04:00
42ee0bc63d Create README.md (#6346)
* Create README.md

* add results on SAIL dataset

* Update model_cards/rohanrajpal/bert-base-multilingual-codemixed-cased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 18:31:34 -04:00
3f071c4b6e [examples] add pytest dependency (#6425) 2020-08-11 17:58:09 -04:00
ece0903e11 lr_schedulers: add get_polynomial_decay_schedule_with_warmup (#6361)
* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* [model_cards] electra-base-turkish-cased-ner (#6350)

* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Temporarily de-activate TPU CI

* Update modeling_tf_utils.py (#6372)

fix typo: ckeckpoint->checkpoint

* the test now works again (#6371)

* correct pl link in readme (#6364)

* refactor almost identical tests (#6339)

* refactor almost identical tests

* important to add a clear assert error message

* make the assert error even more descriptive than the original bt

* Small docfile fixes (#6328)

* Patch models (#6326)

* TFAlbertFor{TokenClassification, MultipleChoice}

* Patch models

* BERT and TF BERT info



* Update check_repo

* Ci GitHub caching (#6382)

* Cache Github Actions CI

* Remove useless file

* Colab button (#6389)

* Add colab button

* Add colab link for tutorials

* Fix links for open in colab (#6391)

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* [wip] add get_polynomial_decay_schedule_with_warmup

* style

* add assert

* change lr_end to a much smaller default number

* check for exact equality

* Update src/transformers/optimization.py

consistently use lr_end=1e-7 default

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove dup (leftover from merge)

* convert the test into the new refactored format

* stick to using the current_step as is, without ++

Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-11 17:56:41 -04:00
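
A sketch tying together the defaults settled on above (lr_end=1e-7 and power=1.0, at which the schedule reduces to linear decay); the optimizer setup is illustrative:

    import torch
    from transformers import get_polynomial_decay_schedule_with_warmup

    params = [torch.nn.Parameter(torch.zeros(2))]
    optimizer = torch.optim.AdamW(params, lr=5e-5)
    scheduler = get_polynomial_decay_schedule_with_warmup(
        optimizer,
        num_warmup_steps=100,
        num_training_steps=1000,
        lr_end=1e-7,  # default discussed in the PR
        power=1.0,    # with power=1.0 this is plain linear decay
    )
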
6c87b73d6b Create README.md (#6386)
* Create README.md

* Update README.md
2020-08-11 16:56:51 -04:00
0203d6517f [pl] restore lr logging behavior for glue, ner examples (#6314) 2020-08-11 16:27:11 -04:00
be1520d3a3 rename prepare_translation_batch -> prepare_seq2seq_batch (#6103) 2020-08-11 15:57:07 -04:00
66fa8ceaea PegasusForConditionalGeneration (torch version) (#6340)
Co-authored-by: Jingqing  Zhang <jingqing.zhang15@imperial.ac.uk>
2020-08-11 14:31:23 -04:00
f6cb0f806e [s2s] wmt download script use less ram (#6405) 2020-08-11 12:04:17 -04:00
7c6a085ebf pl version: examples/requirements.txt is single source of truth (#6309) 2020-08-11 10:58:54 -04:00
1d1d5bec1b Create Model Card File (#6357) 2020-08-11 10:36:15 -04:00
00ce881c07 Create README.md (#6413)
* Create README.md

Model card for https://huggingface.co/akhooli/gpt2-small-arabic

* Update model_cards/akhooli/gpt2-small-arabic/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 10:35:31 -04:00
3ae30787b5 switch Hindi-BERT to S3 README (#6396) 2020-08-11 10:34:22 -04:00
824e651e17 Create README.md (#6397)
* Create README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

* Update model_cards/akhooli/gpt2-small-arabic-poetry/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-11 09:03:23 -04:00
404782912a [Performance improvement] "Bad tokens ids" optimization (#6064)
* Optimized banned token masking

* Avoid duplicate EOS masking if in bad_words_id

* Updated mask generation to handle empty banned token list

* Addition of unit tests for the updated bad_words_ids masking

* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test

* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test (timeout does not work on Windows)

* Moving Marian import to the test context to allow TF only environments to run

* Moving imports to torch_available test

* Updated operations device and test

* Updated operations device and test

* Added docstring and comment for in-place scores modification

* Moving test to own test_generation_utils, use of lighter models for testing

* removed unneded imports in test_modeling_common

* revert formatting change for ModelTesterMixin

* Updated caching, simplified eos token id test, removed unnecessary @require_torch

* formatting compliance
2020-08-11 05:56:40 -04:00
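
For context, the code path being optimized serves generate()'s bad_words_ids argument; a usage sketch with a placeholder model and banned phrase:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Token-id sequences that must never appear in the generated output.
    bad_words_ids = [tokenizer(" ugly").input_ids]

    inputs = tokenizer("The weather today is", return_tensors="pt")
    out = model.generate(inputs.input_ids, max_length=20, bad_words_ids=bad_words_ids)
    print(tokenizer.decode(out[0]))
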
87e124c245 Warn if debug requested without TPU fixes (#6308) (#6390)
* Warn if debug requested without TPU fixes (#6308)
Check whether a PyTorch-compatible TPU is available before attempting to print TPU metrics after training has completed. This way, users who apply `--debug` without reading the documentation aren't surprised by a stack trace.

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-11 05:31:26 -04:00
cdf1f7edb2 Fix tokenizer saving and loading error (#6026)
* fix tokenizer saving and loading bugs when adding AddedToken to additional special tokens

* Add tokenizer test

* Style

* Style 2

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-11 04:49:16 -04:00
83984a61c6 testing utils: capturing std streams context manager (#6231)
* testing utils: capturing std streams context manager

* style

* missing import

* add the origin of this code
2020-08-11 03:56:47 -04:00
f6c0680d36 add pl_glue example test (#6034)
* add pl_glue example test

* for now just test that it runs, next validate results of eval or predict?

* complete the run_pl_glue test to validate the actual outcome

* worked on my machine, CI gets less accuracy - trying higher epochs

* match run_pl.sh hparms

* more epochs?

* trying higher lr

* for now just test that the script runs to a completion

* correct the comment

* if cuda is available, add --fp16 --gpus=1 to cover more bases

* style
2020-08-11 03:16:52 -04:00
b25cec13c5 Feed forward chunking (#6024)
* Chunked feed forward for Bert

This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results.

* Black and cleanup

* Feed forward chunking in BertLayer class.

* Isort

* add chunking for all models

* fix docs

* Fix typo

Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-08-11 03:12:45 -04:00
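
The mechanism is exposed through a helper; a sketch, with the caveat that the argument order shown matches later releases (early versions took chunk_size first, and the import path has moved between modules over time):

    import torch
    from transformers.modeling_utils import apply_chunking_to_forward

    ff = torch.nn.Linear(64, 64)

    def forward_chunk(hidden_states):
        return ff(hidden_states)

    hidden_states = torch.randn(2, 128, 64)  # (batch, seq_len, hidden)
    # Run the feed-forward over 32-position slices of the sequence dimension,
    # lowering peak memory at roughly the same total compute.
    out = apply_chunking_to_forward(forward_chunk, 32, 1, hidden_states)
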
8a3db6b303 Add TPU testing once again 2020-08-11 08:49:37 +02:00
f65ac1faf2 Add missing docker arg for TPU CI. (#6393) 2020-08-11 02:48:49 -04:00
b9ecd92ee4 [s2s] Script to save wmt data to disk (#6403) 2020-08-10 22:49:39 -04:00
00bb0b25ed TF Longformer (#5764)
* improve names and tests longformer

* more and better tests for longformer

* add first tf test

* finalize tf basic op functions

* fix merge

* tf shape test passes

* narrow down discrepancies

* make longformer local attn tf work

* correct tf longformer

* add first global attn function

* add more global longformer func

* advance tf longformer

* finish global attn

* upload big model

* finish all tests

* correct false any statement

* fix common tests

* make all tests pass except keras save load

* fix some tests

* fix torch test import

* finish tests

* fix test

* fix torch tf tests

* add docs

* finish docs

* Update src/transformers/modeling_longformer.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_tf_longformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply Lysandres suggestions

* reverse to assert statement because function will fail otherwise

* applying sylvains recommendations

* Update src/transformers/modeling_longformer.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Update src/transformers/modeling_tf_longformer.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-10 23:25:06 +02:00
3425936643 [EncoderDecoderModel] add a add_cross_attention boolean to config (#6377)
* correct encoder decoder model

* Apply suggestions from code review

* apply sylvains suggestions
2020-08-10 19:46:48 +02:00
06bc347c97 Fix links for open in colab (#6391) 2020-08-10 11:16:17 -04:00
3e0fe3cf5c Colab button (#6389)
* Add colab button

* Add colab link for tutorials
2020-08-10 11:12:29 -04:00
79588e6fdb Ci GitHub caching (#6382)
* Cache Github Actions CI

* Remove useless file
2020-08-10 10:39:31 -04:00
b99098abc7 Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}

* Patch models

* BERT and TF BERT info



* Update check_repo
2020-08-10 10:39:17 -04:00
6028ed92bd Small docfile fixes (#6328) 2020-08-10 05:37:12 -04:00
1429b920d4 refactor almost identical tests (#6339)
* refactor almost identical tests

* important to add a clear assert error message

* make the assert error even more descriptive than the original bt
2020-08-10 05:31:20 -04:00
35eb96de4d correct pl link in readme (#6364) 2020-08-10 03:08:46 -04:00
0830e79512 the test now works again (#6371) 2020-08-10 02:55:52 -04:00
3a556b0fb7 Update modeling_tf_utils.py (#6372)
fix typo: ckeckpoint->checkpoint
2020-08-10 02:55:11 -04:00
1bbc54a87c Temporarily de-activate TPU CI 2020-08-10 08:11:40 +02:00
6e8a38568e [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner

* Add metadata

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-08-09 03:39:51 -04:00
9a5ef83748 [s2s] fix --gpus clarg collision (#6358) 2020-08-08 21:51:37 -04:00
1aec991643 [GPT2] Correct typo in docs (#6352) 2020-08-08 20:37:29 +02:00
9f57e39f71 Add notebook on fine-tuning and interpreting Electra (#6321)
Co-authored-by: eliska <3648991+elisans@users.noreply.github.com>
2020-08-08 11:47:33 +02:00
9bed355449 [s2s] fix label_smoothed_nll_loss (#6344) 2020-08-08 04:21:12 -04:00
99f73bcc71 [s2s] tiny QOL improvement: run_eval prints scores (#6341) 2020-08-08 02:45:55 -04:00
322dffc6c9 remove a TODO item to use a tiny model (#6338)
As discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e., the results would be meaningless).
2020-08-07 21:30:39 -04:00
1f8e826518 [CI] Self-scheduled runner also pins torch (#6332) 2020-08-07 18:40:21 -04:00
1b8a7ffcfd Add setup for TPU CI to run every hour. (#6219)
* Add setup for TPU CI to run every hour.

* Re-organize config.yml

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-07 11:17:07 -04:00
6695450a23 [examples] consistently use --gpus, instead of --n_gpu (#6315) 2020-08-07 10:36:32 -04:00
0e36e51515 Fix the tests for Electra (#6284)
* Fix the tests for Electra

* Apply style
2020-08-07 09:30:57 -04:00
6ba540b747 Add a script to check all models are tested and documented (#6298)
* Add a script to check all models are tested and documented

* Apply suggestions from code review

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>

* Address comments

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-08-07 09:18:37 -04:00
e1638dce16 fix the slow tests doc (#6167)
remove unnecessary duplication wrt `RUN_SLOW=yes`
2020-08-07 09:17:32 -04:00
7e9861f7f4 dehate-bert Model Card (#6248)
Added citation and paper links.
2020-08-07 17:51:03 +08:00
f6df6d98dd dehate-bert Model Card (#6249)
Added citation and paper links.
2020-08-07 17:48:38 +08:00
26691ecba6 dehate-bert Model Card (#6250)
Added citation and paper links.
2020-08-07 17:48:09 +08:00
60657b295c dehate-bert Model Card (#6251)
Added citation and paper links.
2020-08-07 17:47:42 +08:00
7218261991 dehate-bert Model Card (#6252)
Added citation and paper links.
2020-08-07 17:47:26 +08:00
396d227cd4 dehate-bert Model Card (#6253)
Added citation and paper links.
2020-08-07 17:47:04 +08:00
8be260f18a dehate-bert Model Card (#6254)
Added citation and paper links.
2020-08-07 17:46:27 +08:00
dce7278cdf dehate-bert Model Card (#6255)
Added citation and paper links.
2020-08-07 17:45:52 +08:00
3be2d04884 fix consistency CrossEntropyLoss in modeling_bart (#6265) 2020-08-07 17:44:28 +08:00
c72f9c90a1 Remove --no-cache-dir from github CI 2020-08-07 09:07:22 +02:00
0d9328f2ef Patch GPU failures (#6281)
* Pin to 1.5.0

* Patch XLM GPU test
2020-08-07 02:58:15 -04:00
80a0676a51 CI dependency wheel caching (#6287)
* Single workflow cache test

Remove cache dir, re-trigger cache

Only pip archives

Not sudo when pip

* All workflow cache

Remove no-cache-dir instruction

Remove last sudo occurrences

v0.3
2020-08-07 02:48:59 -04:00
175cd45e13 fix the shuffle argument usage and the default (#6307) 2020-08-06 20:32:28 -04:00
ffceef2042 [Fix] text-classification PL example (#6027)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-06 15:46:43 -04:00
eb2bd8d6eb Remove redundant line in run_pl_glue.py (#6305) 2020-08-06 15:43:45 -04:00
118ecfd427 fix for pytorch < 1.6 (#6300) 2020-08-06 21:14:46 +02:00
2804fff839 [s2s]Use prepare_translation_batch for Marian finetuning (#6293)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-06 14:58:38 -04:00
2f2aa0c89c added n_inner argument to gpt2 config (#6296) 2020-08-06 17:47:32 +02:00
0a0d53dcf8 Update model card (#6290)
Add links to RuPERTa models fine-tuned on Spanish SQUAD datasets
2020-08-06 11:42:43 -04:00
b923871bb7 Adds comet_ml to the list of auto-experiment loggers (#6176)
* Support for Comet.ml

* Need to import comet first

* Log this model, not the one in the backprop step

* Log args as hyperparameters; use framework to allow fine control

* Log hyperparameters with context

* Apply black formatting

* isort fix integrations

* isort fix __init__

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer_tf.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address review comments

* Style + Quality, remove Tensorboard import test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-06 11:31:30 -04:00
d5bc32ce92 Add strip_accents to basic BertTokenizer. (#6280)
* Add strip_accents to basic tokenizer

* Add tests for strip_accents.

* fix style with black

* Fix strip_accents test

* empty commit to trigger CI

* Improved strip_accents check

* Add code quality with is not False
2020-08-06 18:52:28 +08:00
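
A usage sketch; the checkpoint name is a placeholder, chosen because cased German models are where keeping accents matters:

    from transformers import BertTokenizer

    # strip_accents=None (the default) follows the lowercase setting;
    # False keeps umlauts etc. intact, True always strips accents.
    tokenizer = BertTokenizer.from_pretrained("bert-base-german-cased", strip_accents=False)
    print(tokenizer.tokenize("Schönes Wetter heute"))
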
31da35cc89 Create README.md (#6273)
I am adding a descriptive README.md file to my recently uploaded twitter classification model: shrugging-grace/tweetclassifier.
2020-08-05 12:36:24 -04:00
a8bdba232f Create README.md for uploaded classifier (#6272)
I am adding a descriptive README.md file to my recently uploaded twitter classification model: shrugging-grace/tweetclassifier.
2020-08-05 12:27:46 -04:00
a23a535c10 added t5 bahasa summarization readme (#6269) 2020-08-05 12:27:27 -04:00
c67d1a0259 Tf model outputs (#6247)
* TF outputs and test on BERT

* Albert to DistilBert

* All remaining TF models except T5

* Documentation

* One file forgotten

* TF outputs and test on BERT

* Albert to DistilBert

* All remaining TF models except T5

* Documentation

* One file forgotten

* Add new models and fix issues

* Quality improvements

* Add T5

* A bit of cleanup

* Fix for slow tests

* Style
2020-08-05 11:34:39 -04:00
bd0eab351a Trainer + wandb quality of life logging tweaks (#6241)
* added `name` argument for wandb logging, also logging model config with trainer arguments

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* added tf, post-review changes

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-08-05 09:05:52 -04:00
33966811bd Add SequenceClassification and MultipleChoice TF models to Electra (#6227)
* Add SequenceClassification and MultipleChoice TF models to Electra

* Apply style

* Add summary_proj_to_labels to Electra config

* Finally mirroring the PT version of these models

* Apply style

* Fix Electra test
2020-08-05 09:04:27 -04:00
376c02e9a9 [WIP] lightning_base: support --lr_scheduler with multiple possibilities (#6232)
* support --lr_scheduler with multiple possibilities

* correct the error message

* add a note about supported schedulers

* cleanup

* cleanup2

* needs the argument default

* style

* add another assert in the test

* implement requested changes

* cleanups

* fix relative import

* cleanup
2020-08-05 09:01:17 -04:00
d89acd07cc fix (#6257) 2020-08-05 07:37:57 -04:00
24c5a6e351 Update optimization.py (#6261) 2020-08-05 07:34:57 -04:00
ed6b8f3128 Update to match renamed attributes in fairseq master (#5972)
* Update to match renamed attributes in fairseq master

RobertaModel no longer has model.encoder and args.num_classes attributes as of 5/28/20.

* Quality

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-08-05 07:23:55 -04:00
d9149f00d1 Update README.md (#6201) 2020-08-04 17:44:14 -04:00
ddfdbb86c1 Update README.md (#6200) 2020-08-04 17:44:05 -04:00
4f67955662 Update README.md (#6199) 2020-08-04 17:43:48 -04:00
869ec441c9 Update README.md (#6198) 2020-08-04 17:43:38 -04:00
5177dca634 Create README.md (#6123) 2020-08-04 17:42:53 -04:00
3f30ebe6ca Create README.md (#6075) 2020-08-04 17:41:23 -04:00
aa7c22a283 Update Model Card (#6246)
Added citation and paper links.
2020-08-04 17:40:47 -04:00
972535ea74 fix zero shot pipeline docs (#6245) 2020-08-04 16:37:49 -04:00
5920a37a4c Add license info to German Bert models (#6242)
* Add xlm-r QA model card

* Add tags

* Add license info to german bert
2020-08-04 13:40:49 -04:00
6c9ba1d8fc [Reformer] Make random seed generator available on random seed and not on model device (#6244)
* improve if else statement random seeds

* Apply suggestions from code review

* Update src/transformers/modeling_reformer.py
2020-08-04 13:22:43 -04:00
d5b0a0e235 mBART Conversion script (#6230) 2020-08-04 09:53:51 -04:00
268bf34630 typo (#6225) 2020-08-04 09:31:49 -04:00
7f65daa2e1 fix reformer fp16 (#6237) 2020-08-04 13:02:25 +02:00
7ea9b2db37 Encoder decoder config docs (#6195)
* Adding docs for how to load encoder_decoder pretrained model with individual config objects

* Adding docs for loading encoder_decoder config from pretrained folder

* Fixing  W293 blank line contains whitespace

* Update src/transformers/modeling_encoder_decoder.py

* Update src/transformers/modeling_encoder_decoder.py

* Update src/transformers/modeling_encoder_decoder.py

* Apply suggestions from code review

model file should only show examples for how to load save model

* Update src/transformers/configuration_encoder_decoder.py

* Update src/transformers/configuration_encoder_decoder.py

* fix space

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-08-04 09:23:28 +02:00
1d5c3a3d96 Test with --no-cache-dir (#6235) 2020-08-04 03:20:19 -04:00
6730ecdd3c Remove redundant coverage (#6224) 2020-08-04 02:59:21 -04:00
5deed37f9f cleanup torch unittests (#6196)
* improve unit tests

this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973
before I apply it to the rest

* batch 1

* batch 2

* batch 3

* batch 4

* batch 5

* style

* non-tf template

* last deletion of check_loss_output
2020-08-04 02:42:56 -04:00
b390a5672a Make the order of additional special tokens deterministic (#5704)
* Make the order of additional special tokens deterministic regardless of hash seeds

* Fix
2020-08-04 02:38:30 -04:00
d740351f7d Upgrade pip when doing CI (#6234)
* Upgrade pip when doing CI

* Don't forget Github CI
2020-08-04 02:37:12 -04:00
57eb1cb68d [s2s] Document better mbart finetuning command (#6229)
* Document better MT command

* improve multigpu command
2020-08-03 18:22:31 -04:00
0513f8d275 correct label extraction + add note on discrepancies on trained MNLI model and HANS (#6221) 2020-08-03 15:02:51 -04:00
3c289fb38c Remove outdated BERT tips (#6217)
* Remove out-dated BERT tips

* Update modeling_outputs.py

* Update bert.rst

* Update bert.rst
2020-08-04 01:17:56 +08:00
e4920c92d6 Doc pipelines (#6175)
* Init work on pipelines doc

* Work in progress

* Work in progress

* Doc pipelines

* Rm unwanted default

* Apply suggestions from code review

Lysandre comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-08-03 11:44:46 -04:00
b6b2f2270f s2s: fix LR logging, remove some dead code. (#6205) 2020-08-03 10:36:26 -04:00
06f1692b02 Fix _shift_right function in TFT5PreTrainedModel (#6214) 2020-08-03 16:21:23 +02:00
0b41867357 fix labels (#6213) 2020-08-03 10:19:35 -04:00
cedc547e7e Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict output for logging. (#5331)
* Adds train_batch_size, eval_batch_size, and n_gpu to to_sanitized_dict() output

* Update wandb config logging to use to_sanitized_dict

* removed n_gpu from sanitized dict

* fix quality check errors
2020-08-03 09:00:39 -04:00
9996f697e3 Fix saved model creation (#5468)
* Fix TF Serving when output_hidden_states and output_attentions are True

* Add tests for saved model creation + bug fix for multiple choices models

* remove unused import

* Fix the input for several layers

* Fix test

* Fix conflict printing

* Apply style

* Fix XLM and Flaubert for TensorFlow

* Apply style

* Fix TF check version

* Apply style

* Trigger CI
2020-08-03 08:10:40 -04:00
5a0dac53bf Empty assert hunt (#6056)
* Fixed empty asserts

* black-reformatted stragglers in templates

* More code quality checks

* Update src/transformers/convert_marian_to_pytorch.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/convert_marian_to_pytorch.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* removed unused line as per @sshleifer

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-03 10:19:03 +02:00
16c2240164 Add script to convert tf2.x checkpoint to PyTorch (#5791)
* Add script to convert tf2.x checkpoint to pytorch

The script converts the newer TF2.x checkpoints (as published on their official GitHub: https://github.com/tensorflow/models/tree/master/official/nlp/bert) to PyTorch.

* rename file in order to stay consistent with naming convention
2020-08-03 03:53:38 -04:00
82a0e2b67e Fix docstring for BertTokenizerFast (#6185)
- remove duplicate doc-entry for tokenize_chinese_chars
- add doc for strip_accents and wordpieces_prefix
2020-08-02 15:58:26 +08:00
d8dbf3b75d [s2s] clean up + doc (#6184)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-08-01 14:51:07 -04:00
a39dfe4fb1 Fixed typo in Longformer (#6180) 2020-08-01 18:20:48 +08:00
8edfaaa81b bart-large-mnli-yahoo-answers model card (#6133)
* Add bart-large-mnli-yahoo-answers model card

* Add examples

* Add widget example

* Rm bart tag

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-31 10:56:32 -04:00
d951c14ae4 Model output test (#6155)
* Use return_dict=True in all tests

* Formatting
2020-07-31 09:44:37 -04:00
86caab1e0b Harmonize both Trainers API (#6157)
* Harmonize both Trainers API

* Fix test

* main_prcess -> process_zero
2020-07-31 09:43:23 -04:00
603cd81a01 readme m3hrdadfi/albert-fa-base-v2 (#6153)
* readme m3hrdadfi/albert-fa-base-v2

model_card readme for m3hrdadfi/albert-fa-base-v2

* Update model_cards/m3hrdadfi/albert-fa-base-v2/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-31 06:19:06 -04:00
838dc06ff5 parse arguments from dict (#4869)
* add parse_dict to parse arguments from dict

* add unit test for parse_dict
2020-07-31 04:44:23 -04:00
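
A sketch of the new entry point; the dataclass fields here are illustrative:

    from dataclasses import dataclass

    from transformers import HfArgumentParser

    @dataclass
    class RunArgs:
        model_name: str
        learning_rate: float = 5e-5

    parser = HfArgumentParser(RunArgs)
    (run_args,) = parser.parse_dict({"model_name": "bert-base-uncased", "learning_rate": 3e-5})
    print(run_args.learning_rate)
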
cf3cf304ca Replace mecab-python3 with fugashi for Japanese tokenization (#6086)
* Replace mecab-python3 with fugashi

This replaces mecab-python3 with fugashi for Japanese tokenization. I am
the maintainer of both projects.

Both projects are MeCab wrappers, so the underlying C++ code is the
same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
use of the MeCab API it's easier to use.

This code ensures the use of a version of ipadic installed via pip,
which should make versioning and tracking down issues easier.

fugashi has wheels for Windows, OSX, and Linux, which will help with
issues with installing old versions of mecab-python3 on Windows.
Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't
require a C++ runtime to be installed on Windows.

In adding this change I removed some code dealing with `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything; it is unclear to me why they were there.

I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
Tensorflow.

* Style fix

* Remove unused variable

Forgot to delete this...

* Adapt doc with install instructions

* Fix typo

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-31 04:41:14 -04:00
f250beb8aa enable easy checkout switch (#5645)
* enable easy checkout switch

allow having multiple repository checkouts without needing to remember to rerun 'pip install -e .[dev]' when switching between checkouts and running tests.

* make isort happy

* examples needs one too
2020-07-31 04:34:46 -04:00
7d50af4b02 Create README.md (#6169) 2020-07-31 04:28:35 -04:00
0034a1d248 Add Pytorch Native AMP support in Trainer (#6151)
* fixed type; add Pytorch Native CUDA AMP support

* reverted commit on modeling_utils

* confirming to HF black formatting rule

* changed bool value of _use_apex

* scaler support for gradient clipping

* fix inplace operation of clip_grad_norm

* removed not while version comparison
2020-07-31 04:23:29 -04:00
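
Under the hood this is the standard torch.cuda.amp recipe (torch >= 1.6); a self-contained sketch of the pattern, including the unscale-before-clip detail the PR mentions (model, data, and clip value are illustrative):

    import torch

    model = torch.nn.Linear(10, 2).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()

    batches = [(torch.randn(8, 10).cuda(), torch.randint(0, 2, (8,)).cuda())]
    for inputs, labels in batches:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)  # so clip_grad_norm_ sees true gradient magnitudes
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()
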
7231f7b503 Enable ONNX/ONNXRuntime optimizations through converter script (#6131)
* Add onnxruntime transformers optimization support

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Optimization section in ONNX/ONNXRuntime documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve note reference

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add warning about different level of optimization between torch and tf export.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address @LysandreJik wording suggestion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Always optimize model before quantization for maximum performances.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address comments on the documentation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve TensorFlow optimization message as suggested by @yufenglee

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed --optimize parameter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Warn the user about current quantization limitation when model is larger than 2GB.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Trigger CI for last check

* Small change in print for the optimization section.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-31 09:45:13 +02:00
c0b93a1c7a correct the correction (#6163) 2020-07-30 18:00:02 -04:00
a2f6d521c1 typos (#6162)
* 2 small typos

* more typos

* correct path
2020-07-30 17:18:27 -04:00
f3065abdb8 Doc tokenizer (#6110)
* Start doc tokenizers

* Tokenizer documentation

* Start doc tokenizers

* Tokenizer documentation

* Formatting after rebase

* Formatting after merge

* Update docs/source/main_classes/tokenizer.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comment

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Thom's comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-07-30 14:51:19 -04:00
e642c78908 Addition of a DialoguePipeline (#5516)
* initial commit for pipeline implementation

Addition of input processing and history concatenation

* Conversation pipeline tested and working for single & multiple conversation inputs

* Added docstrings for dialogue pipeline

* Addition of dialogue pipeline integration tests

* Delete test_t5.py

* Fixed max code length

* Updated styling

* Fixed test broken by formatting tools

* Removed unused import

* Added unit test for DialoguePipeline

* Fixed Tensorflow compatibility

* Fixed multi-framework support using framework flag

* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input

* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method

* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)

* - Simplified input tensor conversion

* - Updated attention_mask value for Tensorflow compatibility

* - Updated last dialogue reference to conversational & fixed integration tests

* Fixed conflict with master

* Updates following review comments

* Updated formatting

* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs

* Update src/transformers/pipelines.py

Updated docsting following review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-30 14:11:39 -04:00
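
A usage sketch of the pipeline under its final name; the default model choice is left to the library:

    from transformers import Conversation, pipeline

    chat = pipeline("conversational")

    conversation = Conversation("Going to the movies tonight, any suggestions?")
    conversation = chat(conversation)
    print(conversation.generated_responses[-1])

    # Multi-turn: append the next user input and run the pipeline again.
    conversation.add_user_input("Is it an action movie?")
    conversation = chat(conversation)
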
ec0267475c Fix FlauBERT GPU test (#6142)
* Fix GPU test

* Remove legacy constructor
2020-07-30 11:11:48 -04:00
91cb95461e Switch from return_tuple to return_dict (#6138)
* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
2020-07-30 09:17:00 -04:00
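
The change in caller code, sketched with a placeholder model:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

    inputs = tokenizer("Hello, world!", return_tensors="pt")
    outputs = model(**inputs, return_dict=True)  # replaces the former return_tuple flag
    logits = outputs.logits  # named access instead of positional tuple indexing
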
562b6369c4 Tf trainer cleanup (#6143)
* Clean up TFTrainer

* Add import

* Fix conflicts
2020-07-30 09:13:16 -04:00
c127d055e6 add another e.g. to avoid confusion (#6055) 2020-07-30 08:53:35 -04:00
d24ea708d7 Actually the extra_id tokens are from 0-99 and not from 1-100 (#5967)
a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt', add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9, 32000]])
a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt', add_special_tokens=True)
print(a)
>tensor([[   62,   530,     3,     9,     3,     2, 25666,   834,    23,    26,
           834,  2915,  3155]])
2020-07-30 06:13:29 -04:00
3212b8850d [s2s] add support for overriding config params (#6149) 2020-07-30 01:09:46 -04:00
54f9fbeff8 Rework TF trainer (#6038)
* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import
2020-07-29 14:32:01 -04:00
3f94170a10 [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice
2020-07-29 14:26:26 -04:00
8a8ae27617 Use google style to document properties (#6130)
* Use google style to document properties

* Update src/transformers/configuration_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-29 12:28:12 -04:00
fc64559c45 Fix TF CTRL model naming (#6134) 2020-07-29 12:20:00 -04:00
641b873c13 XLNet PLM Readme (#6121) 2020-07-29 11:38:15 -04:00
8d157c930b add deepset/xlm-roberta-large-squad2 model card (#6128)
* Add xlm-r QA model card

* Add tags
2020-07-29 17:34:16 +02:00
6c002853a6 Added capability to quantize a model while exporting through ONNX. (#6089)
* Added capability to quantize a model while exporting through ONNX.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

We do not support multiple extensions

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reformat files

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* More quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure test_generate_identified_name compares the same object types

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added documentation everywhere on ONNX exporter

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use pathlib.Path instead of plain-old string

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use f-string everywhere

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use the correct parameters for black formatting

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use Python 3 super() style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use packaging.version to ensure installed onnxruntime version match requirements

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports sorting order.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Missing raise(s)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added quantization documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix some spelling.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix bad list header format

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 13:21:29 +02:00
25de74ccfe Use FutureWarning to deprecate (#6111) 2020-07-29 05:20:53 -04:00
640550fc7a ONNX documentation (#5992)
* Move torchscript and add ONNX documentation under model_export

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-29 11:02:35 +02:00
92f8ce2ed6 Fix deebert tests (#6102) 2020-07-28 18:30:16 -04:00
c49cd927f7 [Fix] position_ids tests again (#6100) 2020-07-28 18:29:35 -04:00
40796c5801 [fix] add bart to LM_MAPPING (#6099) 2020-07-28 18:29:18 -04:00
5abe50381a Fix #6096: MBartTokenizer's mask token (#6098) 2020-07-28 18:27:58 -04:00
b1c8b76907 Fix zero-shot pipeline single seq output shape (#6104) 2020-07-28 14:46:03 -04:00
06834bc332 Logs should not be hidden behind a logger.info (#6097) 2020-07-28 12:44:25 -04:00
dafa296c95 [s2s] Delete useless method, log tokens_per_batch (#6081) 2020-07-28 11:24:23 -04:00
dc4755c6d5 create model-card for lordtt13/emo-mobilebert (#6030) 2020-07-28 10:00:23 -04:00
28931f81b7 Fix #6092 (#6093)
* Fix #6092

* Format
2020-07-28 09:48:39 -04:00
5e97c82940 Create README.md (#6076) 2020-07-28 09:36:00 -04:00
54f49af4ae Add inference widget examples (#5825) 2020-07-28 09:14:00 -04:00
0206efb4cf Make all data collators accept dict (#6065)
* Make all data collators accept dict

* Style
2020-07-28 09:08:20 -04:00
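A small sketch of what "accept dict" buys, assuming the public `default_data_collator` helper: features can now be plain dicts rather than dataclass-style objects.

```python
from transformers import default_data_collator

features = [
    {"input_ids": [101, 7592, 102], "label": 0},
    {"input_ids": [101, 2088, 102], "label": 1},
]
batch = default_data_collator(features)  # dict of stacked tensors
print(batch["input_ids"].shape, batch["labels"].shape)  # torch.Size([2, 3]) torch.Size([2])
```
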
31a5486e42 github issue template suggests who to tag (#5790)
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Teven <teven.lescao@gmail.com>
2020-07-28 08:41:27 -04:00
f0c70085c2 link to README.md (#6068)
* add a link to README.md

* Update README.md
2020-07-28 20:34:58 +08:00
4f814fd587 [Model Card] camembert-base-squadFR-fquad-piaf (#6087) 2020-07-28 20:33:52 +08:00
3c7fbf35a6 MBART: support summarization tasks where max_src_len > max_tgt_len (#6003)
* MBART: support summarization tasks

* fix test

* Style

* add tokenizer test
2020-07-28 08:18:11 -04:00
842eb45606 New Community NB Add (#5824)
Signed-off-by: lordtt13 <thakurtanmay72@yahoo.com>
2020-07-28 04:25:12 -04:00
018d61fa24 Moving transformers package import statements to relative imports in some files (#5796)
* Moving `from transformers` import statements to relative imports in some files under src/

* Import order

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-28 04:19:17 -04:00
7214954db4 Should return a tuple for serialization (#6061) 2020-07-28 03:14:31 -04:00
7a68d40138 [s2s] Don't mention packed data in README (#6079) 2020-07-27 20:07:21 -04:00
b7345d22d0 [fix] no warning for position_ids buffer (#6063) 2020-07-27 20:00:44 -04:00
1e00ef681d [s2s] dont document packing because it hurts performance (#6077) 2020-07-27 18:26:00 -04:00
9d0d3a6645 Pin TF while we wait for a fix 2020-07-27 18:03:09 -04:00
769e6ba01f Create README.md (#6032)
Adding model card - readme
2020-07-27 16:25:37 -04:00
fd347e0da7 Add fire to setup.cfg to make isort happy (#6066) 2020-07-27 15:17:33 -04:00
11792d7826 CL util to convert models to fp16 before upload (#5953) 2020-07-27 12:21:25 -04:00
4302ace5bd [pack_dataset] don't sort before packing, only pack train (#5954) 2020-07-27 12:14:23 -04:00
c8bdf7f4ec Add new AutoModel classes in pipeline (#6062)
* use new AutoModel classes

* make style and quality
2020-07-27 11:50:08 -04:00
5779e5434d ✏️ Fix typo (#5734) 2020-07-27 10:55:15 -04:00
d1d15d6f2d [examples (seq2seq)] fix preparing decoder_input_ids for T5 (#5994) 2020-07-27 10:10:43 -04:00
3deffc1d67 Zero shot classification pipeline (#5760)
* add initial zero-shot pipeline

* change default args

* update default template

* add label string splitting

* add str labels support, remove nli from name

* style

* add input validation and working tf defaults

* tests

* quality check

* add docstring to __call__

* add slow tests

* Change truncation to only_first

also lower precision on tests for readability

* style
2020-07-27 09:42:58 -04:00
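A minimal usage sketch for the new pipeline, assuming the public `pipeline` factory (model and labels are illustrative):

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The team shipped the new release on Friday.",
    candidate_labels=["software", "sports", "politics"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label first
```
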
1246b20f6d Fix the return documentation rendering for all model outputs (#6022)
* Fix the return documentation rendering for all model outputs

* Formatting
2020-07-27 09:18:59 -04:00
3b64ad5d5c Remove unused file (#6023) 2020-07-27 08:31:24 -04:00
b9b11795cf Update model_summary.rst (#5737)
Add '-' to make the reference to Transformer-XL more accurate and formal.
2020-07-27 05:34:02 -04:00
b21993b362 Allow to set Adam beta1, beta2 in TrainingArgs (#5592)
* Add Adam beta1, beta2 to trainer

* Make style consistent
2020-07-27 05:31:37 -04:00
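A sketch of the new knobs, assuming the `adam_beta1`/`adam_beta2` names on `TrainingArguments`:

```python
from transformers import TrainingArguments

# Non-default AdamW betas; output_dir is the only required argument.
args = TrainingArguments(output_dir="out", adam_beta1=0.9, adam_beta2=0.98)
print(args.adam_beta1, args.adam_beta2)
```
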
7969e96f4a draft etalab QA model (#6040) 2020-07-27 05:15:08 -04:00
a9585fd107 Model card for Vamsi/T5_Paraphrase_Paws (#6037)
* Model card for Vamsi/T5_Paraphrase_Paws

* Update model_cards/Vamsi/T5_Paraphrase_Paws/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-27 05:12:45 -04:00
f7f03b22dc Update README.md of my model (#6042) 2020-07-26 23:31:49 +02:00
fb0589a03d don't complain about missing W&B when WANDB_DISABLED=true (#6036)
* don't complain about missing W&B when WANDB_DISABLED=true

* reformat to elif

* typo
2020-07-26 14:29:54 -04:00
daa5dd1202 add a summary report flag for run_examples on CI (#6035)
Currently, it's hard to tell which example tests were run on CI and which weren't. Adding the `-rA` flag to `pytest` will now include a summary like:

```
==================================================================== short test summary info =====================================================================
PASSED examples/test_examples.py::ExamplesTests::test_generation
PASSED examples/test_examples.py::ExamplesTests::test_run_glue
PASSED examples/test_examples.py::ExamplesTests::test_run_language_modeling
PASSED examples/test_examples.py::ExamplesTests::test_run_squad
FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
============================================================ 1 failed, 4 passed, 8 warnings in 42.96s ============================================================
```
which makes it easier to validate whether a given example is covered by CI or not.
2020-07-26 14:09:14 -04:00
c69ea5efc4 [CI] Don't test apex (#6021) 2020-07-24 15:34:16 -04:00
a884b7fa38 Update the new model template (#6019) 2020-07-24 14:15:37 -04:00
295466aae6 [model_card] Sample input for rdenadai/BR_BERTo
cc @rdenadai
2020-07-24 14:14:10 -04:00
518361d69a Create model card for RuPERTa-base (#6016)
* Update README.md

* Update model_cards/mrm8488/RuPERTa-base/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 14:12:29 -04:00
bd51f0a7ab Create README.md (#5952) 2020-07-24 14:12:14 -04:00
87a779dfa8 Create README.md (#5951) 2020-07-24 14:12:09 -04:00
d115872b38 Create README.md (#6020)
* Create README.md

* Update README.md

* Update model_cards/rdenadai/BR_BERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update model_cards/rdenadai/BR_BERTo/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 14:11:14 -04:00
3996041d0a Fix question template (#6014) 2020-07-24 10:04:25 -04:00
778e635fc9 [model_cards] roberta-base-finetuned-yelp-polarity (#6009)
* [model_cards] roberta-base-finetuned-yelp-polarity

* Update model_cards/VictorSanh/roberta-base-finetuned-yelp-polarity/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-24 09:45:21 -04:00
614fef1691 Ensure OpenAI GPT position_ids is correctly initialized and registered at init. (#5773)
* Ensure OpenAI GPT position_ids is correctly initialized and registered as buffer at init.

This will make it compatible with TorchScript export.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix missing slice operator on the tensor data accessor.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixed BertEmbedding position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fixed MobileBertEmbedding position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fixed XLM position_ids buffer created at forward.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-24 15:37:52 +02:00
3b44aa935a Model utils doc (#6005)
* Document TF modeling utils

* Document all model utils
2020-07-24 09:16:28 -04:00
a540405213 Fix commit hash for stable doc 2020-07-24 09:07:40 -04:00
fc0fe2a532 fix: model card readme clutter (#6008)
This removes the clutter line in the README.md of the `csarron/roberta-base-squad-v1` model card. It also fixes the results table.
2020-07-24 04:17:52 -04:00
f5b5c5bd7e Avoid unnecessary warnings when loading pretrained model (#5922)
* Avoid unnecessary warnings when loading pretrained model

* Fix test

* Add other keys to ignore

* keys_to_ignore_at_load -> authorized_missing_keys
2020-07-23 18:13:36 -04:00
29afb5764f Bert german dbmdz uncased sentence stsb (#6000)
* Describe usage of sentence model

* fix typo usage

* add use and description to readme

* fix typo in readme

* readme formatting

* add training procedure to readme

* description name and company

* readme formatting

* dataset training readme

* typo

* readme
2020-07-23 17:56:45 -04:00
2b5ef9706d Model cards: add roberta-base-squad-v1 and bert-base-uncased-squad-v1 (#6006)
* add: bert-base-uncased-squad-v1

* add: roberta-base-squad-v1
2020-07-23 17:53:47 -04:00
9827d666eb MbartTokenizer: do not hardcode vocab size (#5998) 2020-07-23 15:41:14 -04:00
6e16195510 Fix #5974 (#5999) 2020-07-23 13:51:29 -04:00
e168488a74 Cleanup Trainer and expose customization points (#5982)
* Clean up Trainer and expose customization points

* Formatting

* eval_step -> prediction_step
2020-07-23 12:05:41 -04:00
76f52324b1 add fine-tuned mobilebert squad v1 and squad v2 model cards (#5980)
* add mobilebert-uncased-squad-v2

* fix shell cmd, add creator info

* add mobilebert-uncased-squad-v1
2020-07-23 11:57:29 -04:00
7e251ae039 Create README.md (#5989) 2020-07-23 11:41:33 -04:00
33d7506ea1 Update doc of the model page (#5985) 2020-07-22 18:14:57 -04:00
c3206eef44 [test] partial coverage for train_mbart_enro_cc25.sh (#5976) 2020-07-22 14:34:49 -04:00
2c0da7803a minor doc fixes (#5831)
* minor doc fixes

correct superclass name and small grammar fixes

* correct the instance name in the error message

It appears to be `BaseTokenizer` from looking at:

`from tokenizers.implementations import BaseTokenizer as BaseTokenizerFast`

and not `Tokenizer` as it currently says.
2020-07-22 13:22:34 -04:00
feeb956a19 [docs] Add integration test example to copy pasta template (#5961)
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-22 12:48:38 -04:00
01116d3c5b T5 Model Cards (#5759)
* T5 Model Cards

* Fix paths

* Fix tags

* lang-en
2020-07-22 11:38:37 -04:00
896300177b Expose padding_strategy on squad processor to fix QA pipeline performance regression (#5932)
* Attempt to fix the way squad_convert_examples_to_features pads the elements for the QA pipeline.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Quality

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make the code easier to read and avoid multiple tests testing the same thing.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* missing enum value on truncation_strategy.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Rethinking for the easiest fix: expose the padding strategy on squad_convert_examples_to_features.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove unused imports.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-22 16:11:57 +02:00
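The regression above comes down to padding strategy; a sketch of the difference (tokenizer choice illustrative):

```python
from transformers import BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = ["short question?", "a somewhat longer question to pad against?"]

# "longest" pads only up to the longest sequence in the batch...
longest = tok(batch, padding="longest")
# ...while "max_length" always pads every sample to max_length tokens.
fixed = tok(batch, padding="max_length", max_length=384)

print(len(longest["input_ids"][0]), len(fixed["input_ids"][0]))  # e.g. 12 vs 384
```
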
ae67b2439f [CI] Install examples/requirements.txt (#5956) 2020-07-21 21:07:48 -04:00
e714412fe6 Update doc to new model outputs (#5946)
* Update doc to new model outputs

* Fix outputs in quicktour
2020-07-21 18:13:55 -04:00
ddd40b3211 [CI] self-scheduled runner tests examples/ (#5927) 2020-07-21 17:01:07 -04:00
9dab39feea seq2seq/run_eval.py can take decoder_start_token_id (#5949) 2020-07-21 16:58:45 -04:00
5b193b39b0 [examples/seq2seq]: add --label_smoothing option (#5919) 2020-07-21 16:51:39 -04:00
95d1962b9c [Doc] explaining romanian postprocessing for MBART BLEU hacking (#5943) 2020-07-21 14:12:48 -04:00
604a2355dc Create README.md (#5876) 2020-07-21 13:28:22 -04:00
77c718edef Create README.md (#5873) 2020-07-21 13:28:06 -04:00
325b277db9 Create README.md (#5874) 2020-07-21 13:27:30 -04:00
d15be2216c Create README.md (#5879) 2020-07-21 13:27:13 -04:00
f3e23dd90a Create README.md (#5878) 2020-07-21 13:20:47 -04:00
8b01d15c05 Create README.md (#5877) 2020-07-21 13:20:43 -04:00
05bddf304e Create README.md (#5875) 2020-07-21 13:20:32 -04:00
783a0c7ee9 Create README.md (#5872) 2020-07-21 13:20:21 -04:00
e7844d60c2 Create README.md (#5871) 2020-07-21 13:19:48 -04:00
b1ee69763c Create README.md (#5864) 2020-07-21 13:15:07 -04:00
5f809e4976 Update README.md (#5857)
Add nlp dataset used
2020-07-21 13:14:27 -04:00
4215f59c99 Update README.md (#5856)
Add dataset used as it is now part of nlp package
2020-07-21 13:11:08 -04:00
1d72460d55 Add ComVE model cards (#5884)
* Add ComVE model cards

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 12:54:29 -04:00
ccbf74a685 typos in seq2seq/readme (#5937) 2020-07-21 09:44:59 -04:00
d32279438a Created model card for my extreme summarization model (#5839)
* Created model card for my extreme summarization model

* Update model_cards/yuvraj/xSumm/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 03:54:57 -04:00
abf5c56e9d Created model card for my summarization model (#5838)
* Created model card for my summarization model

* Update model_cards/yuvraj/summarizer-cnndm/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-21 03:54:14 -04:00
d73baeebc5 Create README.md (#5921)
- Maybe the result of this query answers the question you asked some days ago @julien-c ;-)
2020-07-21 03:52:52 -04:00
50acfc8717 Create README.md (#5924) 2020-07-21 03:41:37 -04:00
7249533404 Create README.md (#5920) 2020-07-21 03:31:42 -04:00
4781afd045 Clarify arg class (#5916) 2020-07-20 19:47:06 -04:00
8e0bcb56ec DataParallel fix: multi gpu evaluation (#5926)
The DataParallel training was fixed in https://github.com/huggingface/transformers/pull/5733; this commit also fixes the evaluation. That's more convenient when the user enables both `do_train` and `do_eval`.
2020-07-20 17:54:08 -04:00
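A torch-only sketch of the idea (not the Trainer's actual code): the same `nn.DataParallel` wrapper has to serve the evaluation loop, not just training.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # wrap once, reuse for train *and* eval

model.eval()
with torch.no_grad():
    logits = model(torch.randn(4, 8))  # eval forward goes through the same wrapper
print(logits.shape)  # torch.Size([4, 2])
```
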
a20969170b Add AlbertForPretraining to doc (#5914) 2020-07-20 17:53:21 -04:00
f1a4e06f1f [Fix] seq2seq pack_dataset.py actually packs (#5913)
Huge MT speedup!
2020-07-20 15:18:26 -04:00
32883b310b Improve doc of use_cache (#5912)
* Improve doc of use_cache

* Update src/transformers/configuration_xlnet.py

Co-authored-by: Teven <teven.lescao@gmail.com>

Co-authored-by: Teven <teven.lescao@gmail.com>
2020-07-20 11:50:41 -04:00
9ccb45a263 Update gpt2-README.md 2020-07-20 11:40:33 -04:00
f19751117d Create gpt2-medium-README.md 2020-07-20 10:47:42 -04:00
511523672b Create gpt2-large-README.md 2020-07-20 10:47:27 -04:00
182c611934 Update gpt2-README.md 2020-07-20 10:47:11 -04:00
a9ae27cd0f add link to write with transformers to model card 2020-07-20 10:46:10 -04:00
01c40db4f8 [cleanup] squad processor (#5868) 2020-07-20 10:44:10 -04:00
35cb101eae DataParallel fixes (#5733)
* DataParallel fixes:

1. switched to a more precise check:

```diff
-        if self.args.n_gpu > 1:
+        if isinstance(model, nn.DataParallel):
```

2. fix tests - require the same fixup under DataParallel as the training module

* another fix
2020-07-20 09:29:12 -04:00
290b6e18ac Trainer support for iterabledataset (#5834)
* Don't pass sampler for iterable dataset

* Added check for test and eval dataloaders.

* Formatting

* Don't pass sampler for iterable dataset

* Added check for test and eval dataloaders.

* Formatting

* Cleaner if nesting.

* Added test for trainer and iterable dataset

* Formatting for test

* Fixed import when torch is available only.

* Added require torch decorator to helper class

* Moved dataset class inside unittest

* Removed nested if and changed model in test

* Checking torch availability for IterableDataset
2020-07-20 09:07:37 -04:00
82dd96cae7 [model_cards] Dataset ids are case-sensitive
cc @lhoestq @thomwolf

Also cc'ing model author @nreimers => Model pages now properly link to the dataset pages (and in the future, eval results, etc.)
2020-07-20 12:47:28 +02:00
b01a8844a9 Create README.md (#5813) 2020-07-20 04:06:42 -04:00
223bad242d fix typo in (#5893) 2020-07-20 03:53:03 -04:00
d441f8d29d fix typo in training_args_tf.py (#5894) 2020-07-20 03:48:22 -04:00
09a2f40684 Seq2SeqDataset uses linecache to save memory by @Pradhy729 (#5792)
Co-authored-by: Pradhy729 <49659913+Pradhy729@users.noreply.github.com>
2020-07-18 13:57:33 -04:00
4b506a37e3 Xlnet outputs (#5883)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overridden and always True.
2020-07-18 17:33:13 +02:00
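A sketch of the described behavior, assuming the `mems` output field and forward argument of the XLNet models (checkpoint illustrative; the exact caching kwargs changed across versions):

```python
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

inputs = tokenizer("Hello, my dog is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

mems = out.mems  # hidden-state cache; may be None depending on config flags
if mems is not None:
    next_ids = tokenizer(" cute", return_tensors="pt").input_ids[:, :1]
    with torch.no_grad():
        out2 = model(input_ids=next_ids, mems=mems)  # reuse cache like GPT-2's `past`
```
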
a55809241f Revert "Xlnet outputs (#5881)" (#5882)
This reverts commit 13be4872123094c37eb5fab939b38967b0ad2cd0.
2020-07-18 17:15:40 +02:00
13be487212 Xlnet outputs (#5881)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overridden and always True.
2020-07-18 16:53:29 +02:00
eae6d8d14f Update tokenizers to 0.8.1.rc to fix Mac OS X issues (#5867) 2020-07-18 08:20:11 -04:00
dad5e12e54 [seq2seq] distillation.py accepts trainer arguments (#5865) 2020-07-18 07:43:57 -04:00
ba2400189b [seq2seq] MAX_LEN env var for MT commands (#5837) 2020-07-17 22:51:31 -04:00
529850ae7b Lightning Updates for v0.8.5 (#5798)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-07-17 22:43:06 -04:00
615be03f9d Revert "XLNet use_cache refactor (#5770)" (#5854)
This reverts commit 0b2da0e592fcb3a361494509031f91e1b700e4ad.
2020-07-17 20:33:44 +02:00
0b2da0e592 XLNet use_cache refactor (#5770)
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overridden and always True.
2020-07-17 20:24:16 +02:00
9750e1300c Create README.md (#5847) 2020-07-17 14:03:53 -04:00
1bca4fbd39 [model_card] Fix metadata 2020-07-17 13:55:37 -04:00
a9d56a675a Added model card for neuraly/bert-base-italian-cased-sentiment (#5845)
* Added model card for neuraly/bert-base-italian-cased-sentiment

* Update model_cards/neuraly/bert-base-italian-cased-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Gianpy15 <g.dipietro@neuraly.ai>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-17 13:50:49 -04:00
12f14710ce [Model card] Bert2Bert
Add Rouge2 results
2020-07-17 18:22:05 +02:00
9d37c56bab [Reformer] - Cache hidden states and buckets to speed up inference (#5578)
* fix merge rebase

* add intermediate reformer code

* save intermediate caching results

* save intermediate

* save intermediate results

* save intermediate

* upload next step

* fix generate tests

* make tests work

* add named tuple output

* Apply suggestions from code review

* fix use_cache for False case

* fix tensor to gpu

* fix tensor to gpu

* refactor

* refactor and make style
2020-07-17 16:17:42 +02:00
0b6c255a95 [Model card] Bert2Bert (#5841)
* Create README.md

* Update README.md

* Update README.md

* Update README.md
2020-07-17 11:41:56 +02:00
3d9556a72b [cleanups] make Marian save as Marian (#5830) 2020-07-17 02:54:25 -04:00
e238e3d55a [seq2seq] Don't copy self.source in sortishsampler (#5818) 2020-07-17 01:53:25 -04:00
2e4624b415 language tag addition on albert-mongolian (#5828)
* language tag addition on albert-mongolian

* Update model_cards/bayartsogt/albert-mongolian/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-17 01:40:38 -04:00
d088d744ad Create README.md (#5821) 2020-07-16 15:18:31 -04:00
233072fc1e dv-wave (#5823) 2020-07-16 15:13:51 -04:00
283500ff9f [seq2seq] pack_dataset.py rewrites dataset in max_tokens format (#5819) 2020-07-16 14:06:49 -04:00
c45d7a707d Update README.md (#5812)
Fix missing "-" in metadata
2020-07-16 10:25:50 -04:00
057411c56a fix longformer slow down (#5811) 2020-07-16 16:19:37 +02:00
89a78be51f fix benchmark for longformer (#5808) 2020-07-16 15:15:10 +02:00
aefc0c0429 fix benchmark non standard model (#5801) 2020-07-16 12:13:10 +02:00
8ce610bc96 Update README.md (#5789) 2020-07-16 05:26:17 -04:00
6b6d035d8f [model_card] illuin/lepetit 2020-07-16 03:50:47 -04:00
d1f74b9aff ADD ERNIE model (#5763)
* ERNIE model card

* Update Readme.md

* Update Readme.md

* Update Readme.md

* Rename Readme.md to README.md

* Update README.md

* Update Readme.md

* Update README.md

* Rename Readme.md to README.md

* Update Readme.md

* Update Readme.md

* Rename Readme.md to README.md

* Update and rename Readme.md to README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-16 11:03:05 +08:00
3b924fabee Create distilbert squad tags 2020-07-15 17:59:06 -04:00
067814102c fix readme 2020-07-15 17:50:46 -04:00
d179fd69ca test readme change 2020-07-15 17:48:22 -04:00
63761614eb Update README.md (#5776)
Add cherry picked example for the widget

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:19:21 -04:00
221e23c6c1 Create README.md (#5781)
* Create README.md

* Update model_cards/mrm8488/RoBasquERTa/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:17:25 -04:00
d4cda29af1 Create README.md (#5782)
* Create README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:17:19 -04:00
62ec28ce4f [model_cards] Fix pierreguillou/gpt2-small-portuguese 2020-07-15 22:14:52 +02:00
a946724bbf metadata (#5758)
* metadata

* Update model_cards/pierreguillou/gpt2-small-portuguese/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-15 16:13:28 -04:00
015dc51fe3 [model_card] bert-portuguese: add language meta
cc @rodrigonogueira4 @abiocapsouza @robertoalotufo

Also cc @piegu

Thank you :)
2020-07-15 21:25:52 +02:00
1a647abf0b [fix] check code quality (#5772) 2020-07-15 14:59:38 -04:00
b23d3a5ad4 [model_cards] Switch all languages codes to ISO-639-{1,2,3} 2020-07-15 18:59:20 +02:00
d533c7e9b9 [fix] T5 ONNX test: model.to(torch_device) (#5769)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-07-15 10:11:22 -04:00
d0486c8bc2 [cleanup] T5 test, warnings (#5761) 2020-07-15 08:23:22 -04:00
ec0a945cf9 [AutoModels] Fix config params handling of all PT and TF AutoModels (#5665)
* fix auto model causal lm

* leverage given functionality

* apply unused kwargs to all auto models
2020-07-15 09:51:14 +02:00
8ab565a4be [model_card] Fix syntax 2020-07-14 22:27:07 +02:00
92dc959224 Update README.md (#5752) 2020-07-14 15:48:59 -04:00
baf93b02c4 Update README.md (#5696) 2020-07-14 12:51:57 -04:00
5d178954c9 tiny ppl doc typo fix (#5751) 2020-07-14 10:39:44 -06:00
ac921f0385 RuPERTa model card (#5743)
* Customize inference widget input

* Update model_cards/mrm8488/RuPERTa-base/README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-14 22:58:45 +08:00
21c1fe5290 RuDR-BERT model card (#5698) 2020-07-14 22:51:53 +08:00
2db1cc807b Norod78/hewiki-articles-distilGPT2py-il model card (#5735)
Model card for hewiki-articles-distilGPT2py-il
A tiny GPT2 model for generating Hebrew text
2020-07-14 22:50:44 +08:00
dae244ad89 GPorTuguese-2 model card (#5744) 2020-07-14 22:48:52 +08:00
b2505f7db7 Cleanup bart caching logic (#5640) 2020-07-14 06:13:05 -04:00
838950ee44 [fix] mbart_en_ro_generate test now identical to fairseq (#5731) 2020-07-14 06:12:24 -04:00
4d5a8d6557 docs(wandb): explain how to use W&B integration (#5607)
* docs(wandb): explain how to use W&B integration

fix #5262

* Also mention TensorBoard

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-14 05:12:33 -04:00
cd30f98fd2 doc: fix apparent copy-paste error in docstring (#5626) 2020-07-14 09:47:41 +02:00
f867000f56 [Reformer classification head] Implement the reformer model classification head for text classification (#5198)
* Reformer model head classification implementation for text classification

* Reformat the reformer model classification code

* PR review comments, and test case implementation for reformer for classification head changes

* CI/CD reformer for classification head test import error fix

* CI/CD test case implementation  added ReformerForSequenceClassification to all_model_classes

* Code formatting- fixed

* Normal test cases added for reformer classification head

* Fix test cases implementation for the reformer classification head

* removed token_type_id parameter from the reformer classification head

* fixed the test case for reformer classification head

* merge conflict with master fixed

* merge conflict, changed reformer classification to accept the choice_label parameter added in latest code

* refactored the the reformer classification head test code

* reformer classification head, common transform test cases fixed

* final set of the review comment, rearranging the reformer classes and docstring add to classification forward method

* fixed the compilation error and text case fix for reformer classification head

* Apply suggestions from code review

Remove unnecessary dup

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-07-14 09:16:22 +02:00
f0bda06f43 Update tokenization_t5.py (#5717)
Minor doc fix.
2020-07-14 00:02:03 -04:00
c3c61ea017 [Fix] github actions CI by reverting #5138 (#5686) 2020-07-13 17:12:18 -04:00
45addfe96d FlaubertForTokenClassification (#5644)
* implement FlaubertForTokenClassification as a subclass of XLMForTokenClassification

* fix mapping order

* add the doc

* add common tests
2020-07-13 14:59:53 -04:00
7096e47513 [Longformer] fix longformer global attention output (#5659)
* fix longformer global attention output

* fix multi gpu problem

* replace -10000 with 0

* better comment

* make attention output equal local and global

* Update src/transformers/modeling_longformer.py
2020-07-13 17:23:22 +02:00
ce374ba877 Fix Trainer in DataParallel setting (#5685)
* Fix Trainer in DataParallel setting

* Fix typo

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-07-13 08:37:38 -04:00
0a19a49dfe doc improvements (#5688) 2020-07-13 18:10:17 +08:00
443b0cad96 rename the function to match the rest of the test convention (#5692) 2020-07-13 18:09:49 +08:00
74843695eb Added first description of the model (#5672)
Added general description, information about the tags and also some example usage code.
2020-07-13 02:53:48 -04:00
0befb51327 Pipeline model type check (#5679)
* Add model type check for pipelines

* Add model type check for pipelines

* rename func

* Fix the init parameters

* Fix format

* rollback unnecessary refactor
2020-07-12 12:34:21 +08:00
dc31a72f50 Add Microsoft's CodeBERT (#5683)
* Add Microsoft's CodeBERT

* link style

* single modal

* unused import
2020-07-11 21:37:30 +08:00
7fad617dc1 Document model outputs (#5673)
* Document model outputs

* Update docs/source/main_classes/output.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-10 17:31:02 -04:00
df983b7483 Deprecate old past arguments (#5671) 2020-07-10 17:25:52 -04:00
cdf4cd7068 [squad] add version tag to squad cache (#5669) 2020-07-10 16:34:21 -04:00
223084e42b Add Reformer to notebooks 2020-07-10 18:34:25 +02:00
201d23f285 Update The Big Table of Tasks
Co-Authored-By: Suraj Patil <surajp815@gmail.com>
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-10 18:07:29 +02:00
82f7bbbd93 Update README.md (#5617)
* Update README.md

* Update README.md
2020-07-10 11:43:27 -04:00
bf497376ee Create README.md (#5572) 2020-07-10 11:42:49 -04:00
3653d01f2a Create README.md for electra-base-squad2 (#5574) 2020-07-10 11:39:44 -04:00
aa69c81f29 Add freshly trained base version (#5621) 2020-07-10 11:39:04 -04:00
227e0a406d Fixed use of memories in XLNet (caching for language generation + warning when loading improper memoryless model) (#5632)
* Pytorch gpu => cpu proper device

* Memoryless XLNet warning + fixed memories during generation

* Revert "Pytorch gpu => cpu proper device"

This reverts commit 93489b36

* made black happy

* TF generation with memories

* dim => axis

* added padding_text to TF XL models

* Added comment, added TF
2020-07-10 17:38:36 +02:00
3b7b646563 Create README.md (#5638) 2020-07-10 11:38:23 -04:00
0039b965db Create model card (#5655)
Create model card for T5-small fine-tuned on SQUAD v2
2020-07-10 11:38:11 -04:00
46982d612f Create README.md - Model card (#5657)
Model card for sentence-transformers/bert-base-nli-cls-token
2020-07-10 11:38:03 -04:00
c483803d1b Create README.md - Model card (#5658)
Model card for sentence-transformers/bert-base-nli-max-tokens
2020-07-10 11:37:56 -04:00
edfd82f5ff Change model outputs types to self-document outputs (#5438)
* [WIP] Proposal for model outputs

* All Bert models

* Make CI green maybe?

* Fix ONNX test

* Isolate ModelOutput from pt and tf

* Formatting

* Add Electra models

* Auto-generate docstrings from outputs

* Add TF outputs

* Add some BERT models

* Revert TF side

* Remove last traces of TF changes

* Fail with a clear error message

* Add Albert and work through Bart

* Add CTRL and DistilBert

* Formatting

* Progress on Bart

* Renames and finish Bart

* Formatting

* Fix last test

* Add DPR

* Finish Electra and add FlauBERT

* Add GPT2

* Add Longformer

* Add MMBT

* Add MobileBert

* Add GPT

* Formatting

* Add Reformer

* Add Roberta

* Add T5

* Add Transformer XL

* Fix test

* Add XLM + fix XLMForTokenClassification

* Style + XLMRoberta

* Add XLNet

* Formatting

* Add doc of return_tuple arg
2020-07-10 11:36:53 -04:00
fa265230a2 Create Model card for RoBERTa-hindi-guj-san (#5661) 2020-07-10 11:34:23 -04:00
b2747af543 Improvements to PretrainedConfig documentation (#5642)
* Update PretrainedConfig doc

* Formatting

* Small fixes

* Forgotten args and more cleanup
2020-07-10 10:31:47 -04:00
bfacb2e34f [model_card] BART for ELI5
cc @yjernite
2020-07-10 08:10:24 -04:00
2e6bb0e9c3 Create README.md (#5652) 2020-07-10 05:41:10 -04:00
552e4591f5 [model_card] Add meta + fix link to image
(hotlinking to image works on GitHub but not on external sites)

cc @bashartalafha
2020-07-10 05:07:33 -04:00
02a0b43014 Fixed TextGenerationPipeline on torch + GPU (#5629)
* Pytorch gpu => cpu proper device

* Memoryless XLNet warning + fixed memories during generation

* Revert "Memoryless XLNet warning + fixed memories during generation"

This reverts commit 3d3251ff

* Took the operations on the generated_sequence out of the ensure_device scope
2020-07-09 16:29:32 -04:00
760f726e51 Add forum link in the docs (#5637) 2020-07-09 15:13:22 -04:00
bfeaae2235 fix 404 (#5616) 2020-07-09 15:12:29 -04:00
b25f7802de Should check that torch TPU is available (#5636) 2020-07-09 13:54:32 -04:00
3cc23eee06 More explicit error when failing to tensorize overflowing tokens (#5633) 2020-07-09 13:35:21 -04:00
b9d8af07e6 Update stable doc 2020-07-09 11:06:23 -04:00
1158e56551 Correct extension (#5631) 2020-07-09 11:03:07 -04:00
5c82bf6831 Update stable doc 2020-07-09 10:16:13 -04:00
0533cf4706 Test XLA examples (#5583)
* Test XLA examples

* Style

* Using `require_torch_tpu`

* Style

* No need for pytest
2020-07-09 09:19:19 -04:00
3bd55199cd QA pipeline BART compatible (#5496)
* Ensure padding and question cannot have higher probs than context.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Add bart to the list of tokenizers adding two <sep> tokens for squad_convert_example_to_feature

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @patrickvonplaten comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @patrickvonplaten comments about masking non-context element when generating the answer.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing @sshleifer comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Make sure we mask CLS after handling impossible answers

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Mask in the correct vectors ...

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-09 15:11:40 +02:00
fa5423b169 doc fixes (#5613) 2020-07-08 19:52:44 -04:00
7d0ef00420 Add newly trained calbert-tiny-uncased (#5599)
* Create README.md

Add newly trained `calbert-tiny-uncased` (complete rewrite with SentencePiece)

* Add Exbert link

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-08 17:54:51 -04:00
0cc4eae0e6 Fix Inconsistent NER Grouping (Pipeline) (#4987)
* Add B I handling to grouping

* Add fix to include separate entity as last token

* move last_idx definition outside loop

* Use first entity in entity group as reference for entity type

* Add test cases

* Take out extra class accidentally added

* Return tf ner grouped test to original

* Take out redundant last entity

* Get last_idx safely

Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>

* Fix first entity comment

* Create separate functions for group_sub_entities and group_entities (splitting call method to testable functions)

* Take out unnecessary last_idx

* Remove additional forward pass test

* Move token classification basic tests to separate class

* Move token classification basic tests back to monocolumninputtestcase

* Move base ner tests to nerpipelinetests

* Take out unused kwargs

* Add back mandatory_keys argument

* Add unitary tests for group_entities in _test_ner_pipeline

* Fix last entity handling

* Fix grouping fucntion used

* Add typing to group_sub_entities and group_entities

Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>
2020-07-08 16:18:17 -04:00
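A usage sketch for the grouping being fixed, assuming the `grouped_entities` flag of that era (newer releases expose this as `aggregation_strategy`):

```python
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)  # merge B-/I- pieces into spans
for ent in ner("Hugging Face is based in New York City."):
    print(ent["entity_group"], ent["word"])
```
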
82ce8488bb create model cards for qg models (#5610) 2020-07-08 16:08:56 -04:00
d6b6ab11f0 Create README.md (#5601) 2020-07-08 16:07:48 -04:00
40d98ebf50 Update benchmark notebook (#5603)
* Created with Colaboratory

* delete old file
2020-07-08 16:03:59 +02:00
281e394889 Update question template (#5585) 2020-07-08 08:46:35 -04:00
f82a2a5e8e [Benchmark] Add benchmarks for TF Training (#5594)
* tf_train

* adapt timing for tpu

* fix timing

* fix timing

* fix timing

* fix timing

* update notebook

* add tests
2020-07-08 12:11:09 +02:00
cfbb982974 Add DeeBERT (entropy-based early exiting for *BERT) (#5477)
* Add deebert code

* Add readme of deebert

* Add test for deebert

Update test for Deebert

* Update DeeBert (README, class names, function refactoring); remove requirements.txt

* Format update

* Update test

* Update readme and model init methods
2020-07-08 08:17:59 +08:00
b4b33fdf25 Guide to fixed-length model perplexity evaluation (#5449)
* add first draft ppl guide

* upload imgs

* expand on strides

* ref typo

* rm superfluous past var

* add tokenization disclaimer
2020-07-07 16:04:15 -06:00
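A condensed sketch of the strided evaluation the guide describes (window and stride values illustrative):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

input_ids = tokenizer("some long evaluation text " * 200, return_tensors="pt").input_ids

max_length, stride = 512, 256
nlls, prev_end = [], 0
for begin in range(0, input_ids.size(1), stride):
    end = min(begin + max_length, input_ids.size(1))
    trg_len = end - prev_end            # only score tokens not already scored
    target_ids = input_ids[:, begin:end].clone()
    target_ids[:, :-trg_len] = -100     # mask the overlapping context
    with torch.no_grad():
        loss = model(input_ids[:, begin:end], labels=target_ids).loss
    nlls.append(loss * trg_len)
    prev_end = end
    if end == input_ids.size(1):
        break

print(torch.exp(torch.stack(nlls).sum() / prev_end))  # fixed-length perplexity
```
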
fde217c679 readme for benchmark (#5363) 2020-07-07 23:21:23 +02:00
d6eab53058 mbart.prepare_translation_batch: pass through kwargs (#5581) 2020-07-07 13:46:05 -04:00
353b8f1e7a Add mbart-large-cc25, support translation finetuning (#5129)
improve unittests for finetuning, especially w.r.t. testing frozen parameters
fix freeze_embeds for T5
add streamlit setup.cfg
2020-07-07 13:23:01 -04:00
141492448b Create xlm-roberta-large-finetuned-conll03-german-README.md
cc @BramVanroy
2020-07-07 13:15:10 -04:00
4dc65591b5 [Almost all TF models] TF clean up: add missing CLM / MLM loss; fix T5 naming and keras compile (#5395)
* add first version of clm tf

* make style

* add more tests for bert

* update tf clm loss

* fix tests

* correct tf ner script

* add mlm loss

* delete bogus file

* clean tf auto model + add tests

* finish adding clm loss everywhere

* fix training in distilbert

* fix flake8

* save intermediate

* fix tf t5 naming

* remove prints

* finish up

* up

* fix tf gpt2

* fix new test utils import

* fix flake8

* keep backward compatibility

* Update src/transformers/modeling_tf_albert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_electra.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_roberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_mobilebert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_distilbert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply sylvains suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 18:15:53 +02:00
33e43edddc [docs] fix model_doc links in model summary (#5566)
* fix model_doc links

* update model links
2020-07-07 11:06:12 -04:00
4fedc1256c Fix tests imports dpr (#5576)
* fix test imports

* fix max_length

* style

* fix tests
2020-07-07 16:35:12 +02:00
d4886173b2 [Bart] enable test_torchscript, update test_tie_weights (#5457)
* Passing all but one torchscript test

* Style

* move comment

* remove unneeded assert
2020-07-07 10:06:48 -04:00
e49393c361 [examples] Add trainer support for question-answering (#4829)
* add SquadDataset

* add DataCollatorForQuestionAnswering

* update __init__

* add run_squad with  trainer

* add DataCollatorForQuestionAnswering in __init__

* pass data_collator to trainer

* doc tweak

* Update run_squad_trainer.py

* Update __init__.py

* Update __init__.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-07-07 08:57:08 -04:00
fbd8792195 Add DPR model (#5279)
* beginning of dpr modeling

* wip

* implement forward

* remove biencoder + better init weights

* export dpr model to embed model for nlp lib

* add new api

* remove old code

* make style

* fix dumb typo

* don't load bert weights

* docs

* docs

* style

* move the `k` parameter

* fix init_weights

* add pretrained configs

* minor

* update config names

* style

* better config

* style

* clean code based on PR comments

* change Dpr to DPR

* fix config

* switch encoder config to a dict

* style

* inheritance -> composition

* add messages in assert statements

* add dpr reader tokenizer

* one tokenizer per model

* fix base_model_prefix

* fix imports

* typo

* add convert script

* docs

* change tokenizers conf names

* style

* change tokenizers conf names

* minor

* minor

* fix wrong names

* minor

* remove unused convert functions

* rename convert script

* use return_tensors in tokenizers

* remove n_questions dim

* move generate logic to tokenizer

* style

* add docs

* docs

* quality

* docs

* add tests

* style

* add tokenization tests

* DPR full tests

* Stay true to the attention mask building

* update docs

* missing param in bert input docs

* docs

* style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-07-07 08:56:12 -04:00
d2a9399115 Update model card (#5491) 2020-07-07 18:43:49 +08:00
2e653d89d7 Update model card (#5492) 2020-07-07 18:43:34 +08:00
beaf60e589 bert-turkish-text-classification model card (#5493) 2020-07-07 18:43:09 +08:00
e6eba8419c electra-small-finetuned-squadv1 model card (#5430)
* Create model card

Create model card for electra-small-discriminator finetuned on SQUAD v1.1

* Set right model path in code example
2020-07-07 18:41:42 +08:00
43b7ad5df5 ukr-roberta-base model card (#5514) 2020-07-07 18:40:23 +08:00
87aa857d7e roberta-base-1B-1-finetuned-squadv1 model card (#5515) 2020-07-07 18:39:09 +08:00
c7d96b60e4 zuBERTa model card (#5536)
* Create README

* Update README.md

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-07-07 18:38:15 +08:00
b95dfcf110 roberta-base-1B-1-finetuned-squadv2 model card (#5523) 2020-07-07 18:33:42 +08:00
6912265711 Make T5 compatible with ONNX (#5518)
* Default decoder inputs to encoder ones for T5 if neither are specified.

* Fixing typo, now all tests are passing.

* Changing einsum to operations supported by onnx

* Adding a test to ensure T5 can be exported to onnx op>9

* Modified test for onnx export to make it faster

* Styling changes.

* Styling changes.

* Changing notation for matrix multiplication

Co-authored-by: Abel Riboulot <tkai@protomail.com>
2020-07-07 11:32:29 +02:00
989ae326b5 [Reformer] Adapt Reformer MaskedLM Attn mask (#5560)
* fix attention mask

* fix slow test

* refactor attn masks

* fix fp16 generate test
2020-07-07 10:48:06 +02:00
3dcb748e31 Added data collator for permutation (XLNet) language modeling and related calls (#5522)
* Added data collator for XLNet language modeling and related calls

Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
to generate necessary inputs for language modeling training with
XLNetLMHeadModel. Also added related arguments, logic and calls in
examples/language-modeling/run_language_modeling.py.

Resolves: #4739, #2008 (partially)

* Changed name to `DataCollatorForPermutationLanguageModeling`

Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModeling`.
Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
CTRL uses a CLM loss just like GPT and GPT-2, so it should work out of the box with this script (provided `past` is taken care of
similarly to `mems` for XLNet).
Changed calls and imports appropriately.

* Added detailed comments, changed variable names

Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain working. Also cleaned up variable names and made them more informative.

* Added tests for new data collator

Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.

* Fixed styling issues
2020-07-07 10:17:37 +02:00
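A usage sketch for the new collator, assuming the final class name and the even-length requirement mentioned above:

```python
import torch
from transformers import XLNetTokenizer, DataCollatorForPermutationLanguageModeling

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
collator = DataCollatorForPermutationLanguageModeling(
    tokenizer=tokenizer, plm_probability=1 / 6, max_span_length=5
)

# Token-id sequences must have an even length (see the odd-length test above).
examples = [torch.randint(5, 1000, (10,)) for _ in range(2)]
batch = collator(examples)
print(sorted(batch))  # ['input_ids', 'labels', 'perm_mask', 'target_mapping']
```
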
1d2332861f Post v3.0.2 release commit 2020-07-06 18:56:47 -04:00
b0892fa0e8 Release: v3.0.2 2020-07-06 18:49:44 -04:00
f1e2e423ab Fix fast tokenizers too (#5562) 2020-07-06 18:45:01 -04:00
5787e4c159 Various tokenizers fixes (#5558)
* BertTokenizerFast - Do not specify strip_accents by default

* Bump tokenizers to new version

* Add test for AddedToken serialization
2020-07-06 18:27:53 -04:00
21f28c34b7 Fix #5507 (#5559)
* Fix #5507

* Fix formatting
2020-07-06 17:26:48 -04:00
9d9b872b66 The add_space_before_punct_symbol is only for TransfoXL (#5549) 2020-07-06 12:17:05 -04:00
d6b0b9d451 GPT2 tokenizer should not output token type IDs (#5546)
* GPT2 tokenizer should not output token type IDs

* Same for OpenAIGPT
2020-07-06 11:33:57 -04:00
7833b21a5a Fix #5544 (#5551) 2020-07-06 11:22:24 -04:00
c473484087 Fix the tokenization warning noted in #5505 (#5550)
* fix warning

* style and quality
2020-07-06 11:15:25 -04:00
1bbc28bee7 Imports organization 2020-07-06 10:27:10 -04:00
1bc13697b1 Update convert_pytorch_checkpoint_to_tf2.py (#5531)
fixed ImportError: cannot import name 'hf_bucket_url'
2020-07-06 09:55:10 -04:00
b2309cc6bf Typo fix in training doc (#5495) 2020-07-06 09:15:22 -04:00
7ecff0ccbb Fix typo in training (#5510) 2020-07-06 09:14:57 -04:00
58cca47c16 [cleanup] TF T5 tests only init t5-base once. (#5410) 2020-07-03 14:27:49 -04:00
991172922f better error message (#5497) 2020-07-03 19:25:25 +02:00
b58a15a31e unpinning specific git versions in setup.py 2020-07-03 17:38:39 +02:00
fedabcd154 Release: 3.0.1 2020-07-03 17:02:44 +02:00
17ade127b9 Exposing prepare_for_model for both slow & fast tokenizers (#5479)
* Exposing prepare_for_model for both slow & fast tokenizers

* Update method signature

* The traditional style commit

* Hide the warnings behind the verbose flag

* update default truncation strategy and prepare_for_model

* fix tests and prepare_for_models methods

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-07-03 16:51:21 +02:00
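A sketch of the now-shared entry point (tokenizer choice illustrative):

```python
from transformers import BertTokenizer, BertTokenizerFast

for cls in (BertTokenizer, BertTokenizerFast):
    tok = cls.from_pretrained("bert-base-uncased")
    ids = tok.convert_tokens_to_ids(tok.tokenize("Hello world"))
    # prepare_for_model adds special tokens, truncates and pads, with the
    # same signature on slow and fast tokenizers after this change.
    enc = tok.prepare_for_model(ids, max_length=8, padding="max_length", truncation=True)
    print(cls.__name__, enc["input_ids"])
```
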
814ed7ee76 Create model card (#5396)
Create model card for electricidad-small (Spanish Electra) fine-tuned on SQUAD-esv1
2020-07-03 08:29:09 -04:00
49281ac939 grammar corrections and train data update (#5448)
- fixed grammar and spelling
- added an intro
- updated Training data references
2020-07-03 08:25:57 -04:00
97355339f6 Update upstream (#5456) 2020-07-03 08:16:27 -04:00
55b932a818 Create model card (#5464)
Create model card for electra-small-discriminator fine-tuned on SQUAD v2.0
2020-07-03 06:19:49 -04:00
21cd8c4086 QA Pipelines fixes (#5429)
* Make the QA pipeline support models with more than 2 outputs, such as BART, assuming start/end are the first two outputs.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* When using the new padding/truncation paradigm, setting padding="max_length" + max_length=X actually pads the input up to max_length.

This results in every sample going through the QA pipeline being of size 384 whatever the actual input size is, making the overall pipeline very slow.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Mask padding & question before applying softmax. Softmax has been refactored to operate in log space for speed and stability.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Use PaddingStrategy.LONGEST instead of DO_NOT_PAD

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Revert "When using the new padding/truncation paradigm setting padding="max_length" + max_length=X actually pads the input up to max_length."

This reverts commit 1b00a9a2

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Trigger CI after unattended failure

* Trigger CI
2020-07-03 10:29:20 +02:00
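A small torch-only illustration of the log-space point above (shapes illustrative, not the pipeline's actual code):

```python
import torch

start_logits, end_logits = torch.randn(384), torch.randn(384)

# Naive: multiply start/end probabilities (products of small numbers underflow).
p = torch.softmax(start_logits, -1)[:, None] * torch.softmax(end_logits, -1)[None, :]

# Log space: add log-probabilities and exponentiate once at the end.
log_p = torch.log_softmax(start_logits, -1)[:, None] + torch.log_softmax(end_logits, -1)[None, :]

assert torch.allclose(p, log_p.exp(), atol=1e-6)
```
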
8438bab38e Fix roberta model ordering for TFAutoModel (#5414) 2020-07-02 19:23:55 -04:00
6b735a7253 Tokenizer summary (#5467)
* Work on tokenizer summary

* Finish tutorial

* Link to it

* Apply suggestions from code review

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add vocab definition

Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-02 17:07:42 -04:00
ef0e9d806c Update: ElectraDiscriminatorPredictions forward. (#5471)
`ElectraDiscriminatorPredictions.forward` should not need `attention_mask`.
2020-07-02 13:57:33 -04:00
13a8588f2d Create model card (#5432)
Create model card for electra-base-discriminator fine-tuned on SQUAD v1.1
2020-07-02 10:16:30 -04:00
a0a6387a0d [model_cards] roberta-large-mnli: fix sep_token 2020-07-02 10:04:02 -04:00
215db688da Create roberta-large-mnli-README.md 2020-07-02 09:43:54 -04:00
69d313e808 Bans SentencePiece 0.1.92 (#5418) 2020-07-02 09:23:00 -04:00
84e56669af Fix typo in glossary (#5466) 2020-07-02 09:19:33 -04:00
c6a510c6fa Fixing missing arguments for TransfoXL tokenizer when using TextGenerationPipeline (#5465)
* overriding `_parse_and_tokenize` in `TextGenerationPipeline` to allow for TransfoXL tokenizer arguments
2020-07-02 13:53:33 +02:00
6726416e4a Changed expected_output_ids in TransfoXL generation test (#5462)
* Changed expected_output_ids in TransfoXL generation test to match #4826 generation PR.

* making black happy

* making isort happy
2020-07-02 11:56:44 +02:00
812def00c9 fix use of mems in Transformer-XL (#4826)
Fixed duplicated memory use in Transformer-XL generation, leading to bad predictions and poor performance.
2020-07-02 11:19:07 +02:00
306f1a2695 Add Reformer MLM notebook (#5450)
* Add Reformer MLM notebook

* Update notebooks/README.md
2020-07-02 00:20:49 +02:00
d16e36c7e5 [Reformer] Add Masked LM Reformer (#5426)
* fix conflicts

* fix

* happy rebasing
2020-07-01 22:43:18 +02:00
f4323dbf8c Don't discard entity_group when the token is the last in the sequence. (#5439)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-07-01 20:30:42 +02:00
35befd9ce3 Fix tensor label type inference in default collator (#5250)
* allow tensor label inputs to default collator

* replace try/except with type check
2020-07-01 10:40:14 -06:00
fe81f7d12c finish reformer qa head (#5433) 2020-07-01 12:27:14 -04:00
d697b6ca75 [Longformer] Major Refactor (#5219)
* refactor naming

* add small slow test

* refactor

* refactor naming

* rename selected to extra

* big global attention refactor

* make style

* refactor naming

* save intermed

* refactor functions

* finish function refactor

* fix tests

* fix longformer

* fix longformer

* fix longformer

* fix all tests but one

* finish longformer

* address sams and izs comments

* fix transpose
2020-07-01 17:43:32 +02:00
e0d58ddb65 [fix] Marian tests import (#5442) 2020-07-01 11:42:22 -04:00
608d5a7c44 Raises PipelineException on FillMaskPipeline when there are != 1 mask_token in the input (#5389)
* Added PipelineException

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fill-mask pipeline raises exception when more than one mask_token detected.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Put everything in a function.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Added tests on pipeline fill-mask when input has != 1 mask_token

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Fix numel() computation for TF

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Addressing PR comments.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove function typing to avoid import on specific framework.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Quality.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Retry typing with @julien-c tip.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Quality².

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Simplify fill-mask mask_token checking.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Trigger CI
2020-07-01 17:27:47 +02:00
6c55e9fc32 Fix dropdown bug in searches (#5440)
* Trigger CI

* Fix dropdown bug in searches
2020-07-01 11:02:59 -04:00
734a28a767 Clean up diffs in Trainer/TFTrainer (#5417)
* Cleanup and unify Trainer/TFTrainer

* Forgot to adapt TFTrainingArgs

* In tf scripts n_gpu -> n_replicas

* Update src/transformers/training_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Formatting

* Fix typo

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-07-01 11:00:20 -04:00
43cb03a93d MarianTokenizer.prepare_translation_batch uses new tokenizer API (#5182) 2020-07-01 10:32:50 -04:00
13deb95a40 Move tests/utils.py -> transformers/testing_utils.py (#5350) 2020-07-01 10:31:17 -04:00
9c219305f5 Trigger CI 2020-07-01 10:22:50 -04:00
64e3d966b1 Add support for past states (#5399)
* Add support for past states

* Style and forgotten self

* You mean, documenting is not enough? I have to actually add it too?

* Add memory support during evaluation

* Fix tests in eval and add TF support

* No need to change this line anymore
2020-07-01 08:11:55 -04:00
4ade7491f4 Fix examples titles and optimization doc page (#5408) 2020-07-01 08:11:25 -04:00
d60d231ea4 Create README.md (#5422)
* Create README.md

* Update model_cards/MoseliMotsoehli/TswanaBert/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-07-01 05:01:51 -04:00
298bdab18a Create model card for schmidek/electra-small-cased (#5400) 2020-07-01 04:01:56 -04:00
fcf0652460 Fix TensorFlow dataset generator (#4881)
* fix TensorFlow generator

* Better features handling

* Apply style

* Apply style

* Fix squad as well

* Apply style

* Better factorization of TF Tensors creation
2020-06-30 19:49:11 -04:00
501040fd30 In the run_ner.py example, give the optional label arg a default value (#5326)
Otherwise, if label is not specified, the following error occurs:

	Traceback (most recent call last):
	  File "run_ner.py", line 303, in <module>
	    main()
	  File "run_ner.py", line 101, in main
	    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
	  File "/home/user/anaconda3/envs/bert/lib/python3.7/site-packages/transformers/hf_argparser.py", line 159, in parse_json_file
	    obj = dtype(**inputs)
	TypeError: __init__() missing 1 required positional argument: 'labels'
2020-06-30 19:45:35 -04:00
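
The fix boils down to giving the optional field a default so HfArgumentParser can construct the dataclass without it; a sketch under that assumption (field name and help text illustrative):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataTrainingArguments:
    # With a default, omitting "labels" from the JSON config no longer
    # raises the TypeError shown above.
    labels: Optional[str] = field(
        default=None,
        metadata={"help": "Path to a file containing all labels."},
    )
```
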
b45e65efa0 Avoid deprecation warning for F.tanh (#5413) 2020-06-30 16:41:43 -04:00
23231c0f78 [GH Runner] fix yaml indent (#5412) 2020-06-30 16:17:12 -04:00
ac61114592 [CI] gh runner doesn't use -v, cats new result (#5409) 2020-06-30 16:12:14 -04:00
27a7fe7a8d examples/seq2seq: never override $WANDB_PROJECT (#5407) 2020-06-30 15:29:13 -04:00
32d2031458 [fix] slow fill_mask test failure (#5406) 2020-06-30 15:28:15 -04:00
80aa4b8aa6 [CI] GH-runner stores artifacts like CircleCI (#5318) 2020-06-30 15:01:53 -04:00
87716a6d07 Documentation for the Trainer API (#5383)
* Documentation for the Trainer API

* Address review comments

* Address comments
2020-06-30 11:43:43 -04:00
c4d4e8bdbd Move GenerationMixin to separate file (#5254)
* separate_generation_code

* isort

* renamed

* rename_files

* move_shapelit
2020-06-30 10:42:08 -04:00
90d13954c4 Repin versions 2020-06-30 09:16:36 -04:00
0607b88945 How to share model cards with the CLI (#5374)
* How to share model cards

* Switch the two options

* Fix bad copy/cut

* Julien's suggestion
2020-06-30 08:59:32 -04:00
331d8d2936 Upload DistilBART artwork (#5394) 2020-06-30 18:11:11 +08:00
09e841490c Model Card Fixing (#5369)
- Fix missing ```-``` in language meta
- T5 pic uploaded to a more permanent place
2020-06-30 18:02:24 +08:00
4c5bed192a Model Card Fixing (#5373)
- T5 pic uploaded to a more permanent place
2020-06-30 18:01:45 +08:00
02509d4b06 Model Card Fixing (#5371)
- Model pic uploaded to a more permanent place
2020-06-30 18:01:11 +08:00
79f0118c72 Model Card Fixing (#5370)
- Fix missing ```-``` in language meta
- T5 pic uploaded to a more permanent place
2020-06-30 18:00:29 +08:00
9a473f1e43 Update Bertabs example to work again (#5355)
* Fix the 'Attempted relative import with no known parent package' bug when using the bertabs example. Also change the model used from bertabs-finetuned-cnndm, since it no longer seems to be accessible

* Update run_summarization.py

Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
2020-06-30 14:05:01 +08:00
7f60e93ac5 Mention openAI model card and merge content (#5378)
* Mention openAI model card and merge content

* Fix sentence
2020-06-29 18:27:36 -04:00
482a5993c2 Fix model card folder name so that it is consistent with model hub (#5368)
* Merge upstream

* Merge upstream

* Add generate.py link

* Merge upstream

* Merge upstream

* Fix folder name
2020-06-29 12:54:30 -04:00
97f24303e8 Add link to file and fix typos in model card (#5367)
* Merge upstream

* Merge upstream

* Add generate.py link
2020-06-29 11:34:52 -04:00
b9ee87f5c7 Doc for v3.0.0 (#5366)
* Doc for v3.0.0

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/_static/js/custom.js

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-29 11:08:54 -04:00
b62ca59527 Release: v3.0.0 2020-06-29 10:40:13 -04:00
a316a6aaa8 [seq2seq docs] Move evaluation down, fix typo (#5365) 2020-06-29 10:36:04 -04:00
4bcc35cd69 [Docs] Benchmark docs (#5360)
* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-29 16:08:57 +02:00
482c9178d3 Pin mecab for now (#5362) 2020-06-29 09:51:13 -04:00
2513fe0d02 added subtitle for recent contributors in readme (#5130) 2020-06-29 09:05:08 -04:00
30245c0c60 Fix table format for test results (#5357) 2020-06-29 09:02:33 -04:00
c34010551a Create model card (#5356) 2020-06-29 09:01:55 -04:00
01aa0b8527 Create README.md (#5353) 2020-06-29 08:58:30 -04:00
96907367f1 arxiv-ai-gpt2 model card (#5337)
* Add model card and generation script for model arxiv_ai_gpt2

* Update arxiv-ai-gpt2 model card

Remove unnecessary lines

* Delete code in model cards
2020-06-29 08:53:20 -04:00
3cdf8b7ec2 Create model card for asafaya/bert-mini-arabic (#5352)
* Create README.md

* Update model_cards/asafaya/bert-mini-arabic/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-29 08:41:41 -04:00
9db1f41604 Create README.md (#5351) 2020-06-29 08:36:00 -04:00
c950fef545 [docs] Small tweaks to #5323 2020-06-29 14:24:33 +02:00
4544f906e2 model cards for roberta and bert-multilingual (#5324)
* More model cards (cc @myleott)

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-29 05:06:05 -04:00
92671532e7 More model cards 2020-06-29 10:58:54 +02:00
9209d36f93 Added a model card README.md for my pretrained model. (#5325)
* Create README.md

* Removed unnecessary link from README.md

* Update README.md
2020-06-29 16:29:14 +08:00
7cb52f53ef Fix LR decay in TF Trainer (#5269)
* Recover old PR

* Apply style

* Trigger CI
2020-06-29 14:38:32 +08:00
321c05abab Model cards for finance-koelectra models (#5313)
* Add finance-koelectra readme card

* Add finance-koelectra readme card

* Add finance-koelectra readme card

* Add finance-koelectra readme card
2020-06-29 13:47:44 +08:00
28a690a80e [mBART] skip broken forward pass test, stronger integration test (#5327) 2020-06-28 15:08:28 -04:00
45e26125de save_pretrained: mkdir(exist_ok=True) (#5258)
* all save_pretrained methods mkdir if not os.path.exists
2020-06-28 14:53:47 -04:00
12dfbd4f7a [examples] fix example links (#5344) 2020-06-28 12:54:54 -04:00
98109464c1 clean reformer reverse sort (#5343) 2020-06-28 14:32:25 +02:00
1af58c0706 New model sharing tutorial (#5323) 2020-06-27 11:10:02 -04:00
efae6645e2 Fix xxx_length behavior when using XLNet in pipeline (#5319) 2020-06-27 11:09:51 -04:00
393b8dc09a examples/seq2seq/run_eval.py fixes and docs (#5322) 2020-06-26 19:20:43 -04:00
5543b30aa6 [pl_examples] default warmup steps=0 (#5316) 2020-06-26 15:03:41 -04:00
bf0d12c220 CircleCI stores cleaner output at test_outputs.txt (#5291) 2020-06-26 13:59:31 -04:00
601d4d699c [tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308)
* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples
2020-06-26 19:48:14 +02:00
fd405e9a93 Add BART-base modeling and configuration (#5315) 2020-06-27 00:53:10 +08:00
798dbff6a7 [pipelines] Change summarization default to distilbart-cnn-12-6 (#5289) 2020-06-26 11:43:23 -04:00
834b6884c5 Add benchmark notebook (#5312)
* add notebook

* Created with Colaboratory

* move notebook to correct folder

* correct link

* correct filename

* correct filename

* better name
2020-06-26 17:38:13 +02:00
08c9607c3d [Generation] fix docs for decoder_input_ids (#5306)
* fix docs

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_tf_utils.py

* Update src/transformers/modeling_utils.py
2020-06-26 16:58:11 +02:00
79a82cc06a [Benchmarks] improve Example Plotter (#5245)
* improve plotting

* better labels

* fix time plot
2020-06-26 15:00:14 +02:00
88d7f96e33 Gpt2 model card (#5283)
* Bert base model card

* Add metadata

* Adapt examples

* GPT2 model card

* Remove the BERT model card

* Change language code
2020-06-26 08:08:31 -04:00
fc5bce9e60 Bert base model card (#5276)
* Bert base model card

* Add metadata

* Adapt examples

* Comment on text generation

* Update model_cards/bert-base-uncased-README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-26 08:01:19 -04:00
135791e8ef Add pad_to_multiple_of on tokenizers (reimport) (#5054)
* Add new parameter `pad_to_multiple_of` on tokenizers.

* unittest for pad_to_multiple_of

* Add .name when logging enum.

* Fix missing .items() on dict in tests.

* Add special check + warning if the tokenizer doesn't have proper pad_token.

* Use the correct logger format specifier.

* Ensure tokenizers with no pad_token do not modify the underlying padding strategy.

* Skip test if tokenizer doesn't have pad_token

* Fix RobertaTokenizer on empty input

* Format.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* fix and update to a simpler API

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-06-26 11:55:57 +02:00
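
Typical intended use of the new parameter, as a hedged sketch (checkpoint name illustrative): rounding padded lengths up to a multiple of 8 yields tensor-core-friendly shapes.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["a short sentence", "a somewhat longer example sentence"],
    padding=True,
    pad_to_multiple_of=8,  # padded length rounded up to a multiple of 8
)
assert len(batch["input_ids"][0]) % 8 == 0
```
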
7cc15bdd96 Closes #5218 2020-06-25 18:19:21 -04:00
2ffef0d0c7 Training & fine-tuning quickstart (#5034)
* add initial fine-tuning guide

* split code blocks to smaller segments

* fix up trainer section of fine-tune doc

* a few last typos

* Update usage -> task summary link

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 15:11:11 -06:00
364a5ae1f0 Refactor Code samples; Test code samples (#5036)
* Refactor code samples

* Test docstrings

* Style

* Tokenization examples

* Run rest of tests

* First step to testing source docs

* Style and BART comment

* Test the remainder of the code samples

* Style

* let to const

* Formatting fixes

* Ready for merge

* Fix fixture + Style

* Fix last tests

* Update docs/source/quicktour.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Addressing @sgugger's comments + Fix MobileBERT in TF

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-06-25 16:46:00 -04:00
315f464b0a [tokenizers] Several small improvements and bug fixes (#5287)
* avoid recursion in id checks for fast tokenizers

* better typings and fix #5232

* align slow and fast tokenizers behaviors for Roberta and GPT2

* style and quality

* fix tests - improve typings
2020-06-25 22:17:14 +02:00
24f46ea3f3 Remove links for all docs (#5280) 2020-06-25 11:45:05 -04:00
27cf1d97f0 [Tokenization] Fix #5181 - make #5155 more explicit - move back the default logging level in tests to WARNING (#5252)
* fix-5181

Padding to the max sequence length while truncating to another length was wrong on slow tokenizers

* clean up and fix #5155

* fix XLM test

* Fix tests for Transfo-XL

* logging only above WARNING in tests

* switch slow tokenizer tests to @slow

* fix Marian truncation tokenization test

* style and quality

* make the test a lot faster by limiting the sequence length used in tests
2020-06-25 17:24:28 +02:00
e008d520bb [examples/seq2seq] more README improvements (#5274) 2020-06-25 10:13:01 -04:00
6a495cae00 [model_cards] Example of how to specify inputs for the widget 2020-06-25 15:58:25 +02:00
0e1fce3c01 Fix convert_graph_to_onnx (#5230) 2020-06-25 08:17:02 +02:00
5543efd5cc Create README.md (#5259) 2020-06-25 01:56:07 -04:00
40457bcebb examples/seq2seq supports translation (#5202) 2020-06-24 23:58:11 -04:00
d12ceb48ba Tokenization tutorial (#5257)
* All done

* Link to the tutorial

* Typo fixes

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Add mention of the return_xxx args

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-06-24 18:43:20 -04:00
7ac9110711 Add more tests on tokenizers serialization - fix bugs (#5056)
* update tests for fast tokenizers + fix small bug in saving/loading

* better tests on serialization

* fixing serialization

* comment cleanup
2020-06-24 21:53:08 +02:00
0148c262e7 Fix first test (#5255) 2020-06-24 15:16:04 -04:00
70c1e1d2d5 Use master _static (#5253)
* Use _static from master everywhere

* Copy to existing too
2020-06-24 15:06:14 -04:00
4965aee064 [HANS] Fix label_list for RoBERTa/BART (class flipping) (#5196)
* fix weirdness in roberta/bart for mnli trained checkpoints

* black compliance

* isort code check
2020-06-24 14:38:15 -04:00
fc24a93e64 [HfApi] Add support for pipeline_tag 2020-06-24 16:54:00 +00:00
0a3d0e02c5 Replace labels with -100 to skip loss calc (#4718) 2020-06-24 12:14:50 -04:00
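
-100 is the default ignore_index of PyTorch's cross-entropy loss, so masking labels this way excludes those positions from the loss; a minimal sketch:

```python
import torch

input_ids = torch.tensor([[101, 2023, 2003, 102, 0, 0]])  # 0 = padding
labels = input_ids.clone()
labels[labels == 0] = -100  # padding positions now contribute nothing to the loss
```
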
6894b486d0 Fix version controller links (for realsies) (#5251) 2020-06-24 12:13:43 -04:00
1121ce9f98 Model cards for Hate-speech-CNERG models (#5236)
* Add dehatebert-mono-arabic readme card

* Update dehatebert-mono-arabic model card

* model cards for Hate-speech-CNERG models
2020-06-24 11:41:08 -04:00
cf10d4cfdd Cleaning TensorFlow models (#5229)
* Cleaning TensorFlow models

Update all classes


style

* Don't average loss
2020-06-24 11:37:20 -04:00
609e0c583f Fix links (#5248) 2020-06-24 11:35:55 -04:00
c9163a8d5a delay decay schedule until the end of warmup (#4940) 2020-06-24 11:18:29 -04:00
f216b60671 Fix deploy doc (#5246)
* Try with the same command

* Try like this
2020-06-24 10:59:06 -04:00
49f6e7a3c6 Add some prints to debug (#5244) 2020-06-24 10:37:01 -04:00
c2a26ec8a6 [Use cache] Align logic of use_cache with output_attentions and output_hidden_states (#5194)
* fix use cache

* add bart use cache

* fix bart

* finish bart
2020-06-24 16:09:17 +02:00
64c393ee74 Don't recreate old docs (#5243) 2020-06-24 09:59:07 -04:00
b29683736a fix print in benchmark (#5242) 2020-06-24 15:58:49 +02:00
9fe09cec76 [Benchmark] Extend Benchmark to all model type extensions (#5241)
* add benchmark for all kinds of models

* improved import

* delete bogus files

* make style
2020-06-24 15:11:42 +02:00
7c41057d50 Add hugs (#5225) 2020-06-24 07:56:14 -04:00
5e85b324ec Use the script in utils (#5224) 2020-06-24 07:55:58 -04:00
5e31a98ab7 Create README.md (#5108)
* Create README.md

* Update model_cards/a-ware/roberta-large-squad-classification/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-24 04:45:51 -04:00
033124e5f8 Update README.md (#5199)
Fix/add information in README.md
2020-06-24 04:42:46 -04:00
7ca6627ec3 Create README.md (#5217)
electra_large_discriminator_squad2_512 Question Answering LM
2020-06-24 04:40:50 -04:00
54e9ce785d Fix PABEE division by zero error (#5233)
* Fix PABEE division by zero error

* patience=0 by default
2020-06-24 16:10:36 +08:00
9022ef021a Only put tensors on a device (#5223)
* Only put tensors on a device

* Type hint and unpack list comprehension
2020-06-23 17:30:17 -04:00
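
A sketch of the idea behind #5223 (not the exact Trainer code): only tensor values in the inputs dict are moved to the device; everything else passes through untouched.

```python
import torch

def prepare_inputs(inputs: dict, device: torch.device) -> dict:
    # Strings, ints, or None values stay as-is; only tensors move.
    return {
        k: v.to(device) if isinstance(v, torch.Tensor) else v
        for k, v in inputs.items()
    }
```
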
173528e368 Add version control menu (#5222)
* Add version control menu

* Constify things

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-23 17:05:12 -04:00
76e5af4cfd [pl_examples] revert deletion of optimizer_step (#5227) 2020-06-23 16:40:45 -04:00
c01480bba3 [file_utils] Type user-agent 2020-06-23 18:31:13 +02:00
58918c76f4 [bart] add config.extra_pos_embeddings to facilitate reuse (#5190) 2020-06-23 11:35:42 -04:00
b28b537131 More clear error message in the use-case of #5169 (#5184) 2020-06-23 13:37:29 +02:00
11fdde0271 Tokenizers API developments (#5103)
* Add return lengths

* make pad a bit more flexible so it can be used as collate_fn

* check all kwargs sent to encoding method are known

* fixing kwargs in encodings

* New AddedToken class in python

This class lets you specify specific tokenization behaviors for some special tokens. Used in particular for GPT-2 and RoBERTa, to control how whitespace is stripped around special tokens.

* style and quality

* switched to the huggingface tokenizers library for AddedTokens

* up to tokenizer 0.8.0-rc3 - update API to use AddedToken state

* style and quality

* do not raise an error on additional or unused kwargs for tokenize() but only a warning

* transfo-xl pretrained model requires torch

* Update src/transformers/tokenization_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-23 13:36:57 +02:00
1ae132a07d [Reformer] Axial Pos Emb Improve mem usage reformer (#5209)
* improve mem handling

* improve mem for pos ax encodings
2020-06-23 10:49:18 +02:00
5144104070 [fix] remove unused import (#5206) 2020-06-22 23:39:04 -04:00
0d158e38c9 [fix] mobilebert had wrong path, causing slow test failure (#5205) 2020-06-22 23:31:36 -04:00
f5c2a122e3 Upgrade examples to pl=0.8.1(#5146) 2020-06-22 20:40:10 -04:00
06b60c8b05 [Modelcard] bart-squadv2 (#5011)
* [Modelcard] bart-squadv2

* Update README.md

* Update README.md
2020-06-22 18:40:19 -04:00
35e0687256 Create README.md (#5013) 2020-06-22 18:40:00 -04:00
22d2c8ea2f Create README.md for finetuned BERT model (#5009)
* Create README.md

* changes in model usage section

* minor changes in output visualization

* minor errata in readme
2020-06-22 18:39:29 -04:00
2589505693 Add model card for StackOBERTflow-comments-small (#5008)
* Create README.md

* Update README.md
2020-06-22 18:39:22 -04:00
d8c26ed139 Specify dataset used for crossvalidation (#5175) 2020-06-22 18:26:12 -04:00
a34fb91d54 Create README.md (#5149) 2020-06-22 18:00:53 -04:00
ffabcf5249 Create README.md (#5160) 2020-06-22 17:59:54 -04:00
3363a19b12 Create README.md (#5152)
* Create README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 17:59:33 -04:00
0cca61925c Add link to new community notebook (optimization) (#5195)
* Add link to new community notebook (optimization)

related to https://github.com/huggingface/transformers/issues/4842#event-3469184635

This notebook is about benchmarking model training with/without dynamic padding optimization. 
https://github.com/ELS-RD/transformers-notebook 

Using dynamic padding on MNLI provides a **4.7 times training time reduction**, with the max pad length set to 512. The effect is strong because few examples in this dataset exceed 400 tokens. In real life it will depend on the dataset, but it always brings an improvement and, after more than 20 experiments listed in this [article](https://towardsdatascience.com/divide-hugging-face-transformers-training-time-by-2-or-more-21bf7129db9e?source=friends_link&sk=10a45a0ace94b3255643d81b6475f409), it seems not to hurt performance.

Following advice from @patrickvonplaten I do the PR myself :-)

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-22 23:47:33 +02:00
1c5cd8e5f5 Add README.md (nyu-mll) (#5174)
* nyu-mll: roberta on smaller datasets

* Update README.md

* Update README.md

Co-authored-by: Alex Warstadt <alexwarstadt@gmail.com>
2020-06-22 17:24:27 -04:00
c439752482 Switch master/stable doc and add older releases (#5193) 2020-06-22 16:38:53 -04:00
417e492f1e Quick tour (#5145)
* Quicktour part 1

* Update

* All done

* Typos

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address comments in quick tour

* Update docs/source/quicktour.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update from feedback

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 16:08:09 -04:00
75e1eed8d1 Cleaner warning when loading pretrained models (#4557)
* Cleaner warning when loading pretrained models

This makes the logging messages more explicit when using the various `from_pretrained` methods. It also emits these messages as `logging.warning`, because this is a common source of silent mistakes.

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* style and quality

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 21:58:47 +02:00
4e741efa92 Have documentation fail on warning (#5189)
* Have documentation fail on warning

* Force ci failure

* Revert "Force ci failure"

This reverts commit f0a4666ec2eb4cd00a4da48af3357defc63324a0.
2020-06-22 15:49:50 -04:00
1262495a91 Add TF auto model to the docs + fix sphinx warnings (#5187) 2020-06-22 14:43:52 -04:00
88429c57bc Create README.md (#5165) 2020-06-22 13:49:14 -04:00
76ee9c8bc9 Create README.md (#5107)
* Create README.md

@julien-c check out that dataset meta tag is right

* Fix typo

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-22 13:47:30 -04:00
bf493d5569 Model card for t5-base-finetuned-emotion (recognition) (#5179) 2020-06-22 13:45:45 -04:00
e9ef21175e improve doc (#5185) 2020-06-22 19:00:11 +02:00
ebc36108dc [tokenizers] Fix #5081 and improve backward compatibility (#5125)
* fix #5081 and improve backward compatibility (slightly)

* add nlp to setup.cfg - style and quality

* align default to previous default

* remove test that doesn't generalize
2020-06-22 17:25:43 +02:00
d2a7c86dc3 Check if text is set to avoid IndexError (#4209)
Fix for https://github.com/huggingface/transformers/issues/3809
2020-06-22 11:09:05 -04:00
90f4b24520 Add support for gradient checkpointing in BERT (#4659)
* add support for gradient checkpointing in BERT

* fix unit tests

* isort

* black

* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool

* Revert "workaround for `torch.utils.checkpoint.checkpoint` not accepting bool"

This reverts commit 5eb68bb804f5ffbfc7ba13c45a47717f72d04574.

* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 10:47:14 -04:00
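
Roughly how the new flag is switched on (a sketch; checkpoint name illustrative): the model recomputes activations in the backward pass, trading extra compute for memory.

```python
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-uncased", gradient_checkpointing=True)
model = BertModel.from_pretrained("bert-base-uncased", config=config)
```
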
f4e1f02210 Output hidden states (#4978)
* Configure all models to use output_hidden_states as an argument passed to forward()

* Pass all tests

* Remove cast_bool_to_primitive in TF Flaubert model

* correct tf xlnet

* add pytorch test

* add tf test

* Fix broken tests

* Configure all models to use output_hidden_states as an argument passed to forward()

* Pass all tests

* Remove cast_bool_to_primitive in TF Flaubert model

* correct tf xlnet

* add pytorch test

* add tf test

* Fix broken tests

* Refactor output_hidden_states for mobilebert

* Reset and remerge to master

Co-authored-by: Joseph Liu <joseph.liu@coinflex.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-06-22 10:10:45 -04:00
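
A sketch of the resulting per-call usage (checkpoint name illustrative), instead of fixing the flag in the config:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("hidden states on demand", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
hidden_states = outputs[-1]  # one tensor per layer, plus the embedding output
```
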
866a8ccabb Add model cards for Microsoft's MiniLM (#5178)
* Add model cards for Microsoft's MiniLM

* XLMRobertaTokenizer

* format

* Add thumbnail

* finishing up
2020-06-22 21:48:14 +08:00
b99ad457f4 Added feature to move added tokens in vocabulary for Transformer-XL (#4953)
* Fixed resize_token_embeddings for transfo_xl model

* Fixed resize_token_embeddings for transfo_xl.

Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.

* Updated docstring

* Fixed resizing of cutoffs; added check for new size of embedding layer.

* Added test for resize_token_embeddings

* Fixed code quality

* Fixed unchanged cutoffs in model.config

* Added feature to move added tokens in tokenizer.

* Fixed code quality

* Added feature to move added tokens in tokenizer.

* Fixed code quality

* Fixed docstring; renamed sym to token.

Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
2020-06-22 15:40:52 +02:00
eb0ca71ef6 Update glossary (#5148)
* Update glossary

* Update docs/source/glossary.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-22 08:30:49 -04:00
fa0be6d761 Benchmarks (#4912)
* finish benchmark

* fix isort

* fix setup cfg

* retab

* fix time measuring of tf graph mode

* fix tf cuda

* clean code

* better error message
2020-06-22 12:06:56 +02:00
18a0150bfa fix bart doc (#5132)
2020-06-22 10:58:28 +02:00
3fe75c7f70 Fixing docs for Encoder Decoder Config (#5171) 2020-06-22 10:51:17 +02:00
59345cc87f Typo (#5147) 2020-06-22 10:49:23 +02:00
bc3a0c0607 [examples] fixes arguments for summarization finetune scripts (#5157)
Authored-by: i.boytsov <i.boytsov@MAC867.local>
2020-06-21 11:51:21 -04:00
68e19f1c22 Fix typo in root README (#5073) 2020-06-20 23:00:04 +08:00
c0c577cf8f Fix PABEE's result table (#5158) 2020-06-20 22:56:39 +08:00
aa6a29bc25 SummarizationPipeline: init required task name (#5086)
* SummarizationPipeline: init required task name

* Update src/transformers/pipelines.py

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>

* Apply suggestions from code review

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-06-20 03:16:30 -04:00
2fd28d4363 Add BERT Loses Patience (Patience-based Early Exit) (#5078)
* Add BERT Loses Patience (Patience-based Early Exit)

* update model archive

* update format

* sort import

* flake8

* Add results

* full results

* align the table

* refactor to inherit

* default per gpu eval = 1

* Formatting

* Formatting

* isort

* modify readme

* Add check

* Fix format

* Fix format

* Doc strings

* ALBERT & BERT for sequence classification don't inherit from the original anymore

* Remove incorrect comments

* Remove incorrect comments

* Remove incorrect comments

* Sync up with new code

* Sync up with new code

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Add a test

* Finishing up!
2020-06-20 13:41:46 +08:00
f1679d7c48 Fix dropout in TFMobileBert (#5150) 2020-06-20 13:21:19 +08:00
5ed94b2312 Update note to avoid confusion (#5131) 2020-06-20 10:13:34 +08:00
d97b4176e5 Correct device assignment 2020-06-19 21:58:28 -04:00
9a3f91088c Add MobileBert (#4901)
* Add MobileBert

* Quality + Conversion script

* style

* Update src/transformers/modeling_mobilebert.py

* Links to S3

* Style

* TFMobileBert

Slight fixes to the pytorch MobileBert
Style

* MobileBertForMaskedLM (PT + TF)

* MobileBertForNextSentencePrediction (PT + TF)

* MobileBertFor{MultipleChoice, TokenClassification} (PT + TF)

* Tests + Auto

* Doc

* Tests

* Addressing @sgugger's comments

* Adressing @patrickvonplaten's comments

* Style

* Style

* Integration test

* style

* Model card

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-19 16:38:36 -04:00
f45e873910 [bart-mnli] Fix class flipping bug (#5141) 2020-06-19 13:33:24 -04:00
e33929ef1e Fix in Reformer Config documentation (#5138) 2020-06-19 15:41:31 +02:00
84be482f66 AutoTokenizer supports mbart-large-en-ro (#5121) 2020-06-18 20:47:37 -04:00
2db1e2f415 [cleanup] remove redundant code in SummarizationDataset (#5119) 2020-06-18 20:34:48 -04:00
5f721ad6e4 Fix #5114 (#5122) 2020-06-18 19:20:04 -04:00
a258982af3 Add missing arg in 02-transformers notebook (#5085)
* Add missing arg when creating model

* Fix typos

* Remove from_tf flag when creating model
2020-06-18 19:04:04 -04:00
32e94cff64 tf add resize_token_embeddings method (#4351)
* resize token embeddings

* add tokens

* add tokens

* add tokens

* add t5 token method

* add t5 token method

* add t5 token method

* typo

* debugging input

* debugging input

* debug

* debug

* debug

* trying to set embedding tokens properly

* set embeddings for generation head too

* set embeddings for generation head too

* debugging

* debugging

* enable generation

* add base method

* add base method

* add base method

* return logits in the main call

* reverting to generation

* revert back

* set embeddings for the bert main layer

* description

* fix conflicts

* logging

* set base model as self

* refactor

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* tf_bert add method

* v0

* v0

* finalize

* final

* black

* add tests

* revert back the emb call

* comments

* comments

* add the second test

* add vocab size config

* add tf models

* add tf models. add common tests

* remove model specific embedding tests

* stylish

* remove files

* stylez

* Update src/transformers/modeling_tf_transfo_xl.py

change the error.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* adding unchanged weight test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-18 18:41:26 -04:00
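
After this PR the TF models mirror the PyTorch workflow; a sketch (checkpoint and token illustrative):

```python
from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

tokenizer.add_tokens(["<new_token>"])
model.resize_token_embeddings(len(tokenizer))  # grow embeddings to the new vocab
```
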
973433260e Pin sphinx-rtd-theme (#5128) 2020-06-18 18:07:59 -04:00
8a377c3d6e [fix] Move _adjust_logits above postprocess to fix Marian.generate (#5126) 2020-06-18 18:06:27 -04:00
3d3e605aff [cleanup] generate_beam_search comments (#5115) 2020-06-18 16:30:24 -04:00
ca2d0f98c4 ElectraForMultipleChoice (#4954)
* add ElectraForMultipleChoice

* add  test_for_multiple_choice

* add ElectraForMultipleChoice in auto model

* add ElectraForMultipleChoice in all_model_classes

* add SequenceSummary related parameters

* get rid pooler, use SequenceSummary instead

* add electra multiple choice test

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-18 14:59:35 -04:00
279d8e24f7 support local_files_only option for tf models (#5116) 2020-06-18 13:47:05 -04:00
355954ffca Create distilbert-base-uncased-distilled-squad-README.md 2020-06-18 05:17:45 -04:00
18177a1a60 lm_labels => labels (#5080) 2020-06-18 09:16:29 +02:00
efeb75b805 Remove misleading comment
closes #4958
2020-06-17 18:24:35 -04:00
bb154ac50c Fixing TPU training by disabling wandb.watch gradients logging for TPU (#4926) 2020-06-17 18:04:11 -04:00
fb6cccb863 fix qa example (#4929) 2020-06-17 17:54:16 -04:00
38bba9cdd5 Fix deprecation warnings due to invalid escape sequences. (#4924) 2020-06-17 17:46:58 -04:00
f1a3d03741 add pandas to setup.cfg (#5093) 2020-06-17 16:39:17 -04:00
90c833870c [MarianTokenizer] Switch to sacremoses for punc normalization (#5092) 2020-06-17 16:31:05 -04:00
049e14f0e3 very minor spelling correction in script command (#5090)
actual script name - counts_parameters.py
2020-06-17 16:08:43 -04:00
20fa828984 Make default_data_collator more flexible and deprecate old behavior (#5060)
* Make default_data_collator more flexible

* Accept tensors for all features

* Document code

* Refactor

* Formatting
2020-06-17 15:24:51 -04:00
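
A sketch of the more flexible behavior (feature values illustrative): plain lists and scalars are batched into tensors, and the special "label" key is renamed to "labels".

```python
from transformers.data.data_collator import default_data_collator

features = [
    {"input_ids": [101, 7592, 102], "label": 1},
    {"input_ids": [101, 2088, 102], "label": 0},
]
batch = default_data_collator(features)
print(batch["input_ids"].shape, batch["labels"])  # torch.Size([2, 3]) tensor([1, 0])
```
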
5e06963394 Some changes to simplify the generation function (#5031)
* moving logits post-processing out of beam search

* moving logits post-processing out of beam search

* first step cache

* fix_Encoder_Decoder

* patrick_version_postprocess

* add_keyword_arg
2020-06-17 14:48:06 -04:00
204ebc25e6 Update installation page and add contributing to the doc (#5084)
* Update installation page and add contributing to the doc

* Remove mention of symlinks
2020-06-17 14:01:10 -04:00
043f9f51f9 [examples] SummarizationModule improvements (#4951) 2020-06-17 13:51:34 -04:00
cd40f6564e Add header and fix command (#5082) 2020-06-17 11:45:05 -04:00
70bc3ead4f [TextClassificationPipeline] Hotfix: make json serializable 2020-06-17 15:09:27 +00:00
7291ea0bff Reorganize documentation (#5064)
* Reorganize topics and add all models
2020-06-17 07:55:20 -04:00
e4aaa45805 Update pipeline examples to doctest syntax (#5030) 2020-06-16 18:14:58 -04:00
011cc0be51 Fix all sphynx warnings (#5068) 2020-06-16 16:50:02 -04:00
af497b5672 Typo (#5069) 2020-06-16 16:46:20 -04:00
49c5202522 Eli5 examples (#4968)
* add eli5 examples

* add dense query script

* query_di

* merging

* merging

* add_utils

* adds nearest neighbor wikipedia

* batch queries

* training_retriever

* new notebooks

* moved retriever training script

* finished wiki40b

* max_len_fix

* train_s2s

* retriever_batch_checkpointing

* cleanup

* merge

* dim_fix

* fix_indexer

* fix_wiki40b_snippets

* fix_embed_for_r

* fp32 index

* fix_sparse_q

* joint_training

* remove obsolete datasets

* add_passage_nn_results

* add_passage_nn_results

* add_batch_nn

* add_batch_nn

* add_data_scripts

* notebook

* notebook

* notebook

* fix_multi_gpu

* add_app

* full_caching

* full_caching

* notebook

* sparse_done

* images

* notebook

* add_image_gif

* with_Gif

* add_contr_image

* notebook

* notebook

* notebook

* train_functions

* notebook

* min_retrieval_length

* pandas_option

* notebook

* min_retrieval_length

* notebook

* notebook

* eval_Retriever

* notebook

* images

* notebook

* add_example

* add_example

* notebook

* fireworks

* notebook

* notebook

* joe's notebook comments

* app_update

* notebook

* notebook_link

* captions

* notebook

* adding RetriBert model

* add RetriBert to Auto

* change AutoLMHead to AutoSeq2Seq

* notebook downloads from hf models

* style_black

* style_black

* app_update

* app_update

* fix_app_update

* style

* style

* isort

* Delete WikiELI5training.ipynb

* Delete evaluate_eli5.py

* Delete WikiELI5explore.ipynb

* Delete ExploreWikiELI5Support.html

* Delete explainlikeimfive.py

* Delete wiki_snippets.py

* children before parent

* children before parent

* style_black

* style_black_only

* isort

* isort_new

* Update src/transformers/modeling_retribert.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* typo fixes

* app_without_asset

* cleanup

* Delete ELI5animation.gif

* Delete ELI5contrastive.svg

* Delete ELI5wiki_index.svg

* Delete choco_bis.svg

* Delete fireworks.gif

* Delete huggingface_logo.jpg

* Delete huggingface_logo.svg

* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb

* Delete eli5_app.py

* Delete eli5_utils.py

* readme

* Update README.md

* unused imports

* moved_info

* default_beam

* ftuned model

* disclaimer

* Update src/transformers/modeling_retribert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* black

* add_doc

* names

* isort_Examples

* isort_Examples

* Add doc to index

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-16 16:36:58 -04:00
c3e607496c [cleanup] examples test_run_squad uses tiny model (#5059) 2020-06-16 14:06:45 -04:00
439aa1d6e9 Remove old section + caching in install (#5027) 2020-06-16 13:03:41 -04:00
3d495c61ef Fix marian tokenizer save pretrained (#5043) 2020-06-16 09:48:19 -04:00
d5477baf7d Convert hans to Trainer (#5025)
* Convert hans to Trainer

* Tick box
2020-06-16 08:06:31 -04:00
c852036b4a [cleanup] Hoist ModelTester objects to top level (#4939)
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-06-16 08:03:43 -04:00
0c55a384f8 Add reference to NLP dataset (#5028)
* Add reference to NLP dataset

* Update README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:19:09 -04:00
0946d1209d Add reference to NLP (package) dataset (#5029)
* Add reference to NLP (package) dataset

* Update README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-16 04:17:46 -04:00
edcb3ac59a refactor(wandb): consolidate import (#5044) 2020-06-16 03:40:43 -04:00
9e03364999 Ability to pickle/unpickle BatchEncoding pickle (reimport) (#5039)
* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.

* Added __get_state__() & __set_state__() to be pickable.

* Correct tokens() return type from List[int] to List[str]

* Added unittest for BatchEncoding pickle/unpickle

* Added unittest for BatchEncoding is_fast

* More careful checking on BatchEncoding unpickle tests.

* Formatting.

* is_fast should assertTrue on Rust tokenizers.

* Ensure tensorflow has correct way of checking array_equal

* More formatting.
2020-06-16 09:25:25 +02:00
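
A sketch of the round-trip this enables (checkpoint illustrative), e.g. for multiprocessing data pipelines:

```python
import pickle
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoding = tokenizer("pickle me")
restored = pickle.loads(pickle.dumps(encoding))
assert restored["input_ids"] == encoding["input_ids"]
```
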
f9f8a5312e Add DistilBertForMultipleChoice (#5032)
* Add `DistilBertForMultipleChoice`
2020-06-15 18:31:41 -04:00
36434220fc [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline

* failing pretokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up required tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various clean-ups - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-15 17:12:51 -04:00
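
A sketch of the refactored API this introduces (checkpoint illustrative): one tokenizer call with explicit padding and truncation strategies, replacing the old pad_to_max_length flag.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["first sentence", "a second, noticeably longer sentence"],
    padding="max_length",  # or True / "longest" / False
    truncation=True,
    max_length=32,
)
```
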
ebba39e4e1 [Bart] Question Answering Model is added to tests (#5024)
* fix test

* Update tests/test_modeling_common.py

* Update tests/test_modeling_common.py
2020-06-15 22:50:09 +02:00
bbad4c6989 Add position_ids (#5021) 2020-06-15 15:50:17 -04:00
1bf4098e03 feat(TFTrainer): improve logging (#4946)
* feat(tftrainer): improve logging

* fix(trainer): consider case with evaluation only

* refactor(tftrainer): address comments

* refactor(tftrainer): move self.epoch_logging to __init__
2020-06-15 14:06:17 -04:00
7b5a1e7d51 Fix importing transformers on Windows (#4997) 2020-06-15 19:36:57 +02:00
a9f1fc6c94 Add bart-base (#5014) 2020-06-15 13:29:26 -04:00
7b685f5229 Increase pipeline support for ONNX export. (#5005)
* Increase pipeline support for ONNX export.

* Style.
2020-06-15 19:13:58 +02:00
1affde2f10 Make DataCollator a callable (#5015)
* Make DataCollator a callable

* Update src/transformers/data/data_collator.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 11:58:33 -04:00
f7c93b3cee Possible fix to make AMP work with DDP in the trainer (#4728)
* manually set device in trainer args

* check if current device is cuda before set_device

* Explicitly set GPU ID when using single GPU

This addresses https://github.com/huggingface/transformers/issues/4657#issuecomment-642228099
2020-06-15 10:10:26 -04:00
66bcfbb130 Create README.md (#4975)
* Create README.md

* Update model_cards/ipuneetrathore/bert-base-cased-finetuned-finBERT/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-15 08:43:50 -04:00
d812e6d76e NER: fix construction of input examples for RoBERTa (#4943)
* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model
2020-06-15 08:30:40 -04:00
ebab096e86 [model card] model card for bart-large-finetuned-squadv1 (#4977)
* [model card] model card for bart-large-finetuned-squadv1

* add metadata link to the dataset
2020-06-15 05:39:41 -04:00
9ad36ad57f Improve ONNX logging (#4999)
* Improve ONNX export logging to give more information about the generated graph.

* Correctly handle input and output in the logging.
2020-06-15 11:04:51 +02:00
9931f817b7 fix (#4976) 2020-06-14 21:36:14 +02:00
9208f57b16 BartTokenizerFast (#4878) 2020-06-14 13:04:49 -04:00
403d309857 Hans data (#4854)
* Update hans data to be able to use Trainer

* Fixes

* Deal with tokenizers that don't have token_ids

* Clean up things

* Simplify data use

* Fix the input dict

* Formatting + proper path in README
2020-06-13 09:35:13 -04:00
ca5e1cdf8e model_cards: we can now tag datasets
see corresponding model pages to see how it's rendered
2020-06-12 23:19:07 +02:00
e93ccb3290 BartForQuestionAnswering (#4908) 2020-06-12 15:47:57 -04:00
538531cde5 Add AlbertForMultipleChoice (#4959)
* Add AlbertForMultipleChoice

* Make up to date and add all models to common tests
2020-06-12 14:20:19 -04:00
fe24139702 Create README.md (#4865) 2020-06-12 09:03:43 -04:00
9aa219a1fe Create README.md (#4872) 2020-06-12 09:03:13 -04:00
86578bb04c [AutoModel] Split AutoModelWithLMHead into clm, mlm, encoder-decoder (#4933)
* first commit

* add new auto models

* better naming

* fix bert automodel

* fix automodel for pretraining

* add models to init

* fix name typo

* fix typo

* better naming

* future warning instead of deprecation warning
2020-06-12 10:01:49 +02:00
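
The split means choosing the auto class by LM type; a sketch (checkpoint illustrative):

```python
from transformers import (
    AutoModelForCausalLM,   # GPT-2-style causal LMs
    AutoModelForMaskedLM,   # BERT-style masked LMs
    AutoModelForSeq2SeqLM,  # T5/BART-style encoder-decoders
)

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
```
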
5620033115 [mbart] Fix fp16 testing logic (#4949) 2020-06-11 22:11:34 -04:00
473808da0d update mvmt-pruning/saving_prunebert (updating torch to 1.5) 2020-06-11 19:42:45 +00:00
caf3746678 fix indentation issue (#4941) 2020-06-11 21:28:01 +02:00
6293eb04df [Model card] model card for electra-base QA model (#4936) 2020-06-11 13:16:34 -04:00
08b59d10e5 MBartTokenizer:add language codes (#3776) 2020-06-11 13:02:33 -04:00
20451195f0 Support multiple choice in tf common model tests (#4920)
* Support multiple choice in tf common model tests

* Add the input_embeds test
2020-06-11 10:31:26 -04:00
699541c4b3 TFTrainer: Add dataloader_drop_last (#4925) 2020-06-11 02:11:22 -04:00
e80d6c689b Fix resize_token_embeddings for Transformer-XL (#4759)
* Fixed resize_token_embeddings for transfo_xl model

* Fixed resize_token_embeddings for transfo_xl.

Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.

* Updated docstring

* Fixed resizing of cutoffs; added check for new size of embedding layer.

* Added test for resize_token_embeddings

* Fixed code quality

* Fixed unchanged cutoffs in model.config

Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
2020-06-10 19:03:06 -04:00
d541938c48 Make multiple choice models work with input_embeds (#4921) 2020-06-10 18:38:34 -04:00
1e2631d6f8 Split LMBert model in two (#4874)
* Split LMBert model in two

* Fix example

* Remove lm_labels

* Adapt tests, refactor prepare_for_generation

* Fix merge

* Hide BertLMHeadModel
2020-06-10 18:26:42 -04:00
f6da8b2200 check type before logging in trainer to ensure values are scalars (#4883)
* check type before logging to ensure it's a scalar

* log when Trainer attempts to add a non-scalar value using TensorboardX's writer.add_scalar so we know what kinds of fixes are appropriate

* black it

* rephrase log message to clarify attribute was dropped

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-10 18:25:55 -04:00
1c986f42ff Create README.md (#4871) 2020-06-10 17:29:41 -04:00
3ae2e86baf Run a single wandb instance per TPU run (#4851)
* Run a single wandb instance per TPU run

* wandb: self.is_world_master

* make style

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-10 16:28:18 -04:00
466aa57a45 Don't init TPU device twice (#4916) 2020-06-10 15:53:15 -04:00
ef2dcdccaa ElectraForQuestionAnswering (#4913)
* ElectraForQuestionAnswering

* udate __init__

* add test for electra qa model

* add ElectraForQuestionAnswering in auto models

* add ElectraForQuestionAnswering in all_model_classes

* fix outputs, input_ids defaults to None

* add ElectraForQuestionAnswering in docs

* remove commented line
2020-06-10 15:17:52 -04:00
5d63ca6c38 [ctrl] fix pruning of MultiHeadAttention (#4904) 2020-06-10 14:06:55 -04:00
4e10acb3e5 Add more models to common tests (#4910) 2020-06-10 13:19:53 -04:00
3b3619a327 [All models] fix docs after adding output attentions to all forward functions (#4909)
* fix doc

* add format file

* add output attentions to all docs

* add also for bart

* fix naming

* re-add doc to config
2020-06-10 18:10:59 +02:00
ac99217e92 Fix the CI (#4903)
* Fix CI
2020-06-10 09:26:06 -04:00
0a375f5abd Deal with multiple choice in common tests (#4886)
* Deal with multiple choice in common tests
2020-06-10 08:10:20 -04:00
e8db8b845a Remove unused arguments in Multiple Choice example (#4853)
* Remove unused arguments

* Formatting

* Remove second todo comment
2020-06-09 20:05:09 -04:00
29c36e9f36 run_pplm.py bug fix (#4867)
`is_leaf` may become `False` after the `.to(device=device)` call.
2020-06-09 19:14:27 -04:00
13aa174112 uninstalled wandb raises AttributeError 2020-06-09 18:50:56 -04:00
6e603cb789 [All models] Extend config.output_attentions with output_attentions function arguments (#4538)
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* Fix further regressions in tests relating to `output_attentions`

Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses

* Fix more regressions in `test_output_attentions`

* Fix issues with BertEncoder

* Rename related variables to `output_attentions`

* fix pytorch tests

* fix bert and gpt2 tf

* Fix most TF tests for `test_output_attentions`

* Fix linter errors and more TF tests

* fix conflicts

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix pytorch tests

* fix conflicts

* fix conflicts

* Fix linter errors and more TF tests

* fix tf tests

* make style

* fix isort

* improve output_attentions

* improve tensorflow

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-09 23:39:06 +02:00
f90bc44d9a [examples] Cleanup summarization docs (#4876) 2020-06-09 17:38:28 -04:00
2cfb947f59 [Benchmark] add tpu and torchscript for benchmark (#4850)
* add tpu and torchscript for benchmark

* fix name in tests

* "fix email"

* make style

* better log message for tpu

* add more print and info for tpu

* allow possibility to print tpu metrics

* correct cpu usage

* fix test for non-install

* remove bogus file

* include psutil in testing

* run a couple of times before tracing in torchscript

* do not allow tpu memory tracing for now

* make style

* add torchscript to env

* better name for torch tpu

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2020-06-09 23:12:43 +02:00
f0340b3031 Removes from the of the parent of TFRobertaClassificationHead (#4884)
Co-authored-by: Hamza Harkous <harkous@google.com>
2020-06-09 16:14:01 -04:00
02e5f79662 [examples] consolidate summarization examples (#4837) 2020-06-09 11:14:12 -04:00
9f5d5a531d Fix the __getattr__ method in BatchEncoding (#4772) 2020-06-09 09:44:00 +02:00
41a1d27cde Add XLMRobertaForQuestionAnswering (#4855)
* Add XLMRobertaForQuestionAnswering

* Formatting

* Make test happy
2020-06-08 21:22:37 -04:00
a139d1a160 [cleanup] consolidate some prune_heads logic (#4799) 2020-06-08 17:08:04 -04:00
4c7f564f9a fix (#4839) 2020-06-08 18:28:50 +02:00
37be3786cf Clean documentation (#4849)
* Clean documentation
2020-06-08 11:28:19 -04:00
42860e92a4 Turn off codecov patch for now 2020-06-08 09:47:13 -04:00
36dfc317b3 TF Checkpoints (#4831)
* Align checkpoint dir with the PT trainer

* Use args for max to keep checkpoints
2020-06-08 09:45:23 -04:00
439f1cab20 [Generate] beam search should generate without replacement (#4845)
* fix flaky beam search

* fix typo
2020-06-08 15:31:32 +02:00
c0554776de fix PR (#4810) 2020-06-08 15:31:12 +02:00
e817747941 Expose classes used in documentation (#4808)
* Expose classes used in documentation

* Format code
2020-06-08 08:14:32 -04:00
b6f365a8ed Updates args in tf squad example. (#4820)
Co-authored-by: Daniel Shan <daniel.shan@workday.com>
2020-06-08 05:36:09 -04:00
e33fdc93b4 Export PretrainedBartModel from __init__ (#4819) 2020-06-07 11:55:10 -04:00
c58e6c129a [marian tests ] pass device to pipeline (#4815) 2020-06-06 00:52:17 -04:00
ddf9a3dfc7 Updated path "cd examples/text-generation/pplm" (#4778)
https://github.com/huggingface/transformers/issues/4776
2020-06-05 21:16:48 -04:00
2d372a990b Explain how to preview the docs in a PR (#4795) 2020-06-05 20:47:02 -04:00
56d5d160cd Add model and doc badges (#4811)
* Add badges for models and docs
2020-06-05 18:45:42 -04:00
4ab7424597 [cleanup/marian] pipelines test and new kwarg (#4812) 2020-06-05 18:45:19 -04:00
875288b344 [isort] add matplotlib to known 3rd party dependencies (#4800) 2020-06-05 17:27:31 -04:00
8cca875569 [EncoderDecoderConfig] automatically set decoder config to decoder (#4809)
* automatically set decoder config to decoder

* add more tests
2020-06-05 23:16:37 +02:00
f1fe18465d Use labels to remove deprecation warnings (#4807) 2020-06-05 16:41:46 -04:00
5c0cfc2cf0 Add link to community models (#4804) 2020-06-05 15:29:20 -04:00
4dd5cf2207 Fix argument label (#4792)
* Fix argument label

* Fix test
2020-06-05 15:20:29 -04:00
3723f30a18 [cleanup] MarianTokenizer: delete unused constants (#4802) 2020-06-05 14:57:24 -04:00
acaa2e6267 Clean-up code (#4790) 2020-06-05 12:36:22 -04:00
fa661ce749 Add model summary (#4789)
* Add model summary

* Add link to pretrained models
2020-06-05 12:22:50 -04:00
79ab881eb1 No silent error when d_head already in the configuration (#4747)
* No silent error when d_head already in the configuration

* Update src/transformers/configuration_xlnet.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-05 12:01:43 -04:00
b9109f2de1 [doc] Make it clearer that text-generation does not involve training 2020-06-05 14:59:22 +02:00
ceaab8dd22 Add .vs to gitignore (#4774) 2020-06-05 07:56:11 -04:00
f9414f7553 Tensorflow improvements (#4530)
* Better None gradients handling

* Apply Style

* Apply Style

* Create a loss class per task to compute its respective loss

* Add loss classes to the ALBERT TF models

* Add loss classes to the BERT TF models

* Add question answering and multiple choice to TF Camembert

* Remove prints

* Add multiple choice model to TF DistilBERT + loss computation

* Add question answering model to TF Electra + loss computation

* Add token classification, question answering and multiple choice models to TF Flaubert

* Add multiple choice model to TF Roberta + loss computation

* Add multiple choice model to TF XLM + loss computation

* Add multiple choice and question answering models to TF XLM-Roberta

* Add multiple choice model to TF XLNet + loss computation

* Remove unused parameters

* Add task loss classes

* Reorder TF imports + add new model classes

* Add new model classes

* Bugfix in TF T5 model

* Bugfix for TF T5 tests

* Bugfix in TF T5 model

* Fix TF T5 model tests

* Fix T5 tests + some renaming

* Fix inheritance issue in the AutoX tests

* Add tests for TF Flaubert and TF XLM Roberta

* Add tests for TF Flaubert and TF XLM Roberta

* Remove unused piece of code in the TF trainer

* bugfix and remove unused code

* Bugfix for TF 2.2

* Apply Style

* Divide TFSequenceClassificationAndMultipleChoiceLoss into its two respective names

* Apply style

* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameter and better dataset handling

* Fix TF optimizations tests and apply style

* Remove useless parameter

* Bugfix and apply style

* Fix TF Trainer prediction

* Now the TF models return the loss like their PyTorch counterparts

* Apply Style

* Ignore some tests output

* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.

* Fix names for SQuAD data

* Apply Style

* Fix conflicts with 2.11 release

* Fix conflicts with 2.11

* Fix wrong name

* Add better documentation on the new create_optimizer function

* Fix isort

* logging_dir: use same default as PyTorch

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-04 19:45:53 -04:00
ccd26c2862 Create model card for tblard/allocine (#4775)
https://huggingface.co/tblard/tf-allocine
2020-06-04 19:15:07 -04:00
2a4b9e09c0 NER: Add new WNUT’17 example (#4681)
* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort
2020-06-04 19:13:17 -04:00
0e1869cc28 Add drop_last arg for data loader 2020-06-04 18:30:31 -04:00
48a05026de removed deprecated use of Variable api from pplm example 2020-06-04 18:07:49 -04:00
12d0eb5f3e Don't access pad_token_id if there is no pad_token (#4773) 2020-06-04 17:57:04 -04:00
17a88d3192 Create model card for T5-base fine-tuned for Sentiment Span Extraction (#4737) 2020-06-04 16:59:56 -04:00
fb52143cf6 Create README.md (#4743) 2020-06-04 16:59:37 -04:00
5f077a3445 Model Card for RoBERTa trained on Sanskrit (#4763)
* Model cad for SanBERTa

Model Card for RoBERTa trained on Sanskrit

* Model card for SanBERTa

model card for RoBERTa trained on Sanskrit
2020-06-04 16:58:40 -04:00
cd4e07a85e Add note about doc generation (#4770) 2020-06-04 13:43:14 -04:00
492b352ab6 Remove unnecessary model_type arg in example (#4771) 2020-06-04 13:41:24 -04:00
e645b9ab94 Codecov setup (#4768)
* Codecov setup

* Understanding codecov
2020-06-04 11:44:38 -04:00
2b8b6c929e [cleanup] PretrainedModel.generate: remove unused kwargs (#4761) 2020-06-04 08:13:52 -04:00
5bf9afbf35 Introduce a new tensor type for return_tensors on tokenizer for NumPy (#4585)
* Refactor tensor creation in tokenizers.

* Make sure to convert string to TensorType

* Refactor convert_to_tensors_

* Introduce numpy tensor creation

* Format

* Add unittest for TensorType creation from str

* sorting imports

* Added unittests for numpy tensor conversion.

* Do not use the in-place version of squeeze, as numpy doesn't provide such a feature.

* Added extra parameter prepend_batch_axis: bool on prepare_for_model.

* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.

* style.

* numpy tests require_torch for now while flax not merged.

* Hopefully will make flake8 happy.

* One more time 🎶
2020-06-04 06:57:01 +02:00
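
With the current API this looks roughly as follows (checkpoint illustrative); no deep-learning framework is needed to get array outputs:

```python
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("NumPy output without torch or tf", return_tensors="np")
assert isinstance(enc["input_ids"], np.ndarray)
```
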
efae154929 never_split on slow tokenizers should not split (#4723)
* Ensure tokens in never_split are not split when using the basic tokenizer before wordpiece.

* never_split membership checks now use a set(), which is 10x faster for this operation.

* Use union to concatenate two sets.

* Updated docstring for never_split parameter.

* Avoid set.union() if never_split is None

* Added comments.

* Correct docstring format.
2020-06-03 16:48:28 -04:00
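
A sketch of the guaranteed behavior (token illustrative): entries in never_split survive basic tokenization intact before wordpiece runs.

```python
from transformers import BasicTokenizer

tok = BasicTokenizer(do_lower_case=True, never_split=["[CUSTOM]"])
print(tok.tokenize("A [CUSTOM] token"))  # ['a', '[CUSTOM]', 'token']
```
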
2e4de76231 Update encode documentation (#4751) 2020-06-03 16:30:59 -04:00
ed4df85572 fix beam search bug in tf as well (#4745) 2020-06-03 12:53:23 -04:00
1b5820a565 Unify label args (#4722)
* Deprecate masked_lm_labels argument

* Apply to all models

* Better error message
2020-06-03 09:36:26 -04:00
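
After the unification every model takes a single labels argument; a sketch of masked-LM usage (checkpoint illustrative):

```python
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
labels = inputs["input_ids"].clone()
outputs = model(**inputs, labels=labels)  # masked_lm_labels is deprecated
loss = outputs[0]
```
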
3e5928c57d Adding notebooks for Fine Tuning [Community Notebook] (#4732)
* Added links to more community notebooks

Added links to 3 more community notebooks from the git repo: https://github.com/abhimishra91/transformers-tutorials
Different Transformers models are fine-tuned on datasets using PyTorch

* Update README.md

* Update README.md

* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-06-03 11:07:26 +02:00
99207bd112 Pipelines: miscellanea of QoL improvements and small features... (#4632)
* [hf_api] Attach all unknown attributes for future-proof compatibility

* [Pipeline] NerPipeline is really a TokenClassificationPipeline

* modelcard.py: I don't think we need to force the download

* Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer

* FillMaskPipeline: also output token in string form

* TextClassificationPipeline: option to return all scores, not just the argmax

* Update docs/source/main_classes/pipelines.rst
2020-06-03 03:51:31 -04:00
8ed47aa10b bert-small-cord19 model cards (#4730)
* Create README.md

* Create README.md

* Create README.md
2020-06-03 03:40:14 -04:00
9ca485734a [Reformer] Improved memory if input is shorter than chunk length (#4720)
* improve handling of short inputs for reformer

* correct typo in assert statement

* fix other tests
2020-06-02 23:08:39 +02:00
b231a413f5 Add cache_dir to save features in GLUE + Differentiate match/mismatch for MNLI metrics (#4621)
* GLUE task cleanup

* Enable writing the cache to cache_dir in case the dataset lives in a read-only filesystem.
* Differentiate match vs mismatch for MNLI metrics.

* Style

* Fix pytype

* Fix type

* Use cache_dir in mnli mismatch eval dataset

* Small Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-02 13:40:14 -04:00
70f7423436 TFRobertaModelIntegrationTest requires tf (#4726) 2020-06-02 12:59:00 -04:00
d976ef262e Repin versions 2020-06-02 10:27:15 -04:00
b42586ea56 Fix CI after killing archive maps (#4724)
* 🐛 Fix model ids for BART and Flaubert
2020-06-02 10:21:09 -04:00
b43c78e5d3 Release: v2.11.0 2020-06-02 09:49:09 -04:00
d4c2cb402d Kill model archive maps (#4636)
* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI
2020-06-02 09:39:33 -04:00
47a551d17b [pipeline] Tokenizer should not add special tokens for text generation (#4686)
* allow to not add special tokens

* remove print
2020-06-02 11:03:46 +02:00
f6d5046af1 Override get_vocab for fast tokenizer. (#4717) 2020-06-02 11:02:27 +02:00
88762a2f8c Specify PyTorch versions for examples (#4710) 2020-06-02 04:29:28 -04:00
d3ef14f931 Add community notebook for sentiment span extraction (#4700) 2020-06-02 09:59:53 +02:00
7677936316 Make docstring match args (#4711) 2020-06-01 15:22:51 -04:00
6449c494d0 close #4685 2020-06-01 12:57:52 -04:00
ec8717d5d8 [config] Ensure that id2label always takes precedence over num_labels 2020-06-01 16:54:55 +02:00
751a1e0890 [config] Ensure that id2label always takes precedence over num_labels
Fixes bug reported in https://github.com/huggingface/transformers/issues/4669

See #3967 for context
2020-06-01 16:25:56 +02:00
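A minimal sketch of what the precedence fix above means in practice: when a label map is given, the label count is derived from it rather than from a separate `num_labels` value.

```python
from transformers import BertConfig

# With an explicit label map, num_labels is derived from id2label,
# so the map is authoritative even if num_labels would disagree.
config = BertConfig(id2label={0: "NEGATIVE", 1: "NEUTRAL", 2: "POSITIVE"})
print(config.num_labels)  # 3
```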
ec62b7d953 Fix onnx export input names order (#4641)
* pass on tokenizer to pipeline

* order input names when convert to onnx

* update style

* remove unused imports

* make ordered inputs list needs to be mutable

* add test custom bert model

* remove unused imports
2020-06-01 16:12:48 +02:00
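The ordering fix matters because ONNX binds inputs positionally. A hedged, illustrative sketch of the idea (not the exact helper in the PR): sort the tokenizer's outputs by the model's forward() signature.

```python
import inspect

def ordered_onnx_inputs(model, encodings):
    """Order tokenizer outputs by the model's forward() signature, keeping
    only arguments forward() actually accepts (illustrative sketch; the
    function name is hypothetical)."""
    forward_params = inspect.signature(model.forward).parameters
    return [(name, encodings[name]) for name in forward_params if name in encodings]
```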
bf760c80b5 finish README 2020-06-01 09:23:31 -04:00
9d7d9b3ae0 weird import 2020-06-01 09:23:31 -04:00
2a3c88a659 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
4ac462bfb8 Update examples/movement-pruning/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-06-01 09:23:31 -04:00
35fa0bbca0 clarify README 2020-06-01 09:23:31 -04:00
cc746a5020 flake8 compliance 2020-06-01 09:23:31 -04:00
b11386e158 less prints in saving prunebert 2020-06-01 09:23:31 -04:00
8b5d4003ab complete README 2020-06-01 09:23:31 -04:00
5c8e5b3709 complying with isort 2020-06-01 09:23:31 -04:00
db2a3b2e01 space 2020-06-01 09:23:31 -04:00
5f8f2d849a add floppy bert model notebook 2020-06-01 09:23:31 -04:00
b41948f5cd add requirements 2020-06-01 09:23:31 -04:00
fb8f4277b2 add scripts 2020-06-01 09:23:31 -04:00
d489a6d3d5 add masked_run_* 2020-06-01 09:23:31 -04:00
e4c07faf0a add sparsity modules 2020-06-01 09:23:31 -04:00
667003e447 Create README.md (#4665) 2020-06-01 08:29:09 -04:00
ed23f5909e HooshvareLab readme parsbert-armananer (#4666)
Readme for HooshvareLab/bert-base-parsbert-armananer-uncased
2020-06-01 08:28:43 -04:00
3750b9b0b0 HooshvareLab readme parsbert-peymaner (#4667)
Readme for HooshvareLab/bert-base-parsbert-peymaner-uncased
2020-06-01 08:28:25 -04:00
036c2c6b02 Update HooshvareLab/bert-base-parsbert-uncased (#4687)
Added mBERT comparison results on NER datasets.
2020-06-01 08:27:00 -04:00
74872c19d3 Create README.md (#4684) 2020-06-01 05:45:54 -04:00
0866669e75 [EncoderDecoder] Fix initialization and save/load bug (#4680)
* fix bug

* add more tests
2020-05-30 01:25:19 +02:00
6f82aea66b Include nlp notebook for model evaluation (#4676) 2020-05-29 19:38:56 +02:00
33b7532e69 Fix longformer attention mask type casting when using apex (#4574)
* Fix longformer attention mask casting when using apex

* remove extra type casting
2020-05-29 18:13:30 +02:00
56ee2560be [Longformer] Better handling of global attention mask vs local attention mask (#4672)
* better api

* improve automatic setting of global attention mask

* fix longformer bug

* fix global attention mask in test

* fix global attn mask flatten

* fix slow tests

* update docstring

* update docs and make more robust

* improve attention mask
2020-05-29 17:58:42 +02:00
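A minimal sketch of the global-vs-local split the PR above cleans up, assuming the `global_attention_mask` argument of the Longformer forward pass:

```python
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("Why is the sky blue? " + "Some long context. " * 100,
                   return_tensors="pt")

# 1s mark tokens that attend globally (here: the question tokens);
# everything else keeps local windowed attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, :8] = 1

outputs = model(**inputs, global_attention_mask=global_attention_mask)
```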
e2230ba77b Fix BERT example code for NSP and Multiple Choice (#3953)
Change the example code to use encode_plus since the token_type_id
wasn't being correctly set.
2020-05-29 11:55:55 -04:00
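The fix above boils down to using `encode_plus` so segment IDs are populated — a minimal sketch:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# encode_plus builds token_type_ids for the sentence pair; the old
# encode-based snippets left them all zero, which breaks NSP-style inputs.
enc = tokenizer.encode_plus("The sky is blue.", "It rained all day.")
print(enc["token_type_ids"])  # 0s over segment A, 1s over segment B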
3a5d1ea2a5 Fix two bugs: 1. Index of test data of SST-2. 2. Label index of MNLI data. (#4546) 2020-05-29 11:12:24 -04:00
9c17256447 [Longformer] Multiple choice for longformer (#4645)
* add multiple choice for longformer

* add models to docs

* adapt docstring

* add test to longformer

* add longformer for mc in init and modeling auto

* fix tests
2020-05-29 13:46:08 +02:00
91487cbb8e [Longformer] fix model name in examples (#4653)
* fix longformer model names in examples

* a better name for the notebook
2020-05-29 13:12:35 +02:00
b5015a2a0f gpt2 typo (#4629)
* gpt2 typo

* Add files via upload
2020-05-28 16:44:43 -04:00
fe5cb1a1c8 Adding community notebook (#4642)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-28 22:35:15 +02:00
aecaaf73a4 [Community notebooks] add longformer-for-qa notebook (#4652) 2020-05-28 22:27:22 +02:00
5e737018e1 Fix add_special_tokens on fast tokenizers (#4531) 2020-05-28 10:54:45 -04:00
e444648a30 LongformerForTokenClassification (#4638) 2020-05-28 12:48:18 +02:00
3cc2c2a150 add 2 colab notebooks (#4505)
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-28 11:18:16 +02:00
ef03ae874f [Longformer] more models + model cards (#4628)
* adding freeze roberta models

* model cards

* lint
2020-05-28 11:11:05 +02:00
96f57c9ccb [Benchmark] Memory benchmark utils (#4198)
* improve memory benchmarking

* correct typo

* fix current memory

* check torch memory allocated

* better pytorch function

* add total cached gpu memory

* add total gpu required

* improve torch gpu usage

* update memory usage

* finalize memory tracing

* save intermediate benchmark class

* fix conflict

* improve benchmark

* improve benchmark

* finalize

* make style

* improve benchmarking

* correct typo

* make train function more flexible

* fix csv save

* better repr of bytes

* better print

* fix __repr__ bug

* finish plot script

* rename plot file

* delete csv and small improvements

* fix in plot

* fix in plot

* correct usage of timeit

* remove redundant line

* remove redundant line

* fix bug

* add hf parser tests

* add versioning and platform info

* make style

* add gpu information

* ensure backward compatibility

* finish adding all tests

* Update src/transformers/benchmark/benchmark_args.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/benchmark/benchmark_args_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* delete csv files

* fix isort ordering

* add out of memory handling

* add better train memory handling

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-05-27 23:22:16 +02:00
ec4cdfdd05 LongformerForSequenceClassification (#4580)
* LongformerForSequenceClassification

* better naming x=>hidden_states, fix typo in doc

* Update src/transformers/modeling_longformer.py

* Update src/transformers/modeling_longformer.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-27 22:30:00 +02:00
4402879ee4 [Model Card] model card for longformer-base-4096-finetuned-squadv1 (#4625) 2020-05-27 18:48:03 +02:00
6a17688021 per_device instead of per_gpu/error thrown when argument unknown (#4618)
* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-27 11:36:55 -04:00
1381b6d01d README for HooshvareLab (#4610)
HooshvareLab/bert-base-parsbert-uncased
2020-05-27 11:25:36 -04:00
5acb4edf25 Update version command when contributing (#4614) 2020-05-27 17:19:11 +02:00
842588c12f uncased readme (#4608)
Co-authored-by: kldarek <darekmail>
2020-05-27 09:50:04 -04:00
ac1a612179 Create README.md (#4607)
Model card for cased model
2020-05-27 09:36:20 -04:00
07797c4da4 [testing] LanguageModelGenerationTests require_tf or require_torch (#4616) 2020-05-27 09:10:26 -04:00
a9aa7456ac Add back --do_lower_case to uncased models (#4245)
The option `--do_lower_case` is currently required by the uncased models (e.g., bert-base-uncased, bert-large-uncased).

Results:
BERT-BASE without --do_lower_case:  'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case:  'exact': 81.02, 'f1': 88.34
2020-05-26 21:13:07 -04:00
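The F1 gap above comes from vocabulary mismatch: without lowercasing, cased surface forms miss the uncased vocabulary. A minimal sketch of the effect (output shapes are indicative, not exact):

```python
from transformers import BertTokenizer

lower = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
cased = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=False)

print(lower.tokenize("Normandy"))  # matches the uncased vocab cleanly
print(cased.tokenize("Normandy"))  # cased input fragments badly, often down to [UNK]
```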
a801c7fd74 Creating a readme for ALBERT in Mongolian (#4603)
Here I am uploading a Mongolian masked language model (ALBERT) to the platform.
https://en.wikipedia.org/wiki/Mongolia
2020-05-26 16:54:42 -04:00
6458c0e268 updated model cards for both models at aubmindlab (#4604)
* updated aubmindlab/bert-base-arabert/ Model card

* updated aubmindlab/bert-base-arabertv01 model card
2020-05-26 16:52:43 -04:00
ea4e7a53fa Improve model card for Tereveni-AI/gpt2-124M-uk-fiction (#4582)
Add language metadata, training and evaluation corpora details.
Add example output. Fix inconsistent use of quotes.
2020-05-26 16:51:40 -04:00
937930dcae Create README.md (#4591) 2020-05-26 16:50:08 -04:00
bac1cc4dc1 Remove MD emojis (#4602) 2020-05-26 16:38:39 -04:00
003c477129 [GPT2, CTRL] Allow input of input_ids and past of variable length (#4581)
* revert convenience  method

* clean docs a bit
2020-05-26 19:43:58 +02:00
5ddd8d6531 Add BART fine-tuning summarization community notebook (#4539)
* adding BART summarization how-to community notebook

* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-26 16:43:41 +02:00
8cc6807e89 Make transformers-cli cross-platform (#4131)
* make transformers-cli cross-platform

Using "scripts" is a useful option in setup.py particularly when you want to get access to non-python scripts. However, in this case we want to have an entry point into some of our own Python scripts. To do this in a concise, cross-platfom way, we can use entry_points.console_scripts. This change is necessary to provide the CLI on different platforms, which "scripts" does not ensure. Usage remains the same, but the "transformers-cli" script has to be moved (be part of the library) and renamed (underscore + extension)

* make style & quality
2020-05-26 10:00:51 -04:00
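Roughly what the change above amounts to in setup.py — a sketch; the exact module path is an assumption, not copied from the PR:

```python
# setup.py (excerpt): console_scripts generates a native launcher on every
# platform, unlike the old "scripts" mechanism.
from setuptools import setup

setup(
    name="transformers",
    # ...
    entry_points={
        "console_scripts": [
            # module path below is approximate
            "transformers-cli=transformers.commands.transformers_cli:main",
        ]
    },
)
```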
c589eae2b8 [Longformer For Question Answering] Conversion script, doc, small fixes (#4593)
* add new longformer for question answering model

* add new config as well

* fix links

* fix links part 2
2020-05-26 14:58:47 +02:00
a163c9ca5b [T5] Fix Cross Attention position bias (#4499)
* fix

* fix1
2020-05-26 08:57:24 -04:00
1d69028989 fix (#4410) 2020-05-26 08:51:28 -04:00
b86e42e0ac [ci] fix 3 remaining slow GPU failures (#4584) 2020-05-25 19:20:50 -04:00
365d452d4d [ci] Slow GPU tests run daily (#4465) 2020-05-25 17:28:02 -04:00
3e3e552125 [Reformer] fix reformer num buckets (#4564)
* fix reformer num buckets

* fix

* adapt docs

* set num buckets in config
2020-05-25 16:04:45 -04:00
3dea40b858 fixing tokenization of extra_id symbols in T5Tokenizer. Related to issue 4021 (#4353) 2020-05-25 16:04:30 -04:00
5139733623 LongformerTokenizerFast (#4547) 2020-05-25 16:03:55 -04:00
c9c385c522 Updated the link to the paper (#4570)
It looks like the conference has changed the link to the paper.
2020-05-25 15:29:50 -04:00
adab7f8332 Add nn.Module as superclass (#4533) 2020-05-25 15:29:33 -04:00
8f7c1c7672 Create model card (#4578) 2020-05-25 15:28:30 -04:00
4c6b218056 Update README.md (#4556) 2020-05-25 15:12:23 -04:00
50d1ce411f add DistilBERT to supported models (#4558) 2020-05-25 14:50:45 -04:00
03d8527de0 Longformer for question answering (#4500)
* added LongformerForQuestionAnswering

* add LongformerForQuestionAnswering

* fix import for LongformerForMaskedLM

* add LongformerForQuestionAnswering

* hardcoded sep_token_id

* compute attention_mask if not provided

* combine global_attention_mask with attention_mask when provided

* update example in  docstring

* add assert error messages, better attention combine

* add test for longformerForQuestionAnswering

* typo

* cast global_attention_mask to long

* make style

* Update src/transformers/configuration_longformer.py

* Update src/transformers/configuration_longformer.py

* fix the code quality

* Merge branch 'longformer-for-question-answering' of https://github.com/patil-suraj/transformers into longformer-for-question-answering

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-25 18:43:36 +02:00
a34a9896ac DOC: Fix typos in modeling_auto (#4534) 2020-05-23 09:40:59 -04:00
e19b978151 Add Type Hints to modeling_utils.py Closes #3911 (#3948)
* Add Type Hints to modeling_utils.py Closes #3911

Add Type Hints to methods in `modeling_utils.py`

Note: The coverage isn't 100%. Mostly skipped internal methods.

* Reformat according to `black` and `isort`

* Use typing.Iterable instead of Sequence

* Parameterize Iterable by its generic type

* Use typing.Optional when None is the default value

* Adhere to style guideline

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-22 19:10:22 -04:00
996f393a86 Warn the user about max_len being on the path to be deprecated. (#4528)
* Warn the user about max_len being on the path to be deprecated.

* Ensure better backward compatibility when max_len is provided to a tokenizer.

* Make sure to override the parameter and not the actual instance value.

* Format & quality
2020-05-22 18:08:30 -04:00
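A sketch of the deprecation path above, against the v2.x tokenizer API of the time:

```python
from transformers import BertTokenizer

# Still accepted for backward compatibility, but now emits a deprecation
# warning; the value keeps overriding the instance attribute.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", max_len=512)
print(tokenizer.max_len)  # 512 (later releases renamed this model_max_length)
```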
0f6969b7e9 Better github link for Reformer Colab Notebook 2020-05-22 23:51:36 +02:00
ab44630db2 [Summarization Pipeline]: Fix default tokenizer (#4506)
* Fix pipelines defaults bug

* one liner

* style
2020-05-22 17:49:45 -04:00
2c1ebb8b50 Re-apply #4446 + add packaging dependency
As discussed w/ @lysandrejik

packaging is maintained by PyPA (the Python Packaging Authority), and should be lightweight and stable
2020-05-22 17:29:03 -04:00
e6aeb0d3e8 Style 2020-05-22 17:20:03 -04:00
95a26fcf2d link to paper was broken (#4526)
changed from https://https://arxiv.org/abs/2001.04451.pdf to https://arxiv.org/abs/2001.04451.pdf
2020-05-22 15:17:09 -04:00
89d795f180 Added huseinzol05/t5-small-bahasa-cased README.md (#4522) 2020-05-22 15:04:06 -04:00
35df911485 Fix convert_token_type_ids_from_sequences for fast tokenizers (#4503) 2020-05-22 12:45:10 -04:00
f7677e1623 [model_cards] bart-large-cnn
cc @sshleifer
2020-05-22 12:20:54 -04:00
12e6afe900 Add Reformer colab to community notebooks 2020-05-22 17:03:34 +02:00
ef22ba4836 Re-pin versions 2020-05-22 11:03:07 -04:00
10d72390c0 Revert #4446 Since it introduces a new dependency 2020-05-22 10:49:45 -04:00
e0db6bbd65 Release: v2.10.0 2020-05-22 10:37:44 -04:00
bd6e301832 added functionality for electra classification head (#4257)
* added functionality for electra classification head

* unneeded dropout

* Test ELECTRA for sequence classification

* Style

Co-authored-by: Frankie <frankie@frase.io>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-05-22 09:48:21 -04:00
a086527727 Unused Union should not be imported 2020-05-21 09:42:47 -04:00
9d2ce253de TPU hangs when saving optimizer/scheduler (#4467)
* TPU hangs when saving optimizer/scheduler

* Style

* ParallelLoader is not a DataLoader

* Style

* Addressing @julien-c's comments
2020-05-21 09:18:27 -04:00
49296533ca Adds predict stage for glue tasks, and generate result files which can be submitted to gluebenchmark.com (#4463)
* Adds a predict stage for GLUE tasks and generates result files which can be submitted to the gluebenchmark.com website.

* Use Split enum + always output the label name

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-21 09:17:44 -04:00
271bedb485 [examples] fix no grad in second pruning in run_bertology (#4479)
* fix no grad in second pruning and typo

* fix prune heads attention mismatch problem

* fix

* fix

* fix

* run make style

* run make style
2020-05-21 09:17:03 -04:00
865d4d595e [ci] Close #4481 2020-05-20 18:27:42 -04:00
a3af8e86cb Update test_trainer_distributed.py 2020-05-20 18:26:51 -04:00
eacea530c1 🚨 Remove warning of deprecation (#4477)
Remove warning of deprecated overload of addcdiv_

Fix #4451
2020-05-20 16:48:29 -04:00
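The warning above comes from PyTorch 1.5 deprecating the positional-value overload of `addcdiv_`. A runnable sketch of the before/after:

```python
import torch

param = torch.zeros(3)
exp_avg = torch.ones(3)
denom = torch.full((3,), 2.0)
step_size = 0.1

# Old, deprecated overload: param.addcdiv_(-step_size, exp_avg, denom)
# New keyword form, as applied in the optimizer:
param.addcdiv_(exp_avg, denom, value=-step_size)
print(param)  # tensor([-0.0500, -0.0500, -0.0500])
```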
fa2fbed3e5 Better None gradients handling in TF Trainer (#4469)
* Better None gradients handling

* Apply Style

* Apply Style
2020-05-20 16:46:21 -04:00
e708bb75bf Correct TF formatting to exclude LayerNorms from weight decay (#4448)
* Exclude LayerNorms from weight decay

* Include both formats of layer norm
2020-05-20 16:45:59 -04:00
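The TF fix above mirrors the familiar grouped-parameters pattern on the PyTorch side — a sketch with a stand-in module (real BERT parameter names contain "LayerNorm"; this toy module uses "norm"):

```python
import torch
from torch import nn

model = nn.TransformerEncoderLayer(d_model=16, nhead=2)  # stand-in model

# Parameters whose names match any of these substrings get no weight decay.
no_decay = ["bias", "LayerNorm", "norm"]
grouped = [
    {"params": [p for n, p in model.named_parameters()
                if not any(nd in n for nd in no_decay)],
     "weight_decay": 0.01},
    {"params": [p for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)],
     "weight_decay": 0.0},
]
optimizer = torch.optim.AdamW(grouped, lr=5e-5)
```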
49c06132df pass on tokenizer to pipeline (#4489) 2020-05-20 22:23:21 +02:00
cacb654c7f Add Fine-tune DialoGPT on new datasets notebook (#4473) 2020-05-20 16:17:52 -04:00
30a09f3827 Adjust german bert model card, add new model card (#4488) 2020-05-20 16:08:29 -04:00
14cb5b35fa Fix slow gpu tests lysandre (#4487)
* There is one missing key in BERT

* Correct device for CamemBERT model

* RoBERTa tokenization adding prefix space

* Style
2020-05-20 11:59:45 -04:00
6dc52c78d8 Create README.md (#4482) 2020-05-20 09:45:50 -04:00
ed5456daf4 Model card for RuPERTa-base fine-tuned for NER (#4466) 2020-05-20 09:45:24 -04:00
c76450e20c Model card for Tereveni-AI/gpt2-124M-uk-fiction (#4470)
Create model card for "Tereveni-AI/gpt2-124M-uk-fiction" model
2020-05-20 09:44:26 -04:00
9907dc523a add BERT trained from review corpus. (#4405)
* add model_cards for BERT trained on reviews.

* add link to repository.

* refine README.md for each review model
2020-05-20 09:42:35 -04:00
efbc1c5a9d [MarianTokenizer] implement save_vocabulary and other common methods (#4389) 2020-05-19 19:45:49 -04:00
956c4c4eb4 [gpu slow tests] fix mbart-large-enro gpu tests (#4472) 2020-05-19 19:45:31 -04:00
48c3a70b4e [Longformer] Docs and clean API (#4464)
* add longformer docs

* improve docs
2020-05-19 21:52:36 +02:00
aa925a52fa [Tests, GPU, SLOW] fix a bunch of GPU hardcoded tests in Pytorch (#4468)
* fix gpu slow tests in pytorch

* change model to device syntax
2020-05-19 21:35:04 +02:00
5856999a9f add T5 fine-tuning notebook [Community notebooks] (#4462)
* add T5 fine-tuning notebook [Community notebooks]

* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-05-19 18:26:28 +02:00
07dd7c2fd8 [cleanup] test_tokenization_common.py (#4390) 2020-05-19 10:46:55 -04:00
8f1d047148 Longformer (#4352)
* first commit

* bug fixes

* better examples

* undo padding

* remove wrong VOCAB_FILES_NAMES

* License

* make style

* make isort happy

* unit tests

* integration test

* make `black` happy by undoing `isort` changes!!

* lint

* no need for the padding value

* batch_size not bsz

* remove unused type casting

* seqlen not seq_len

* staticmethod

* `bert` self-attention instead of `n2`

* uint8 instead of bool + lints

* pad inputs_embeds using embeddings not a constant

* black

* unit test with padding

* fix unit tests

* remove redundant unit test

* upload model weights

* resolve todo

* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_

* increase unittest coverage
2020-05-19 16:04:43 +02:00
31eedff5a0 Refactored the README.md file (#4427) 2020-05-19 09:56:24 -04:00
384f0eb2f9 Map optimizer to correct device after loading from checkpoint. (#4403)
* Map optimizer to correct device after loading from checkpoint.

* Make style test pass

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 23:16:05 -04:00
bf14ef75f1 [Trainer] move model to device before setting optimizer (#4450) 2020-05-18 23:13:33 -04:00
5e7fe8b585 Distributed eval: SequentialDistributedSampler + gather all results (#4243)
* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarios
2020-05-18 22:02:39 -04:00
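The gather-all-results step above is essentially an all_gather followed by trimming the sampler's padding — a hedged sketch of the helper's shape:

```python
import torch
import torch.distributed as dist

def distributed_concat(tensor, num_total_examples):
    """Gather per-process prediction tensors from all ranks, then trim the
    padding the sequential sampler added to equalize shard sizes (sketch)."""
    output_tensors = [tensor.clone() for _ in range(dist.get_world_size())]
    dist.all_gather(output_tensors, tensor)
    concat = torch.cat(output_tensors, dim=0)
    return concat[:num_total_examples]
```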
4c06893610 Fix nn.DataParallel compatibility in PyTorch 1.5 (#4300)
* Test case for #3936

* multigpu tests pass on pytorch 1.4.0

* Fixup

* multigpu tests pass on pytorch 1.5.0

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

* rename multigpu to require_multigpu

* mode doc
2020-05-18 20:34:50 -04:00
9de4afa897 Make get_last_lr in trainer backward compatible (#4446)
* makes fetching the last learning rate in Trainer backward compatible

* split comment to multiple lines

* fixes black styling issue

* uses version to create a more explicit logic
2020-05-18 20:17:36 -04:00
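The version-based logic above roughly looks like this: `get_last_lr()` only exists from PyTorch 1.4 on, so older versions fall back to `get_lr()`.

```python
import torch
from packaging import version

def current_learning_rate(scheduler):
    # get_last_lr() was added in PyTorch 1.4; fall back to the older
    # get_lr() for earlier versions.
    if version.parse(torch.__version__) >= version.parse("1.4"):
        return scheduler.get_last_lr()[0]
    return scheduler.get_lr()[0]
```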
42e8fbfc51 Added model cards for Romanian BERT models (#4437)
* Create README.md

* Create README.md

* Update README.md

* Update README.md

* Apply suggestions from code review

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:48:56 -04:00
54065d68b8 added model card for german-sentiment-bert (#4435) 2020-05-18 18:44:41 -04:00
e28b7e2311 Create README.md (#4433) 2020-05-18 18:41:34 -04:00
09b933f19d Update README.md (model_card) (#4424)
- add a citation.
- modify the table of the BLUE benchmark.

The table of the first version was not displayed correctly on https://huggingface.co/seiya/oubiobert-base-uncased.
Could you please confirm that this fix will allow you to display it correctly?
2020-05-18 18:18:17 -04:00
235777ccc9 Modify example of usage (#4413)
I followed Google's usage example for its ELECTRA small model, but I saw it was not meaningful, so I created a better example.
2020-05-18 18:17:33 -04:00
9ddd3a6548 add model card for t5-base-squad (#4409)
* add model card for t5-base-squad

* Update model_cards/valhalla/t5-base-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:17:14 -04:00
c5aa114392 Added README huseinzol05/t5-base-bahasa-cased (#4377)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base

* added albert tiny

* added electra model

* added gpt2 117m bahasa readme

* added gpt2 345m bahasa readme

* added t5-base-bahasa

* fix readme

* Update model_cards/huseinzol05/t5-base-bahasa-cased/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-18 18:10:23 -04:00
ca4a3f4da9 Adding optimizations block from ONNXRuntime. (#4431)
* Adding optimizations block from ONNXRuntime.

* Turn off external data format by default for PyTorch export.

* Correct the way use_external_format is passed through the cmdline args.
2020-05-18 20:32:33 +02:00
24538df919 [Community notebooks] General notebooks (#4441)
* Update README.md

* Update README.md

* Update README.md

* Update README.md
2020-05-18 20:23:57 +02:00
a699525d25 [test_pipelines] Mark tests > 10s @slow, small speedups (#4421) 2020-05-18 12:23:21 -04:00
d9ece8233d fix(run_language_modeling): use arg overwrite_cache (#4407) 2020-05-18 11:37:35 -04:00
d39bf0ac2d better naming in tf t5 (#4401) 2020-05-18 11:34:00 -04:00
590adb130b improve docstring (#4422) 2020-05-18 11:31:35 -04:00
026a5d0888 [T5 fp16] Fix fp16 in T5 (#4436)
* fix fp16 in t5

* make style

* refactor invert_attention_mask fn

* fix typo
2020-05-18 17:25:58 +02:00
fa6113f9a0 Fixed spelling of training (#4416) 2020-05-18 11:23:29 -04:00
757baee846 Fix un-prefixed f-string
see https://github.com/huggingface/transformers/pull/4367#discussion_r426356693

Hat/tip @girishponkiya
2020-05-18 11:20:46 -04:00
a27c795908 fix (#4419) 2020-05-18 15:51:40 +02:00
31c799a0c9 Tag onnx export tests as slow (#4432) 2020-05-18 09:24:41 -04:00
8581a670e3 [MbartTokenizer] save to sentencepiece.bpe.model (#4335) 2020-05-18 08:54:04 -04:00
18d233d525 Allow the creation of "entity groups" for NerPipeline #3548 (#3957)
* Add index to be returned by NerPipeline to allow for the creation of

* Add entity groups

* Convert entity list to dict

* Add entity to entity_group_disagg after updating entity groups

* Change 'group' parameter to 'grouped_entities'

* Add unit tests for grouped NER pipeline case

* Correct variable name typo for NER_FINETUNED_MODELS

* Sync grouped tests to recent test updates
2020-05-17 09:25:17 +02:00
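A minimal sketch of the feature above, assuming the `grouped_entities` flag and `entity_group` output key this PR introduces:

```python
from transformers import pipeline

# grouped_entities merges consecutive B-/I- tokens into a single span.
ner = pipeline("ner", grouped_entities=True)
print(ner("Hugging Face is based in New York City."))
# e.g. [{'entity_group': 'ORG', 'word': 'Hugging Face', ...},
#       {'entity_group': 'LOC', 'word': 'New York City', ...}]
```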
3e0f062106 Fix addcmul_ 2020-05-15 17:44:17 -04:00
fc2a4c88ce Fix: one more try 2020-05-15 17:38:48 -04:00
55bda52555 Same fix for addcmul_ 2020-05-15 17:23:48 -04:00
ad02c961c6 Fix UserWarning: This overload of add_ is deprecated in pytorch==1.5.0 2020-05-15 17:09:11 -04:00
15550ce0d1 [skip ci] remove local rank 2020-05-15 17:08:38 -04:00
62427d0815 rerun notebook 02-transformers (#4341) 2020-05-15 10:33:08 -04:00
34706ba050 Allow for None gradients in GradientAccumulator. (#4372) 2020-05-15 09:52:00 -04:00
edf9ac11d4 Should return overflowing information for the log (#4385) 2020-05-15 09:49:11 -04:00
b908f2e9dd Attempt to unpin torch version for Github Action. (#4384) 2020-05-15 15:47:15 +02:00
af2e6bf87c [examples] Streamline doc 2020-05-14 20:34:31 -04:00
7defc6670f p_mask in SQuAD pre-processing (#4049)
* Better p_mask building

* Addressing @mfuntowicz's comments
2020-05-14 17:07:52 -04:00
84894974bd Updated ONNX notebook link in README. 2020-05-14 22:40:59 +02:00
db0076a9df Conversion script to export transformers models to ONNX IR. (#4253)
* Added generic ONNX conversion script for PyTorch model.

* WIP initial TF support.

* TensorFlow/Keras ONNX export working.

* Print framework version info

* Add possibility to check the model is correctly loading on ONNX runtime.

* Remove quantization option.

* Specify ONNX opset version when exporting.

* Formatting.

* Remove unused imports.

* Make functions more generally reusable from other parts of the code.

* isort happy.

* flake happy

* Export only feature-extraction for now

* Correctly check inputs order / filter before export.

* Removed task variable

* Fix invalid args call in load_graph_from_args.

* Fix invalid args call in convert.

* Fix invalid args call in infer_shapes.

* Raise exception and catch in caller function instead of exit.

* Add 04-onnx-export.ipynb notebook

* More WIP on the notebook

* Remove unused imports

* Simplify & remove unused constants.

* Export with constant_folding in PyTorch

* Let's try to put function args in the right order this time ...

* Disable external_data_format temporary

* ONNX notebook draft ready.

* Updated notebooks charts + wording

* Correct error while exporting last chart in notebook.

* Addressing @LysandreJik's comment.

* Set ONNX opset to 11 as default value.

* Set opset param mandatory

* Added ONNX export unittests

* Quality.

* flake8 happy

* Add keras2onnx dependency on extras["tf"]

* Pin keras2onnx on github master to v1.6.5

* Second attempt.

* Third attempt.

* Use the right repo URL this time ...

* Do the same for onnxconverter-common

* Added keras2onnx and onnxconverter-common 1.7.0 to support TF 2.2

* Correct commit hash.

* Addressing PR review: Optimization are enabled by default.

* Addressing PR review: small changes in the notebook

* setup.py comment about keras2onnx versioning.
2020-05-14 16:35:52 -04:00
2d05480174 Fix trainer evaluation (#4363)
* fix loss calculation in evaluation

* fix evaluation on TPU when prediction_loss_only is True
2020-05-14 14:39:44 -04:00
035678efdb Create README.md (#4359)
* Create README.md

* Update model_cards/savasy/bert-base-turkish-squad/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-14 14:07:32 -04:00
b9c9e05381 Create README.md (#4357) 2020-05-14 14:06:10 -04:00
9535bf1977 Tokenizer.batch_decode convenience method (#4159) 2020-05-14 13:50:47 -04:00
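The convenience method above replaces a manual loop over `decode()` — a minimal sketch:

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
batch_ids = [tokenizer.encode("Hello world"), tokenizer.encode("How are you?")]

# One call instead of looping over decode():
print(tokenizer.batch_decode(batch_ids, skip_special_tokens=True))
# ['Hello world', 'How are you?']
```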
7822cd38a0 [tests] make pipelines tests faster with smaller models (#4238)
covers torch and tf. Also fixes a failing @slow test
2020-05-14 13:36:02 -04:00
448c467256 Fix: unpin flake8 and fix cs errors (#4367)
* Fix: unpin flake8 and fix cs errors

* Ok we still need to quote those
2020-05-14 13:14:26 -04:00
c547f15a17 Use Filelock to ensure distributed barriers
see context in https://github.com/huggingface/transformers/pull/4223
2020-05-14 11:58:32 -04:00
015f7812ed [ci skip] Pin isort 2020-05-14 10:12:18 -04:00
ef46ccb05c TPU needs a rendezvous (#4339) 2020-05-14 08:59:52 -04:00
94cb73c2d2 Add image and metadata (#4345)
Unfortunately I accidentally orphaned my other PR
2020-05-13 20:05:15 -04:00
a0eebdc404 Add link to W&B to see whole training logs (#4348) 2020-05-13 20:04:57 -04:00
7cb203fae4 Release: v2.9.1 2020-05-13 17:38:50 -04:00
9a687ebb77 [Marian Fixes] prevent predicting pad_token_id before softmax, support language codes, name multilingual models (#4290) 2020-05-13 17:29:41 -04:00
839bfaedb2 [Docs, Notebook] Include generation pipeline (#4295)
* add first text for generation

* add generation pipeline to usage

* Created using Colaboratory

* correct docstring

* finish
2020-05-13 14:24:08 -04:00
2d184cb553 wrong variable name used (#4328) 2020-05-13 10:22:03 -04:00
ca13618681 Question Answering for TF trainer (#4320)
* Add QA trainer example for TF

* Make data_dir optional

* Fix parameter logic

* Fix feature convert

* Update the READMEs to add the question-answering task

* Apply style

* Change 'sequence-classification' to 'text-classification' and prefix all the metric names with 'eval'

* Apply style

* Apply style
2020-05-13 09:22:31 -04:00
1e51bb717c Fix for #3865. PretrainedTokenizer mapped " do not" into " don't" when .decode(...) is called. Removed the " do not" --> " don't" mapping from clean_up_tokenization(...). (#4024) 2020-05-13 14:32:57 +02:00
241759101e (v2) Improvements to the wandb integration (#4324)
* Improvements to the wandb integration

* small reorg + no global necessary

* feat(trainer): log epoch and final metrics

* Simplify logging a bit

* Fixup

* Fix crash when just running eval

Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Boris Dayma <boris.dayma@gmail.com>
2020-05-12 21:52:01 -04:00
7d7fe4997f Allow BatchEncoding to be initialized empty. (#4316)
* Allow BatchEncoding to be initialized empty.

This is required by recent changes introduced in TF 2.2.

* Attempt to unpin Tensorflow to 2.2 with the previous commit.
2020-05-12 15:02:46 -04:00
0a97f6312a Update README.md (#4313) 2020-05-12 15:01:45 -04:00
15a121fec5 Update README.md (#4315) 2020-05-12 15:01:34 -04:00
15d45211f7 [model_cards]: 🇹🇷 Add new ELECTRA small and base models for Turkish (#4318) 2020-05-12 15:01:17 -04:00
8a017cbb5a Add modelcard with acknowledgements (#4321) 2020-05-12 15:00:56 -04:00
4bf5042240 Fix BART tests on GPU (#4298) 2020-05-12 09:11:50 -04:00
e4512aab3b Add MultipleChoice to TFTrainer [WIP] (#4270)
* catch gpu len 1 set to gpu0

* Add mpc to trainer

* Add MPC for TF

* fix TF automodel for MPC and add Albert

* Apply style

* Fix import

* Note to self: double check

* Make shape None, None for datasetgenerator output shapes

* Add from_pt bool, which doesn't seem to work

* Original checkpoint dir

* Fix docstrings for automodel

* Update readme and apply style

* Colab should probably not be from users

* Colabs should probably not be from users

* Add colab

* Update README.md

* Update README.md

* Cleanup __init__

* Cleanup flake8 trailing comma

* Update src/transformers/training_args_tf.py

* Update src/transformers/modeling_tf_auto.py

Co-authored-by: Viktor Alm <viktoralm@pop-os.localdomain>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-12 08:48:48 -04:00
65be574aec fixed missing torch module import (#4305)
fixed missing torch module import in example usage code
2020-05-12 08:34:17 -04:00
31e67dd19f Remove hard-coded pad token id in distilbert and albert (#3965) 2020-05-12 08:32:44 -04:00
30e343862f pin TF to 2.1 (#4297)
* pin TF to 2.1

* Pin flake8 as well
2020-05-11 21:03:30 -04:00
56e8ef632f [ci] Restrict GPU tests to actual code commits 2020-05-11 20:40:41 -04:00
ba6f6e44a8 [ci] Re-enable torch GPU tests 2020-05-12 00:05:36 +00:00
9524956819 Documentation specification (#4294) 2020-05-11 16:43:57 -04:00
61d22f9cc7 Simplify cache vars and allow for TRANSFORMERS_CACHE env (#4226)
* simplify cache vars and allow for TRANSFORMERS_CACHE env

As it currently stands, "TRANSFORMERS_CACHE" is not an accepted variable. It seems that these variables were not updated when moving from pytorch_transformers to transformers. In addition, the fallback procedure could be improved and simplified; Pathlib seems redundant here.

* Update file_utils.py
2020-05-11 15:24:02 -04:00
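A minimal sketch of the env var above; the path is illustrative:

```python
import os

# Must be set before transformers is imported, since the cache location
# is resolved at import time.
os.environ["TRANSFORMERS_CACHE"] = "/mnt/shared/hf-cache"

from transformers import BertModel  # noqa: E402

model = BertModel.from_pretrained("bert-base-uncased")  # lands in the custom cache
```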
cd40cb8879 Fix special token doc (#4292) 2020-05-11 15:05:36 -04:00
82601f4c1a Allow gpt2 to be exported to valid ONNX (#4244)
* allow gpt2 to be exported to valid ONNX model

* cast size from int to float explictly
2020-05-11 14:55:55 -04:00
39994051e4 Add migrating from pytorch-transformers (#4273)
"Migrating from pytorch-transformers to transformers" is missing in the main document. It is available in the main `readme` thought. Just move it to the document.
2020-05-11 13:35:13 -04:00
051dcb2a07 CamemBERT does not make use of Token Type IDs (#4289) 2020-05-11 13:31:03 -04:00
41e8291217 Add ALBERT to the Tensorflow to Pytorch model conversion cli (#3933)
* Add ALBERT to convert command of transformers-cli

* Document ALBERT tf to pytorch model conversion
2020-05-11 13:10:00 -04:00
3f42eb979f Documentation: fix links to NER examples (#4279)
* docs: fix link to token classification (NER) example

* examples: fix links to NER scripts
2020-05-11 12:48:21 -04:00
8fdb7997c6 Align sentiment-analysis' tokenizer (currently uncased) to the model (uncased). (#4264) 2020-05-11 12:45:53 -04:00
4658896ee1 [Marian] Fix typo in docstring (#4284) 2020-05-11 11:47:51 -04:00
bf64b8cf09 Model card for bert-turkish-question-answering question-answering model (#4281)
* Create README.md

* Update model_cards/lserinol/bert-turkish-question-answering/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-11 11:32:25 -04:00
94b57bf796 [TF 2.2 compat] use tf.VariableAggregation.ONLY_FIRST_REPLICA (#4283)
* Fix the issue to properly run the accumulator with TF 2.2

* Apply style

* Fix training_args_tf for TF 2.2

* Fix the TF training args when only one GPU is available

* Remove the fixed version of TF in setup.py
2020-05-11 11:28:37 -04:00
cffbb3d8ed Update README.md (#4276) 2020-05-11 11:24:41 -04:00
5f50d619dd Fix XTREME link + add number of eval documents + fix usage code (#4280) 2020-05-11 11:24:10 -04:00
7751be7cee fix reformer apex scaling issue (#4242) 2020-05-11 16:53:42 +02:00
ac7d5f67a2 [Reformer] Add Enwiki8 Reformer Model - Adapt convert script (#4282)
* adapt convert script

* update convert script

* finish

* fix marian pretrained docs
2020-05-11 16:38:07 +02:00
336116d960 Reformer enwik8 - Model card (#4286) 2020-05-11 16:22:08 +02:00
b290c32e16 [docs] fix typo (#4249) 2020-05-10 14:07:08 -04:00
3487be75ef [Marian] documentation and AutoModel support (#4152)
- MarianSentencepieceTokenizer - > MarianTokenizer
- Start using unk token.
- add docs page
- add better generation params to MarianConfig
- more conversion utilities
2020-05-10 13:54:57 -04:00
9d2f467bfb [README] Corrected some grammatical mistakes (#4199) 2020-05-10 09:02:36 -04:00
7b75aa9fa5 [TPU] Doc, fix xla_spawn.py, only preprocess dataset once (#4223)
* [TPU] Doc, fix xla_spawn.py, only preprocess dataset once

* Update examples/README.md

* [xla_spawn] Add `_mp_fn` to other Trainer scripts

* [TPU] Fix: eval dataloader was None
2020-05-08 14:10:05 -04:00
274d850d34 Fix #4098 2020-05-08 12:39:46 -04:00
26dad0a9fa example updated to use generation pipeline (#4230)
* example updated to use generation pipeline

* Update model_cards/LorenzoDeMattei/GePpeTto/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-08 09:45:10 -04:00
9ebb5b2a54 Model card for allegro/herbert-klej-cased-tokenizer-v1 (#4184) 2020-05-08 09:42:43 -04:00
9e54efd004 Model card for allegro/herbert-klej-cased-v1 (#4183) 2020-05-08 09:42:28 -04:00
a8b798e6c4 Model card for spanish electra small (#4196) 2020-05-08 09:30:15 -04:00
242005d762 Create README.md (#4132)
* Create README.md

* Adding code fence around code block
2020-05-08 09:27:29 -04:00
5940c73bbb Create README.md (#4179)
model card for my De Novo Drug discovery model using MLM
2020-05-08 09:25:36 -04:00
cf08830c28 [Pipeline, Generation] tf generation pipeline bug (#4217)
* fix PR

* move tests to correct place
2020-05-08 08:30:05 -04:00
8bf7312654 Add AlbertForPreTraining and TFAlbertForPreTraining models. (#4057)
* Add AlbertForPreTraining and TFAlbertForPreTraining models.

* PyTorch conversion

* TensorFlow conversion

* style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-05-07 19:44:51 -04:00
c99fe0386b [doc] Fix broken links + remove crazy big notebook 2020-05-07 18:44:18 -04:00
66113bd626 Create README.md (#4202) 2020-05-07 18:31:22 -04:00
6669915b65 [examples] Add column for pytorch-lightning support 2020-05-07 15:26:58 -04:00
612fa1b10b Examples readme.md (#4215)
* README

* Update README.md
2020-05-07 15:00:06 -04:00
2e57824374 Pin isort and tf <= 2.1.0 2020-05-07 14:42:00 -04:00
e7cfc1a313 Release: v2.9.0 2020-05-07 14:15:20 -04:00
0ae96ff8a7 BIG Reorganize examples (#4213)
* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around
2020-05-07 13:48:44 -04:00
cafa6a9e29 [Trainer] Ability to specify optimizer/scheduler at init
cc @patrickvonplaten @thomwolf
2020-05-07 11:25:26 -04:00
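A sketch of the new `optimizers` hook above, assuming `model` and `train_dataset` already exist (they are placeholders here):

```python
import torch
from transformers import Trainer, TrainingArguments

# `optimizers` takes an (optimizer, lr_scheduler) pair.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda _: 1.0)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,
    optimizers=(optimizer, scheduler),
)
trainer.train()
```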
e4fd5e3999 Use with_extension to change the extension (#4203)
As per https://github.com/huggingface/transformers/pull/3934#discussion_r421307659
2020-05-07 11:14:56 -04:00
ebf80e2e70 Tpu trainer (#4146)
* wip

* wip

* a last wip

* Better logging when using TPUs

* Correct argument name

* Tests

* fix

* Metrics in evaluation

* Update src/transformers/training_args.py

* [tpu] Use launcher script instead

* [tpu] lots of tweaks

* Fix formatting

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-07 10:34:04 -04:00
026097b9ee Ensure fast tokenizer can construct tensor without pad token if only one sample is provided. (#4201) 2020-05-07 10:02:53 -04:00
0a6cbea0a5 Rewritten batch support in pipelines. (#4154)
* Rewritten batch support in pipelines.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix imports sorting 🔧

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Set pad_to_max_length=True by default on Pipeline.

* Set pad_to_max_length=False for generation pipelines.

Most generation models don't have a padding token.

* Address @joeddav review comment: Uniformized *args.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address @joeddav review comment: Uniformized *args (second).

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-05-07 09:52:40 -04:00
99d1a69444 fix examples (#4192) 2020-05-07 10:54:48 +02:00
74ffc9ea6b [Reformer] Fix example and error message (#4191)
* fix example reformer

* fix error message and example docstring

* improved error message
2020-05-07 10:50:11 +02:00
96c78396ce fix docstring reformer (#4190) 2020-05-07 10:28:31 +02:00
dca34695d0 Reformer (#3351)
* first copy & past commit from Bert and morgans LSH code

* add easy way to compare to trax original code

* translate most of function

* make trax lsh self attention deterministic with numpy seed + copy paste code

* add same config

* add same config

* make layer init work

* implemented hash_vectors function for lsh attention

* continue reformer translation

* hf LSHSelfAttentionLayer gives same output as trax layer

* refactor code

* refactor code

* refactor code

* refactor

* refactor + add reformer config

* delete bogus file

* split reformer attention layer into two layers

* save intermediate step

* save intermediate step

* make test work

* add complete reformer block layer

* finish reformer layer

* implement causal and self mask

* clean reformer test and refactor code

* fix merge conflicts

* fix merge conflicts

* update init

* fix device for GPU

* fix chunk length init for tests

* include morgans optimization

* improve memory a bit

* improve comment

* factorize num_buckets

* better testing parameters

* make whole model work

* make lm model work

* add t5 copy paste tokenizer

* add chunking feed forward

* clean config

* add improved assert statements

* make tokenizer work

* improve test

* correct typo

* extend config

* add complexer test

* add new axial position embeddings

* add local block attention layer

* clean tests

* refactor

* better testing

* save intermediate progress

* clean test file

* make shorter input length work for model

* allow variable input length

* refactor

* make forward pass for pretrained model work

* add generation possibility

* finish dropout and init

* make style

* refactor

* add first version of RevNet Layers

* make forward pass work and add convert file

* make uploaded model forward pass work

* make uploaded model forward pass work

* refactor code

* add namedtuples and cache buckets

* correct head masks

* refactor

* made reformer more flexible

* make style

* remove set max length

* add attention masks

* fix up tests

* fix lsh attention mask

* make random seed optional for the moment

* improve memory in reformer

* add tests

* make style

* make sure masks work correctly

* detach gradients

* save intermediate

* correct backprop through gather

* make style

* change back num hashes

* rename to labels

* fix rotation shape

* fix detach

* update

* fix trainer

* fix backward dropout

* make reformer more flexible

* fix conflict

* fix

* fix

* add tests for fixed seed in reformer layer

* fix trainer typo

* fix typo in activations

* add fp16 tests

* add fp16 training

* support fp16

* correct gradient bug in reformer

* add fast gelu

* re-add dropout for embedding dropout

* better naming

* better naming

* renaming

* finalize test branch

* finalize tests

* add more tests

* finish tests

* fix

* fix type trainer

* fix fp16 tests

* fix tests

* fix tests

* fix tests

* fix issue with dropout

* fix dropout seeds

* correct random seed on gpu

* finalize random seed for dropout

* finalize random seed for dropout

* remove duplicate line

* correct half precision bug

* make style

* refactor

* refactor

* docstring

* remove sinusoidal position encodings for reformer

* move chunking to modeling_utils

* make style

* clean config

* make style

* fix tests

* fix auto tests

* pretrained models

* fix docstring

* update conversion file

* Update pretrained_models.rst

* fix rst

* fix rst

* update copyright

* fix test path

* fix test path

* fix small issue in test

* include reformer in generation tests

* add docs for axial position encoding

* finish docs

* Update convert_reformer_trax_checkpoint_to_pytorch.py

* remove isort

* include sams comments

* remove wrong comment in utils

* correct typos

* fix typo

* Update reformer.rst

* applied morgans optimization

* make style

* make gpu compatible

* remove bogus file

* big test refactor

* add example for chunking

* fix typo

* add to README
2020-05-07 10:17:01 +02:00
877fc56410 change order pytorch/tf in readme (#4167) 2020-05-06 16:31:07 -04:00
aad50151f3 TF version of the trainer (#4017)
* First commit to add a TF version of the trainer.

* Make the TF trainer closer to what the PT trainer looks like

* Refactoring common code between the PT and TF trainer into an util file.

* Some bugfix + better similarity with the PT trainer

* Add missing class in transformers init

* Bugfix over prediction + use classification report instead of simple metrics

* Fix name error

* Fix optimization tests + style

* Apply style

* Several bugfix for multi-gpu training

* Apply style

* Apply style

* Add glue example for the TF trainer

* Several bugfixes + address the reviews

* Fix on the TF training args file

* Add a debug mode

* Bugfix in utils_ner.py when segment_ids is None

* Apply style

* Apply style

* Add TPU strategy

* Fix selection strategy
2020-05-06 12:56:52 -04:00
25296b12aa Fix overwrite_cache behaviour for pytorch lightning examples (#4093) 2020-05-06 12:24:49 -04:00
9972562d33 Include ElectraPreTrainedModel into __init__ (#4173) 2020-05-06 12:00:23 -04:00
ff8ed52dd8 Camembert-large-fquad model card (#4143)
Description for the model card describing the camembert-large-fquad model.
2020-05-06 10:41:07 -04:00
4c3be2e718 Add model card for the NER model (#4162) 2020-05-06 10:40:55 -04:00
17ae0363db Fix markdown to show the results table properly (#4119) 2020-05-06 10:38:29 -04:00
a638e986f4 fix hard wired pad token id (#4138) 2020-05-06 00:42:34 +02:00
fd2174664c [Trainer] W&B: Enable model watch
See https://github.com/huggingface/transformers/pull/3916
2020-05-05 10:59:23 -04:00
79b1c6966b Pytorch 1.5.0 (#3973)
* Standard deviation can no longer be set to 0

* Remove torch pinned version

* 9th instead of 10th, silly me
2020-05-05 10:23:01 -04:00
818463ee8e Trainer: add logging through Weights & Biases (#3916)
* feat: add logging through Weights & Biases

* feat(wandb): make logging compatible with all scripts

* style(trainer.py): fix formatting

* [Trainer] Tweak wandb integration

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-04 22:42:27 -04:00
858b1d1e5a allow an already created tensorboard SummaryWriter be passed to Trainer 2020-05-04 19:58:24 -04:00
8e67573a64 [EncoderDecoder Tests] Improve tests (#4046)
* Hoist bert model tester for Patrick

* indent

* make tests work

* Update tests/test_modeling_bert.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>

Co-authored-by: sshleifer <sshleifer@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-04 02:18:36 +02:00
6af3306a1d Add decoder specific error message for T5Stack.forward (#4128) 2020-05-03 12:40:08 +02:00
1cdd2ad2af Fix #2941 (#4109)
* Fix of issue #2941

Reshaped score array to avoid `numpy` ValueError.

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-02 11:20:30 -04:00
5f4f6b65b3 distilroberta-base-finetuned-sentiment (#4115)
* Create model card

Create Model card for distilroberta-base-finetuned-sentiment

* Update model_cards/mrm8488/distilroberta-base-finetuned-sentiment/README.md

* Update model_cards/mrm8488/distilroberta-base-finetuned-sentiment/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-02 11:19:31 -04:00
7da051f135 model card for surajp/albert-base-sanskrit (#4114)
* Create README.md

* Update model_cards/surajp/albert-base-sanskrit/README.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-02 11:15:39 -04:00
14911e2e12 Create README.md (#4112) 2020-05-02 10:52:12 -04:00
9e97c87539 Added huseinzol05/gpt2-345M-bahasa-cased (#4102) 2020-05-02 10:51:15 -04:00
4c5bd92183 Update run_pl_glue.py (#4117) 2020-05-02 10:38:30 -04:00
5282b31df4 Update run_pl_ner.py (#4118) 2020-05-02 10:38:21 -04:00
1e616c0af3 NER: parse args from .args file or JSON (#4110)
* ner: parse args from .args file or JSON

* examples: mention json-based configuration file support for run_ner script
2020-05-02 10:29:17 -04:00
abb1fa3f37 Update README.md 2020-05-02 10:32:00 +02:00
0ccbfd2868 Update Reformer ReadME 2020-05-02 10:31:00 +02:00
2d8340a91f [Reformer] Move model card to google model (#4113)
* correct model card

* remove model card from patrick von platen
2020-05-02 10:25:22 +02:00
d713cfc5eb GePpeTto 🇮🇹: Fixpath to model card 2020-05-01 11:48:58 -04:00
f3d44301cc GePpeTto model 🇮🇹 (#4099)
* Create GePpeTto.md

* Update model_cards/LorenzoDeMattei/GePpeTto.md

* Update model_cards/LorenzoDeMattei/GePpeTto.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-05-01 11:46:42 -04:00
27d55125e6 Configs: saner num_labels in configs. (#3967) 2020-05-01 11:28:55 -04:00
e80be7f1d0 docs: add xlm-roberta section to multi-lingual section (#4101) 2020-05-01 11:06:58 -04:00
18db92dd9a [testing] add timeout_decorator (#3543) 2020-05-01 09:05:47 -04:00
b8686174be Merge pull request #3934 from huggingface/examples_args_from_files
[qol] example scripts: parse args from .args file or JSON
2020-04-30 22:40:13 -04:00
f39217a5ec [tests] Light cleanup of tempfile in tests/ 2020-04-30 22:30:15 -04:00
f54dc3f4d5 [ci] Load pretrained models into the default (long-lived) cache
There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models

When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.

I'd rather always use the default cache
2020-04-30 22:30:15 -04:00
6b410bedfc Model Card: gaochangkuan README.md (#4033)
* Create README.md

* Update README.md

* tweak

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-30 22:26:58 -04:00
8829ace4aa added gpt2 117m bahasa readme
(cherry picked from commit a4a673a1d0bec0bf4085eef021acb788ca1f5eb5)
2020-04-30 22:20:00 -04:00
1851a64b6f create model_card camembert-base-wikipedia-4gb 2020-04-30 22:16:12 -04:00
443e5e34af Create README.md 2020-04-30 22:16:00 -04:00
60e1556a44 Create model_card camembert-base-ccnet-4gb 2020-04-30 22:15:47 -04:00
fa9365eca5 Create README.md 2020-04-30 22:15:38 -04:00
afe002b04c Create README.md 2020-04-30 22:15:23 -04:00
8b5e5ebcf9 Continue training args and tqdm in notebooks (#3939)
* Continue training args

* Continue training args

* added explaination

* added explaination

* added explaination

* Fixed tqdm auto

* Update src/transformers/training_args.py

Co-Authored-By: Julien Chaumond <chaumond@gmail.com>

* Update src/transformers/training_args.py

* Update src/transformers/training_args.py

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-30 22:14:08 -04:00
ab90353f1a [cli] {login, upload, s3} display more helpful error messages 2020-04-30 12:51:06 -04:00
452dd0e4d9 [ci] Align test_hf_api.py with API change 2020-04-30 12:06:01 -04:00
7f9193ef09 Fixed Style Inconsistency (#3976) 2020-04-30 14:33:09 +02:00
64070cbb88 Fix TF input docstrings to refer to tf.Tensor rather than torch.FloatTensor. (#4051) 2020-04-30 14:28:56 +02:00
e73595bd64 Remove jitted method so that our models are pickable. (#4050) 2020-04-29 09:53:19 -04:00
2c77842887 [Fix common tests on GPU] send model, ids to torch_device (#4014) 2020-04-29 09:47:20 -04:00
6faca88ee0 Align MarianMT with #4030
cc @sshleifer
2020-04-28 20:35:20 -04:00
211e130811 [github] Issue templates: populate some labels
cc @bramvanroy @stefan-it
2020-04-28 20:34:34 -04:00
455c639093 CDN urls (#4030)
* [file_utils] use_cdn + documentation

* Move to cdn. urls for weights

* [urls] Hotfix for bert-base-japanese
2020-04-28 20:27:14 -04:00
8ba4c5885f Allow a more backward compatible behavior of max_len_single_sentence and max_len_sentences_pair (#3994)
* Allow a more backward-compatible behavior of max_len_single_sentence and max_len_sentences_pair

* The style and quality are now top-notch
2020-04-29 01:13:59 +02:00
847e7f3379 MarianMTModel.from_pretrained('Helsinki-NLP/opus-marian-en-de') (#3908)
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
2020-04-28 18:22:37 -04:00
d714dfeaa8 [isort] add known 3rd party to setup.cfg (#4053)
* add known 3rd party to setup.cfg

* comment

* Update CONTRIBUTING.md

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-28 17:12:00 -04:00
d52b0e294a Minor Readme Fixes (#4056)
Added contact info and fixed typos.
2020-04-28 16:42:15 -04:00
55adefe428 Add license information to model cards (#3864)
Close #3357
2020-04-28 16:40:21 -04:00
0ac6d0bf33 Create README.md
I created a Japanese binary classification model.
2020-04-28 15:35:30 -04:00
c73c83b0e6 Small cosmetic changes to CamemBERT model card 2020-04-28 15:32:55 -04:00
4a94c062a4 Provide model card for roberta-base-squad2-covid 2020-04-28 15:29:30 -04:00
c7d06b79ae Fix #3954 - GPT2 is not traceable (#3955)
* Update sqrt computation so it can survive a torch.jit.trace

* Update modeling_gpt2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2020-04-28 21:18:56 +02:00
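A sketch of what "traceable" means for the fix above, using the `torchscript` config flag so the model returns tuples, which jit tracing requires:

```python
import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2", torchscript=True)
model.eval()

dummy_input = torch.randint(0, model.config.vocab_size, (1, 8))
traced = torch.jit.trace(model, (dummy_input,))
torch.jit.save(traced, "gpt2_traced.pt")
```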
9a0a8c1c6f add examples to doc (#4045) 2020-04-28 16:33:23 +02:00
fa49b9afea Clean Encoder-Decoder models with Bart/T5-like API and add generate possibility (#3383)
* change encoder decoder style to bart & t5 style

* make encoder decoder generation dummy work for bert

* make style

* clean init config in encoder decoder

* add tests for encoder decoder models

* refactor and add last tests

* refactor and add last tests

* fix attn masks for bert encoder decoder

* make style

* refactor prepare inputs for Bert

* refactor

* finish encoder decoder

* correct typo

* add docstring to config

* finish

* add tests

* better naming

* make style

* fix flake8

* clean docstring

* make style

* rename
2020-04-28 15:11:09 +02:00
180585741c [Generation] Generation should allow to start with empty prompt (#3993)
* fix empty prompt

* fix length in generation pipeline
2020-04-28 14:33:15 +02:00
52679fbc2e add dialogpt training tips (#3996) 2020-04-28 14:32:31 +02:00
b5c6d3d4c7 notebooks: minor fix for community provided models example (#4025) 2020-04-28 09:12:25 +02:00
2fade302ac camembert-base-fquad
Model card for illuin release of camembert-base-fquad
2020-04-27 18:29:55 -04:00
20c3b8cab4 Create model card 2020-04-27 18:27:46 -04:00
b3f272ffcb Create model card 2020-04-27 18:27:04 -04:00
518f291eef add model card for Hindi-BERT 2020-04-27 18:25:16 -04:00
d7b3bf547c Model cards for KoELECTRA 2020-04-27 18:21:01 -04:00
db9d56c08a Add modelcard for Hate-speech-CNERG/dehatebert-mono-arabic model (#3979)
* Add dehatebert-mono-arabic readme card

* Update dehatebert-mono-arabic model card
2020-04-27 18:18:54 -04:00
41750a6cff Fix typos 2020-04-27 13:25:53 -04:00
12bb7fe770 Fix t5 doc typos (#3978)
* Fix typo in intro and add line under

* Add missing blank line under

* Correct types under
2020-04-27 18:27:15 +02:00
97a375484c rm boto3 dependency 2020-04-27 11:17:14 -04:00
4e817ff418 Create README.md (#3966) 2020-04-25 09:16:40 -04:00
73d6a2f901 [model_cards] xlnet_chinese_large & roberta_chinese_large 2020-04-24 16:12:42 -04:00
623ba0236d Create README.md (#3882) 2020-04-24 15:57:01 -04:00
f4078e0db6 Feat/add model card (#3923)
* add model card for gpt2-imdb-ctrl

* fix title

* add sentiment control description
2020-04-24 10:24:28 -04:00
03322b4261 Create README.md (#3917) 2020-04-24 10:24:00 -04:00
c811526004 [examples] For convenience, also save the tokenizer
Close #3921
2020-04-24 09:52:42 -04:00
b0167632ce Shuffle train subset for summarization example (#3909)
* Shuffle train subset

* Cleaner shuffle
2020-04-24 07:55:34 -04:00
c53cc018de [Trainer] Fix _rotate_checkpoints
Close #3920
2020-04-23 23:59:43 +00:00
cbbb3c43c5 [hubconf] Modify pythonpath to get canonical imports to work
See https://github.com/huggingface/transformers/pull/3881/files#r412292660

Should we remove SRC_DIR from sys.path right after the imports, @aaugustin?
2020-04-23 16:27:43 -04:00
77b75d2c78 Fix for #3873 to change type of exponent parameter for torch.pow() call from int to float (#3924) 2020-04-23 14:25:31 -04:00
6ba254ee54 quick fix wording readme for community models (#3900) 2020-04-23 14:19:45 -04:00
a79a9e1241 Fix TFAlbertForSequenceClassification classifier dropout probability. It was set to config.hidden_dropout_prob, but should be config.classifier_dropout_prob. (#3928) 2020-04-23 13:18:16 -04:00
8e093e5981 Remove 50k limits bug 2020-04-23 11:15:09 -04:00
6af5a54c28 [Trainer] reuse constant 2020-04-23 11:02:05 -04:00
7c2a32ff88 [housekeeping] super() 2020-04-23 10:43:22 -04:00
a946b6b51b [housekeeping] Upgrade # type Python 2 syntax
cc @sshleifer
2020-04-23 10:39:24 -04:00
cb3c2212c7 Create model card (#3890)
Model: TinyBERT-spanish-uncased-finetuned-ner
2020-04-22 14:56:43 -04:00
d698b87f20 Update comparison table (#3889) 2020-04-22 14:54:17 -04:00
13dd2acca4 Bump tokenizers version to final 0.7.0 (#3898) 2020-04-22 11:02:29 -04:00
f16540fcba Pipeline for Text Generation: GenerationPipeline (#3758)
* Add GenerationPipeline

* Fix parameter names

* Correct parameter __call__ parameters

* Add model type attribute and correct function calls for prepare_input

* Take out trailing commas from init attributes

* Remove unnecessary tokenization line

* Implement support for multiple text inputs

* Apply generation support for multiple input text prompts

* Take out tensor coercion

* Take out batch index

* Add text prompt to return sequence

* Squeeze token tensor before decoding

* Return only a single list of sequences if only one prompt was used

* Correct results variable name

* Add GenerationPipeline to SUPPORTED_TASKS with the alias, initialized w/ GPT2

* Registered AutoModelWithLMHead for both pt and tf

* Update docstring for GenerationPipeline

* Add kwargs parameter to mode.generate

* Take out kwargs parameter after all

* Add generation pipeline example in pipeline docstring

* Fix max length by squeezing tokens tensor

* Apply ensure_tensor_on_device to pytorch tensor

* Include generation step in torch.no_grad

* Take out input from prepare_xlm_input and set 'en' as default xlm_language

* Apply framework specific encoding during prepare_input

* Format w make style

* Move GenerationPipeline import to follow proper import sorting

* Take out trailing comma from generation dict

* Apply requested changes

* Change name to TextGenerationPipeline

* Apply TextGenerationPipeline rename to __init__

* Changing alias to text-generation

* Set input mapping as input to ensure_tensor_on_device

* Fix assertion placement

* Add test_text_generation

* Add TextGenerationPipeline to PipelineCommonTests

* Take out whitespace

* Format __init__ w black

* Fix __init__ style

* Format __init__

* Add line to end of __init__

* Correct model tokenizer set for test_text_generation

* Ensure to return list of list, not list of string (to pass test)

* Limit test models to only 3 to limit runtime to address circleCI timeout error

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict

* Fix blank result list

* Add TextGenerationPipeline to pipelines.rst

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix typos from adding PADDING_TEXT_TOKEN_LENGTH

* Fix incorrectly moved result list

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Add back generation line and make style

* Take out blank whitespace

* Apply new alias, text-generation, to test_pipelines

* Fix text generation alias in test

* Update src/transformers/pipelines.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-22 09:37:03 -04:00
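Using the new pipeline is a one-liner; a minimal sketch (the "text-generation" alias defaults to GPT-2; prompt and max_length are illustrative):

```python
from transformers import pipeline

# "text-generation" is the alias registered for TextGenerationPipeline.
generator = pipeline("text-generation")
print(generator("The Hugging Face library is", max_length=30))
```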
1dc9b3c784 Fixes #3877 2020-04-22 01:15:10 +00:00
dd9d483d03 Trainer (#3800)
* doc

* [tests] Add sample files for a regression task

* [HUGE] Trainer

* Feedback from @sshleifer

* Feedback from @thomwolf + logging tweak

* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes

* [glue] Use default max_seq_length of 128 like before

* [glue] move DataTrainingArguments around

* [ner] Change interface of InputExample, and align run_{tf,pl}

* Re-align the pl scripts a little bit

* ner

* [ner] Add integration test

* Fix language_modeling with API tweak

* [ci] Tweak loss target

* Don't break console output

* amp.initialize: model must be on right device before

* [multiple-choice] update for Trainer

* Re-align to 827d6d6ef071029cfe82838a18dab046b5813976
2020-04-21 20:11:56 -04:00
eb5601b0a5 [ci] Pin torch version while we update 2020-04-21 15:46:18 -04:00
53f5ef6df5 create readme for spentaur/yelp model (#3874)
* create readme for spentaur/yelp model

* update spentaur/yelp/README.md

* remove typo
2020-04-21 15:31:36 -04:00
d32585a304 Fix Torch.hub + Integration test 2020-04-21 14:13:30 -04:00
7d40901ce3 Fix Documentation issue in BertForMaskedLM forward (#3855) 2020-04-21 09:08:20 +02:00
b1ff0b2ae7 Fix bug in examples: double wrap into DataParallel during eval 2020-04-20 19:37:44 -04:00
7f23af1684 added electra model
(cherry picked from commit b5f2dc5d627d44b8cbb0ccf8ad2b46bea211a236)
2020-04-20 17:17:58 -04:00
03121deba3 New model added
The first model added to the repo
2020-04-20 17:10:01 -04:00
15b9868f8b Create model card 2020-04-20 17:07:34 -04:00
2c05b8a56c Remove tqdm logging when using pipelines. (#3833)
Introduce a tqdm_enabled parameter on squad_convert_examples_to_features(), defaulting to True and set to False in QA pipelines.
2020-04-20 22:58:52 +02:00
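A sketch of the new flag from user code (the data path and feature parameters are illustrative, and the rest of the signature is assumed to match the SQuAD processor of this era):

```python
from transformers import AutoTokenizer, SquadV2Processor, squad_convert_examples_to_features

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
examples = SquadV2Processor().get_dev_examples("path/to/squad")  # illustrative path

features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=384,
    doc_stride=128,
    max_query_length=64,
    is_training=False,
    tqdm_enabled=False,  # new flag: silence the progress bar, as QA pipelines now do
)
```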
c79b550dd0 Add qas_id to SquadResult and SquadExample (#3745)
* Add qas_id

* Fix incorrect name in squad.py

* Make output files optional for squad eval
2020-04-20 16:08:57 -04:00
c4158a6314 [Pipelines] Encode to max length of input not max length of tokenizer for batch input (#3857)
* remove max_length = tokenizer.max_length when encoding

* make style
2020-04-20 14:39:16 -04:00
857ccdb259 exbert links for my albert model cards (#3729)
* exbert links for my albert model cards

* Added exbert tag to the metadata block

* Adding "how to cite"
2020-04-20 10:54:39 -04:00
a504cb49ec [examples] fix summarization do_predict (#3866) 2020-04-20 10:49:56 -04:00
52c85f847a Update README.md 2020-04-20 10:10:56 -04:00
a21d4fa410 add "by" to ReadMe 2020-04-18 18:07:17 +02:00
827d6d6ef0 Cleanup fast tokenizers integration (#3706)
* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length in BatchEncoding

* add alignment methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 and RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately tensorflow does not like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy

Co-authored-by: Stefan Schweter <stefan@schweter.it>
2020-04-18 13:43:57 +02:00
60a42ef1c0 [model_cards] Fix CamemBERT table markdown
see https://github.com/huggingface/transformers/pull/3836
2020-04-17 20:21:15 -04:00
88aecee6a2 [ci] GitHub-hosted runner has no space left on device 2020-04-17 20:16:00 -04:00
73efa694e6 Update camembert-base-README.md (#3836) 2020-04-17 20:08:13 -04:00
e9d0bc027a [Config, Serialization] more readable config serialization (#3797)
* better config serialization

* finish configuration utils
2020-04-17 20:07:18 -04:00
8b63a01d95 XLM tokenizer should encode with bos token (#3791)
* XLM tokenizer should encode with bos token

* Update tests
2020-04-17 11:28:55 -04:00
1d4a35b396 Higher tolerance for past testing in TF T5 (#3844) 2020-04-17 11:26:16 -04:00
d13eca11e2 Higher tolerance for past testing in T5 (#3843) 2020-04-17 11:25:14 -04:00
b0c9fbb293 Add workflow to build docs (#3763) 2020-04-17 11:23:18 -04:00
c19727fd38 Add support for the null answer in QuestionAnsweringPipeline (#3441)
* Add support for the null answer in `QuestionAnsweringPipeline`

* black

* Fix min null score computation

* Fix a PR comment
2020-04-17 11:17:21 -04:00
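With null-answer support in place, a usage sketch (question and context are illustrative; handle_impossible_answer is the knob that enables the empty answer):

```python
from transformers import pipeline

qa = pipeline("question-answering")
result = qa(
    question="Who wrote the symphony?",
    context="The transformers library is developed by Hugging Face.",
    handle_impossible_answer=True,  # allow returning an empty answer when none exists
)
print(result)
```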
edf0582c0b Fix token_type_id in BERT question-answering example (#3790)
token_type_id is converted into the segment embedding. For question answering,
this needs to highlight whether a token belongs to sequence 0 or 1.
encode_plus takes care of correctly setting this parameter automatically.
2020-04-17 11:14:12 -04:00
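A minimal sketch of the point behind the fix: let encode_plus build token_type_ids instead of hand-rolling them (model and texts illustrative):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
question, context = "Where is the Eiffel Tower?", "The Eiffel Tower is in Paris."

# encode_plus assigns segment 0 to the question (and its special tokens)
# and segment 1 to the context, which the QA head relies on.
inputs = tokenizer.encode_plus(question, context, return_tensors="pt")
print(inputs["token_type_ids"])
```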
6d00033e97 Question Answering support for Albert and Roberta in TF (#3812)
* Add TFAlbertForQuestionAnswering

* Add TFRobertaForQuestionAnswering

* Update TFAutoModel with Roberta/Albert for QA

* Clean `super` TF Albert calls
2020-04-17 10:45:30 -04:00
f399c00610 Update README 2020-04-17 09:42:22 +02:00
f0c96fafd1 [examples] summarization/bart/finetune.py supports t5 (#3824)
renames `run_bart_sum.py` to `finetune.py`
2020-04-16 15:15:19 -04:00
0cec4fab7d typo: fine-grained token-leven
Changing from "fine-grained token-leven" to "fine-grained token-level"
2020-04-16 15:11:23 -04:00
14cdeee75a Tanh torch warnings 2020-04-16 15:10:35 -04:00
16469fedbd [PretrainedTokenizer] Factor out tensor conversion method (#3777) 2020-04-16 15:02:43 -04:00
80a1694514 [Examples, T5] Change newstest2013 to newstest2014 and clean up (#3817)
* Refactored use of newstest2013 to newstest2014. Fixed a bug where argparse consumed the first command-line argument as model_size instead of using the default, by forcing explicit --model_size flag inclusion

* More pythonic file handling through 'with' context

* COSMETIC - ran Black and isort

* Fixed reference to number of lines in newstest2014

* Fixed failing test. More pythonic file handling

* finish PR from tholiao

* remove outcommented lines

* make style

* make isort happy

Co-authored-by: Thomas Liao <tholiao@gmail.com>
2020-04-16 20:00:41 +02:00
d486795158 JIT not compatible with PyTorch/XLA (#3743) 2020-04-16 11:19:24 -04:00
b1e2368b32 Typo fix (#3821) 2020-04-16 11:04:32 -04:00
baca8fa8e6 clean pipelines (#3795) 2020-04-16 10:21:34 -04:00
38f7461df3 [TFT5, Cache] Add cache to TFT5 (#3772)
* correct gpt2 test inputs

* make style

* delete modeling_gpt2 change in test file

* translate from pytorch

* correct tests

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* make tensorflow t5 caching work

* make style

* clean reorder cache

* remove unnecessary spaces

* fix test
2020-04-16 16:14:52 +02:00
a5b249472e change pad token id to config pad token id (#3793) 2020-04-16 15:58:57 +02:00
dbd041243d [cleanup] factor out get_head_mask, invert_attn_mask, get_exten… (#3806)
* Delete some copy pasted code
2020-04-16 09:55:25 -04:00
d22894dfd4 [Docs] Add DialoGPT (#3755)
* add dialoGPT

* update README.md

* fix conflict

* update readme

* add code links to docs

* Update README.md

* Update dialo_gpt2.rst

* Update pretrained_models.rst

* Update docs/source/model_doc/dialo_gpt2.rst

Co-Authored-By: Julien Chaumond <chaumond@gmail.com>

* change filename of dialogpt

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-16 09:04:32 +02:00
c59b1e682d [examples] unit test for run_bart_sum (#3544)
- adds pytorch-lightning dependency
2020-04-15 18:35:01 -04:00
301bf8d1b4 Create Modelcard for Reformer Model 2020-04-15 16:26:24 +02:00
01c37dcdb5 [Config, Caching] Remove output_past everywhere and replace by use_cache argument (#3734)
* remove output_past from pt

* make style

* add optional input length for gpt2

* add use cache to prepare input

* save memory in gpt2

* correct gpt2 test inputs

* make past input optional for gpt2

* finish use_cache for all models

* make style

* delete modeling_gpt2 change in test file

* correct docstring

* correct is true statements for gpt2
2020-04-14 14:40:28 -04:00
092cf881a5 [Generation, EncoderDecoder] Apply Encoder Decoder 1.5GB memory… (#3778) 2020-04-13 22:29:28 -04:00
352d5472b0 Shift labels internally within TransfoXLLMHeadModel when called with labels (#3716)
* Shifting labels inside TransfoXLLMHead

* Changed doc to reflect change

* Updated pytorch test

* removed IDE whitespace changes

* black reformat

Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-04-13 18:11:23 +02:00
5ebd898953 fix dataset shuffling for Distributed training (#huggingface#3721) (#3766) 2020-04-13 10:11:18 -04:00
7972a4019f updated dutch squad model card (#3736)
* added model_cards for polish squad models

* corrected mistake in polish design cards

* updated model_cards for squad2_dutch model

* added links to benchmark models

Co-authored-by: Henryk Borzymowski <henryk.borzymowski@pwc.com>
2020-04-11 06:44:59 -04:00
f8c1071c51 Added README huseinzol05/albert-tiny-bahasa-cased (#3746)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base

* added albert tiny
2020-04-11 06:42:06 -04:00
700ccf6e35 Fix glue_convert_examples_to_features API breakage (#3742) 2020-04-10 16:03:27 -04:00
b7cf9f43d2 Update tokenizers to 0.7.0-rc5 (#3705) 2020-04-10 14:23:49 -04:00
551b450527 Add run_glue_tpu.py that trains models on TPUs (#3702)
* Initial commit to get BERT + run_glue.py on TPU

* Add README section for TPU and address comments.

* Cleanup TPU bits from run_glue.py (#3)

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* Cleanup TPU bits from run_glue.py

TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.

* No need to call `xm.mark_step()` explicitly (#4)

Since for gradient accumulation we're accumulating on batches from
`ParallelLoader` instance which on next() marks the step itself.

* Resolve R/W conflicts from multiprocessing (#5)

* Add XLNet in list of models for `run_glue_tpu.py` (#6)

* Add RoBERTa to list of models in TPU GLUE (#7)

* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)

* Use barriers to reduce duplicate work/resources (#9)

* Shard eval dataset and aggregate eval metrics (#10)

* Shard eval dataset and aggregate eval metrics

Also, instead of calling `eval_loss.item()` every time do summation with
tensors on device.

* Change defaultdict to float

* Reduce the pred, label tensors instead of metrics

As brought up during review, some metrics like f1 cannot be aggregated
via averaging. GLUE task metrics depend largely on the dataset, so
instead we sync the prediction and label tensors so that the metrics can
be computed accurately on those instead.

* Only use tb_writer from master (#11)

* Apply huggingface black code formatting

* Style

* Remove `--do_lower_case` as example uses cased

* Add option to specify tensorboard logdir

This is needed for our testing framework which checks regressions
against key metrics written by the summary writer.

* Using configuration for `xla_device`

* Prefix TPU specific comments.

* num_cores clarification and namespace eval metrics

* Cache features file under `args.cache_dir`

Instead of under `args.data_dir`. This is needed as our test infra uses
data_dir with a read-only filesystem.

* Rename `run_glue_tpu` to `run_tpu_glue`

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-04-10 12:53:54 -04:00
cbad305ce6 [docs] The use of do_lower_case in scripts is on its way to deprecation (#3738) 2020-04-10 12:34:04 -04:00
b169ac9c2b [examples] Generate argparsers from type hints on dataclasses (#3669)
* [examples] Generate argparsers from type hints on dataclasses

* [HfArgumentParser] way simpler API

* Restore run_language_modeling.py for easier diff

* [HfArgumentParser] final tweaks from code review
2020-04-10 12:21:58 -04:00
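A minimal sketch of the dataclass-driven parser added here (field names illustrative):

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser


@dataclass
class ModelArguments:
    model_name_or_path: str = field(default="bert-base-uncased")
    cache_dir: Optional[str] = field(default=None)


# The type hints above become argparse arguments: --model_name_or_path, --cache_dir.
parser = HfArgumentParser(ModelArguments)
(model_args,) = parser.parse_args_into_dataclasses()
print(model_args.model_name_or_path)
```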
7a7fdf71f8 Multilingual BART - (#3602)
- support mbart-en-ro weights
- add MBartTokenizer
2020-04-10 11:25:39 -04:00
f98d0ef2a2 Big cleanup of glue_convert_examples_to_features (#3688)
* Big cleanup of `glue_convert_examples_to_features`

* Use batch_encode_plus

* Cleaner wrapping of glue_convert_examples_to_features for TF

@lysandrejik

* Cleanup syntax, thanks to @mfuntowicz

* Raise explicit error in case of user error
2020-04-10 10:20:18 -04:00
ce2298fb5f [T5, generation] Add decoder caching for T5 (#3682)
* initial commit to add decoder caching for T5

* better naming for caching

* finish T5 decoder caching

* correct test

* added extensive past testing for T5

* clean files

* make tests cleaner

* improve docstring

* improve docstring

* better reorder cache

* make style

* Update src/transformers/modeling_t5.py

Co-Authored-By: Yacine Jernite <yjernite@users.noreply.github.com>

* make set output past work for all layers

* improve docstring

* improve docstring

Co-authored-by: Yacine Jernite <yjernite@users.noreply.github.com>
2020-04-10 01:02:50 +02:00
9384e5f6de Fix force_download of files on Windows (#3697) 2020-04-09 14:44:57 -04:00
bc65afc4df [Exbert] Change style of button 2020-04-09 10:44:42 -04:00
31baeed614 Update quotes
cc @julien-c
2020-04-09 09:09:00 -04:00
f8208fa456 Correct transformers-cli env call 2020-04-09 09:03:19 +02:00
6435b9f908 Updating the TensorFlow models to work as expected with tokenizers v3.0.0 (#3684)
* Updating modeling tf files; adding tests

* Merge `encode_plus` and `batch_encode_plus`
2020-04-08 16:22:44 -04:00
500aa12318 close #3699 2020-04-08 14:32:47 -04:00
a594ee9c84 More doc for model cards (#3698)
see https://github.com/huggingface/transformers/pull/3679#pullrequestreview-389368270
2020-04-08 12:12:52 -04:00
83703cd077 Update doc for {Summarization,Translation}Pipeline and other tweaks 2020-04-08 09:45:00 -04:00
a1b3b4167e Created README.md for model card ChemBERTa (#3666)
* created readme.md

* update readme with fixes

Fixes from PR comments
2020-04-08 09:10:20 -04:00
747907dc5e Fix typo in FeatureExtractionPipeline docstring 2020-04-08 09:08:56 -04:00
715aa5b135 [Bart] Replace config.output_past with use_cache kwarg (#3632) 2020-04-07 19:08:26 -04:00
e344e3d402 [examples] SummarizationDataset cleanup (#3451) 2020-04-07 19:05:58 -04:00
b0ad069517 [Tokenization] fix edge case for bert tokenization (#3517)
* fix edge case for bert tokenization

* add Lysandres comments for improvement

* use new is_pretokenized_flag
2020-04-07 16:26:31 -04:00
80fa0f7812 [Examples, Benchmark] Improve benchmark utils (#3674)
* improve and add features to benchmark utils

* update benchmark style

* remove output files
2020-04-07 16:25:57 -04:00
05deb52dc1 Optimize causal mask using torch.where (#2715)
* Optimize causal mask using torch.where

Instead of multiplying by 1.0 float mask, use torch.where with a bool mask for increased performance.

* Maintain compatibility with torch 1.0.0 - thanks for PR feedback

* Fix typo

* reformat line for CI
2020-04-07 22:19:18 +02:00
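A self-contained sketch of the pattern (shapes illustrative): select through a boolean mask in one torch.where call instead of multiplying by a float mask and adding a large negative offset.

```python
import torch

scores = torch.randn(1, 4, 4)                 # raw attention scores
causal = torch.tril(torch.ones(4, 4)).bool()  # lower-triangular causal mask

# One select instead of scores * mask - 1e4 * (1 - mask) with a float mask.
masked = torch.where(causal, scores, torch.full_like(scores, -1e4))
probs = masked.softmax(dim=-1)
```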
0a4b1068e1 Speedup torch summarization tests (#3663) 2020-04-07 14:01:30 -04:00
5aa8a278a3 Fix roberta checkpoint conversion script (#3642) 2020-04-07 12:03:23 -04:00
11cc1e168b [model_cards] Turn down spurious warnings
Close #3639 + spurious warning mentioned in #3227

cc @lysandrejik @thomwolf
2020-04-07 10:20:19 -04:00
0a9d09b42a fixed TransfoXLLMHeadModel documentation (#3661)
Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
2020-04-07 00:47:51 +02:00
96ab75b8dd Tokenizers v3.0.0 (#3185)
* Renamed num_added_tokens to num_special_tokens_to_add

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Cherry-Pick: Partially fix space only input without special tokens added to the output #3091

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Make fast tokenizers unittests work on Windows.

* Entirely refactored unittest for tokenizers fast.

* Remove ABC class for CommonFastTokenizerTest

* Added embeded_special_tokens tests from allenai @dirkgr

* Make embeded_special_tokens tests from allenai more generic

* Uniformize vocab_size as a property for both Fast and normal tokenizers

* Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)

* Ensure providing None input raises the same ValueError as the Python tokenizer + tests.

* Fix invalid input for assert_padding when testing batch_encode_plus

* Move add_special_tokens from constructor to tokenize/encode/[batch_]encode_plus methods parameter.

* Ensure tokenize() correctly forwards add_special_tokens to rust.

* Adding None checking on top of encode / encode_batch for TransfoXLTokenizerFast.
Avoid stripping on None values.

* unittests ensure tokenize() also throws a ValueError if provided None

* Added add_special_tokens unittest for all supported models.

* Style

* Make sure TransfoXL test run only if PyTorch is provided.

* Split up tokenizers tests for each model type.

* Fix invalid unittest with new tokenizers API.

* Filter out Roberta openai detector models from unittests.

* Introduce BatchEncoding on fast tokenizers path.

This new structure exposes all the mappings retrieved from Rust.
It also keeps the current behavior with model forward.

* Introduce BatchEncoding on slow tokenizers path.

Backward compatibility.

* Improve error message on BatchEncoding for slow path

* Make add_prefix_space True by default on Roberta fast to match Python in majority of cases.

* Style and format.

* Added typing on all methods for PretrainedTokenizerFast

* Style and format

* Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast.

* Style and format

* encode_plus now supports pretokenized inputs.

* Remove user warning about add_special_tokens when working on pretokenized inputs.

* Always go through the post processor.

* Added support for pretokenized input pairs on encode_plus

* Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError.

* Added pretokenized inputs support on batch_encode_plus

* Update BatchEncoding methods name to match Encoding.

* Bump setup.py tokenizers dependency to 0.7.0rc1

* Remove unused parameters in BertTokenizerFast

* Make sure Roberta returns token_type_ids for unittests.

* Added missing typings

* Update add_tokens prototype to match tokenizers side and allow AddedToken

* Bumping tokenizers to 0.7.0rc2

* Added documentation for BatchEncoding

* Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods.

* Added higher-level typing for tokenize / encode_plus / batch_encode_plus.

* Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers.

* Fix text-classification pipeline using the wrong tokenizer

* Make pipelines works with BatchEncoding

* Turn off add_special_tokens on tokenize by default.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove add_prefix_space from tokenize call in unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style and quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Correct message for batch_encode_plus none input exception.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid list comprehension for offset_mapping overriding content every iteration.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* TransfoXL uses Strip normalizer.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizers dependency to 0.7.0rc3

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Support AddedTokens for special_tokens and use left stripping on mask for Roberta.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* SpecialTokensMixin can use slots for faster access to underlying attributes.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove update_special_tokens from fast tokenizers.

* Ensure TransfoXL unittests are run only when torch is available.

* Style.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Style

* Style 🙏🙏

* Remove slots on SpecialTokensMixin, need deep dive into pickle protocol.

* Remove Roberta warning on __init__.

* Move documentation to Google style.

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-04-07 00:29:15 +02:00
e52d1258e0 Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (#3631)
* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py

`convert_examples_to_features` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it might be helpful if someone more familiar with this part of the codebase checked.

* Simplifying change to match recent commits
2020-04-06 16:52:22 -04:00
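The takeaway generalizes; a sketch of why the pad id must come from the tokenizer rather than being hardcoded:

```python
from transformers import AutoTokenizer

for name in ("bert-base-uncased", "roberta-base", "xlnet-base-cased"):
    tokenizer = AutoTokenizer.from_pretrained(name)
    # BERT -> 0, RoBERTa -> 1, XLNet -> 5
    print(name, tokenizer.pad_token_id)
```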
0ac33ddd8d Create README.md 2020-04-06 16:35:29 -04:00
326e6ebae7 Add model card 2020-04-06 16:30:01 -04:00
43eca3f878 Add model card 2020-04-06 16:29:51 -04:00
6bec88ca42 Create README.md 2020-04-06 16:29:44 -04:00
769b60f935 Add model card (#3655)
* Add model card

* Fix model name in fine-tuning script
2020-04-06 16:29:36 -04:00
c4bcb01906 Create model card (#3654)
* Create model card

* Fix model name in fine-tuning script
2020-04-06 16:29:25 -04:00
6903a987b8 Create README.md 2020-04-06 16:29:02 -04:00
760872dbde Create README.md (#3662) 2020-04-06 16:27:50 -04:00
47e1334c0b Add model card for BERTeus (#3649)
* Add model card for BERTeus

* Update README
2020-04-06 16:21:25 -04:00
529534dc2f BioMed Roberta-Base (AllenAI) (#3643)
* added model card

* updated README

* updated README

* updated README

* added evals

* removed pico eval

* Tweaks

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-04-06 16:12:09 -04:00
261c4ff4e2 Update notebooks (#3620)
* Update notebooks

* From local to global link

* from local links to *actual* global links
2020-04-06 14:32:39 -04:00
39a34cc375 [model_cards] ELECTRA (w/ examples of usage)
Co-Authored-By: Kevin Clark <clarkkev@users.noreply.github.com>
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-04-06 11:43:33 -04:00
ea6dba2787 Re-pin isort 2020-04-06 10:09:54 -04:00
11c3257a18 unpin isort for pypi 2020-04-06 10:06:41 -04:00
36bffc81b3 Release: v2.8.0 2020-04-06 10:03:53 -04:00
2ee410560e [Generate, Test] Split generate test function into beam search, no beam search (#3601)
* split beam search and no beam search test

* fix test

* clean generate tests
2020-04-06 10:37:05 +02:00
1789c7daf1 fix argument order (#3637) 2020-04-05 12:33:41 +02:00
b809d2f073 Fix TF T5 docstring (#3636) 2020-04-05 12:23:09 +02:00
4ab8ab4f50 Adjust model card to reflect changes to vocabulary
(cherry picked from commit 8e25c4bf2838211378db4d93e7f9722386cc1a04)
2020-04-04 15:27:41 -04:00
ac40eed1a5 Create README.md
adding readme for ktrapeznikov/albert-xlarge-v2-squad-v2
2020-04-04 15:18:54 -04:00
fd9995ebc5 Create README.md 2020-04-04 15:18:31 -04:00
5d912e7ed4 Tweak typing for #3566 2020-04-04 15:04:03 -04:00
94eb68d742 Fix typo: weigths → weights 2020-04-04 15:03:26 -04:00
243e687be6 Create model card 2020-04-04 08:20:34 -04:00
3e4b4dd190 [model_cards] Link to ExBERT visualisation
Hat/tip @bhoov @HendrikStrobelt @sebastianGehrmann

Also cc @srush and @thomwolf
2020-04-03 20:03:29 -04:00
c6acd246ec Speed up GELU computation with torch.jit (#2988)
* Compile gelu_new with torchscript

* Compile _gelu_python with torchscript

* Wrap gelu_new with torch.jit for torch>=1.4
2020-04-03 15:20:21 -04:00
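For reference, a sketch of the scripted activation (this is the standard gelu_new tanh approximation):

```python
import math

import torch


@torch.jit.script
def gelu_new(x):
    # GPT-2-style tanh approximation of GELU, compiled with TorchScript.
    return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))


print(gelu_new(torch.linspace(-2, 2, 5)))
```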
d5d7d88612 ELECTRA (#3257)
* Electra wip

* helpers

* Electra wip

* Electra v1

* ELECTRA may be saved/loaded

* Generator & Discriminator

* Embedding size instead of halving the hidden size

* ELECTRA Tokenizer

* Revert BERT helpers

* ELECTRA Conversion script

* Archive maps

* PyTorch tests

* Start fixing tests

* Tests pass

* Same configuration for both models

* Compatible with base + large

* Simplification + weight tying

* Archives

* Auto + Renaming to standard names

* ELECTRA is uncased

* Tests

* Slight API changes

* Update tests

* wip

* ElectraForTokenClassification

* temp

* Simpler arch + tests

Removed ElectraForPreTraining which will be in a script

* Conversion script

* Auto model

* Update links to S3

* Split ElectraForPreTraining and ElectraForTokenClassification

* Actually test PreTraining model

* Remove num_labels from configuration

* wip

* wip

* From discriminator and generator to electra

* Slight API changes

* Better naming

* TensorFlow ELECTRA tests

* Accurate conversion script

* Added to conversion script

* Fast ELECTRA tokenizer

* Style

* Add ELECTRA to README

* Modeling Pytorch Doc + Real style

* TF Docs

* Docs

* Correct links

* Correct model initialization

* random fixes

* style

* Addressing Patrick's and Sam's comments

* Correct links in docs
2020-04-03 14:10:54 -04:00
8594dd80dd BertJapaneseTokenizer accept options for mecab (#3566)
* BertJapaneseTokenizer accept options for mecab

* black

* fix mecab_option to Option[str]
2020-04-03 11:12:19 -04:00
216e167ce6 Added albert-base-bahasa-cased README and fixed tiny-bert-bahasa-cased README (#3613)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme

* added albert base
2020-04-03 09:28:43 -04:00
1ac6a246d8 Update README.md (#3604)
Update AutoModel & AutoTokenizer loading.
2020-04-03 09:28:25 -04:00
e91692f4a3 Update README.md (#3603) 2020-04-03 09:27:57 -04:00
8e287d507d corrected mistake in polish model cards (#3611)
* added model_cards for polish squad models

* corrected mistake in polish design cards

Co-authored-by: Henryk Borzymowski <henryk.borzymowski@pwc.com>
2020-04-03 09:07:15 -04:00
81484b447b Create README.md (#3568)
* Create README.md

* added meta block (language: german)

* Added additional information about test data
2020-04-02 21:48:31 -04:00
9f6349aba9 Create README.md 2020-04-02 21:43:12 -04:00
ddb1ce7418 added model_cards for polish squad models 2020-04-02 21:40:16 -04:00
f68d22850c delete bogus print statement (#3595) 2020-04-02 21:49:34 +02:00
c50aa67bff Resizing embedding matrix before sending it to the optimizer. (#3532)
* Resizing the embedding matrix after sending it to the optimizer prevents the newly resized matrix from being updated.

* Remove space for style matter
2020-04-02 15:00:05 -04:00
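A sketch of the correct ordering (model and token illustrative): resize first, then build the optimizer, so the optimizer holds references to the new embedding matrix.

```python
import torch
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")

tokenizer.add_tokens(["<new_token>"])
model.resize_token_embeddings(len(tokenizer))               # resize FIRST...
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # ...then create the optimizer
```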
1b10159950 Adding should_continue check for retraining (#3509) 2020-04-02 14:07:08 -04:00
390c128592 [Encoder-Decoder] Force models outputs to always have batch_size as their first dim (#3536)
* solve conflicts

* improve comments
2020-04-02 15:18:33 +02:00
ab5d06a094 [T5, examples] replace heavy t5 models with tiny random models (#3556)
* replace heavy t5 models with tiny random models as was done by sshleifer

* fix isort
2020-04-02 12:34:05 +02:00
a4ee4da18a [T5, TF 2.2] change tf t5 argument naming (#3547)
* change tf t5 argument naming for TF 2.2

* correct bug in testing
2020-04-01 22:04:20 +02:00
06dd597552 fix bug in warnings T5 pipelines (#3545) 2020-04-01 21:59:12 +02:00
9de9ceb6c5 Correct output shape for Bert NSP models in docs (#3482) 2020-04-01 15:04:38 -04:00
b815edf69f [T5, Testst] Add extensive hard-coded integration tests and make sure PT and TF give equal results (#3550)
* add some t5 integration tests

* finish summarization and translation integration tests for T5 - results loook good

* add tf test

* fix == vs is bug

* fix tf beam search error and make tf t5 tests pass
2020-04-01 18:01:33 +02:00
8538ce9044 Add tiny-bert-bahasa-cased model card (#3567)
* add bert bahasa readme

* update readme

* update readme

* added xlnet

* added tiny-bert and fix xlnet readme
2020-04-01 07:15:00 -04:00
c1a6252be1 Create model card (#3557)
Create model card for: distilbert-multi-finetuned-for-xqua-on-tydiqa
2020-04-01 07:14:23 -04:00
50e15c825c Tokenizers: Start cleaning examples a little (#3455)
* Start cleaning examples

* Fixup
2020-04-01 07:13:40 -04:00
b38d552a92 [Generate] Add bad words list argument to the generate function (#3367)
* add bad words list

* make style

* add bad_words_tokens

* make style

* better naming

* make style

* fix typo
2020-03-31 18:42:31 +02:00
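Usage sketch for the new argument (word list illustrative; the kwarg is bad_words_ids, each entry a token-id sequence):

```python
from transformers import AutoModelWithLMHead, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")

bad_words_ids = [tokenizer.encode(w, add_prefix_space=True) for w in ["stupid", "idiot"]]
input_ids = tokenizer.encode("The meeting was", return_tensors="pt")

# generate() now refuses to produce any of the banned token sequences.
output = model.generate(input_ids, max_length=20, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output[0]))
```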
ae6834e028 [Examples] Clean summarization and translation example testing files for T5 and Bart (#3514)
* fix conflicts

* add model size argument to summarization

* correct wrong import

* fix isort

* correct imports

* other isort make style

* make style
2020-03-31 17:54:13 +02:00
0373b60c4c Update README.md (#3552)
- Show that the last uploaded version was trained on more data (custom_license files)
2020-03-31 10:40:34 -04:00
83d1fbcff6 [Docs] Add usage examples for translation and summarization (#3538) 2020-03-31 09:36:03 -04:00
55bcae7f25 remove useless and confusing lm_labels line (#3531) 2020-03-31 09:32:25 -04:00
42e1e3c67f Update usage doc regarding generate fn (#3504) 2020-03-31 09:31:46 -04:00
57b0fab692 Add better explanation to check docs locally. (#3459) 2020-03-31 09:30:17 -04:00
a8d4dff0a1 Update README.md (#3470)
Fix typo
2020-03-31 08:01:09 -04:00
4a5663568f Create card for the model: GPT-2-finetuned-covid-bio-medrxiv (#3453) 2020-03-31 08:01:03 -04:00
bbedb59675 Create README.md (#3393)
* Create README.md

* Update README.md
2020-03-31 08:00:35 -04:00
c2cf192943 Add link to 16 POS tags model (#3465) 2020-03-31 08:00:00 -04:00
c82ef72158 Added CovidBERT-NLI model card (#3477) 2020-03-31 07:59:49 -04:00
b48a1f08c1 Add text shown in example of usage (#3464) 2020-03-31 07:59:36 -04:00
99833a9cbf Create model card (#3487) 2020-03-31 07:59:22 -04:00
ebceeeacda Add electra and alectra model cards (#3524) 2020-03-31 07:58:48 -04:00
a6c4ee27fd Add model cards (#3537)
* feat: add model card bert-imdb

* feat: add model card gpt2-imdb-pos

* feat: add model card gpt2-imdb
2020-03-31 07:54:45 -04:00
e5c393dceb [Bug fix] Using loaded checkpoint with --do_predict (instead of… (#3437)
* Using loaded checkpoint with --do_predict

Without this fix, I'm getting near-random validation performance for a trained model, and the validation performance differs per validation run. I think this happens since the `model` variable isn't set with the loaded checkpoint, so I'm using a randomly initialized model. Looking at the model activations, they differ each time I run evaluation (but they don't with this fix).

* Update checkpoint loading

* Fixing model loading
2020-03-30 17:06:08 -04:00
8deff3acf2 [bart-tiny-random] Put a 5MB model on S3 to allow faster exampl… (#3488) 2020-03-30 12:28:27 -04:00
1f72865726 [BART] Update encoder and decoder on set_input_embedding (#3501)
Co-authored-by: Ioannis Douratsos <ioannisd@amazon.com>
2020-03-30 12:20:37 -04:00
cc598b312b [InputExample] Unfreeze for now, cf. #3423 2020-03-30 10:41:49 -04:00
d38bbb225f Update the NER TF script (#3511)
* Update the NER TF script to remove the softmax and make the pad token label id to -1

* Reformat the quality and style

Co-authored-by: Julien Plu <julien.plu@adevinta.com>
2020-03-30 09:50:12 -04:00
eff757f2e3 Re-pin isort version 2020-03-30 09:00:47 -04:00
a009d751c2 Un-pin isort for v2.7.0 pypi 2020-03-30 08:55:10 -04:00
6f5a12a583 Release: v2.7.0 2020-03-30 08:49:24 -04:00
296252c49e fix lm lables in docstring (#3529) 2020-03-30 14:26:24 +02:00
75ec6c9e3a [T5] make decoder input ids optional for t5 training (#3521)
* make decoder input ids optional for t5 training

* lm_lables should not be shifted in t5

* add tests

* finish shift right functionality for PT T5

* move shift right to correct class

* cleaner code

* replace -100 values with pad token id

* add assert statement

* remove unnecessary for loop

* make style
2020-03-30 13:45:26 +02:00
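A sketch of the shift-right logic this PR moves inside the model (names illustrative; with it, users pass only the labels and the decoder inputs are derived automatically):

```python
import torch


def shift_right(labels: torch.Tensor, decoder_start_token_id: int, pad_token_id: int):
    # Prepend the decoder start token and drop the last label, so the decoder
    # predicts token t from tokens < t; masked positions (-100) become pad ids.
    shifted = labels.new_zeros(labels.shape)
    shifted[:, 1:] = labels[:, :-1].clone()
    shifted[:, 0] = decoder_start_token_id
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted


labels = torch.tensor([[42, 43, -100]])
print(shift_right(labels, decoder_start_token_id=0, pad_token_id=0))  # tensor([[ 0, 42, 43]])
```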
5b44e0a31b [T5] Add training documenation (#3507)
* Add clear description of how to train T5

* correct docstring in T5

* correct typo

* correct docstring format

* update t5 model docs

* implement collins feedback

* fix typo and add more explanation for sentinal tokens

* delete unnecessary todos
2020-03-30 13:35:53 +02:00
33ef7002e1 [Docs] examples/summarization/bart: Simplify CNN/DM preprocessi… (#3516) 2020-03-29 13:25:42 -04:00
f6a23d1911 [BART] add bart-large-xsum weights (#3422) 2020-03-29 10:51:13 -04:00
601ac5b1dc [model_cards]: use MIT license for all dbmdz models 2020-03-27 18:06:25 -04:00
17dceae7a1 Fix circle ci flaky fail of wmt example (#3485)
* force bleu

* fix wrong file name

* rename file

* different filenames for each example test

* test files should clean up after themselves

* test files should clean up after themselves

* do not force bleu

* correct typo

* fix isort
2020-03-27 13:01:28 -04:00
00ea100e96 add summarization and translation to notebook (#3478) 2020-03-27 11:05:37 -04:00
b08259a120 run_ner.py / bert-base-multilingual-cased can output empty tokens (#2991)
* Use tokenizer.num_added_tokens to count number of added special_tokens instead of hardcoded numbers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty.

This can happen when using bert-base-multilingual-cased with an input containing a lone space.
In this case, the tokenizer will output just an empty word_tokens, leading to inconsistent behavior
where labels_ids ends up with one more token than the tokens vector.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-27 10:59:55 -04:00
f4f4946836 Rename t5-large to t5-base in README.md 2020-03-27 15:57:58 +01:00
fa9af2468a Add T5 to docs (#3461)
* add t5 docs basis

* improve docs

* add t5 docs

* improve t5 docstring

* add t5 tokenizer docstring

* finish docstring

* make style

* add pretrained models

* correct typo

* make examples work

* finalize docs
2020-03-27 10:57:16 -04:00
ff80b73157 Add option to choose T5 model size. (#3480)
T5-small in test; isort.
2020-03-27 15:56:59 +01:00
e2c05f06ef Correct indentation in docstring
For some reason Sphinx extremely dislikes this and crashes.
2020-03-27 09:28:52 -04:00
3ee431dd4c [Bart/Memory] Two separate, smaller decoder attention masks (#3371) 2020-03-26 21:34:15 -04:00
53fe733805 Model Cards: Fix grammar error (#3467) 2020-03-26 21:33:33 -04:00
c10decf7a0 [Bart: example] drop columns that are exclusively pad_token_id… (#3400)
* trim seq_len below 1024 if there are columns full of pad_token_id
* Centralize trim_batch so SummarizationDataset can use it too
2020-03-26 19:33:54 -04:00
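A sketch of the centralized helper (close to the behavior described above): drop every column that is padding across the whole batch.

```python
import torch


def trim_batch(input_ids, pad_token_id, attention_mask=None):
    # Keep only the columns where at least one row has a non-pad token.
    keep = input_ids.ne(pad_token_id).any(dim=0)
    if attention_mask is None:
        return input_ids[:, keep]
    return input_ids[:, keep], attention_mask[:, keep]


batch = torch.tensor([[5, 6, 0, 0], [7, 0, 0, 0]])
print(trim_batch(batch, pad_token_id=0))  # tensor([[5, 6], [7, 0]])
```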
63f4d8cad0 [Bart/Memory] SelfAttention only returns weights if config.outp… (#3369) 2020-03-26 18:42:39 -04:00
2b2a2f8df2 [Bart] Fix: put dummy_inputs on correct device (#3398)
* Dummy inputs to model.device

* Move self.device to ModuleUtilsMixin
2020-03-26 18:42:09 -04:00
1a5aefc95c [Seq2Seq Generation] Call encoder before expanding input_ids (#3370) 2020-03-26 18:41:19 -04:00
39371ee454 [Bart/Memory] don't create lm_head (#3323)
* delete lm_head, skips weight tying
* Fixed s3
2020-03-26 18:40:39 -04:00
5ad2ea06af Add wmt translation example (#3428)
* add translation example

* make style

* adapt docstring

* add gpu device as input for example

* small renaming

* better README
2020-03-26 19:07:59 +01:00
b4fb94fe6d revert unpin isort commit 2020-03-26 13:19:18 -04:00
e703e923ca Add t5 summarization example (#3411)
* rebase to master

* change tf to pytorch

* change to pytorch

* small fix

* renaming

* add gpu training possibility

* renaming

* improve README

* incorporate collins feedback

* better Readme

* better README.md
2020-03-26 18:17:55 +01:00
1a6c546c6f Add missing token classification for XLM (#3277)
* Add the missing token classification for XLM

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add the missing token classification for XLM

* fix styling

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add missing description for AlbertForTokenClassification

* fix styling

* Add missing docstring for AlBert

* Slow tests should be slow

Co-authored-by: Sakares Saengkaew <s.sakares@gmail.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-03-26 10:22:13 -04:00
311970546f rename string in pipeline 2020-03-26 14:59:49 +01:00
7420a6a9cc Create card for model GPT-2-finetuned-CORD19 2020-03-26 09:10:09 -04:00
022e8fab97 Adds translation pipeline (#3419)
* fix merge conflicts

* add t5 summarization example

* change parameters for t5 summarization

* make style

* add first code snippet for translation

* only add prefixes

* add prefix patterns

* make style

* renaming

* fix conflicts

* remove unused patterns

* solve conflicts

* fix merge conflicts

* remove translation example

* remove summarization example

* make sure tensors are in numpy for float comparison

* re-add t5 config

* fix t5 import config typo

* make style

* remove unused numpy statements

* update docstring

* import translation pipeline
2020-03-26 13:50:58 +01:00
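Usage sketch (input sentence illustrative; the task name encodes a fixed language pair, matching the T5 task prefixes):

```python
from transformers import pipeline

translator = pipeline("translation_en_to_de")
print(translator("The house is wonderful.", max_length=40))
```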
3c5c567507 Update model card huseinzol05/bert-base-bahasa-cased (#3425)
* add bert bahasa readme

* update readme

* update readme

* added xlnet
2020-03-26 07:50:27 -04:00
9c683ef01e Add t5 to pipeline(task='summarization') (#3413)
* solve conflicts

* move warnings below

* incorporate changes

* add pad_to_max_length to pipelines

* add bug fix for T5 beam search

* add prefix patterns

* make style

* fix conflicts

* adapt pipelines for task specific parameters

* improve docstring

* remove unused patterns
2020-03-26 11:03:13 +01:00
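And the summarization side, selecting T5 explicitly (model size and text illustrative):

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")
text = (
    "The tower is 324 metres tall, about the same height as an 81-storey "
    "building, and was the tallest man-made structure in the world for 41 years."
)
print(summarizer(text, min_length=5, max_length=30))
```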
ffcffebe85 Force the return of token type IDs (#3439) 2020-03-26 09:41:36 +01:00
010e0460b2 Updated/added model cards (#3435) 2020-03-25 16:40:03 -04:00
ffa17fe322 Extend config with task specific configs. (#3433)
* add new default configs

* change prefix default to None
2020-03-25 21:32:04 +01:00
83272a3853 Experiment w/ dataclasses (including Py36) (#3423)
* [ci] Also run test_examples in py37

(will revert at the end of the experiment)

* InputExample: use immutable dataclass

* [deps] Install dataclasses for Py<3.7

* [skip ci] Revert "[ci] Also run test_examples in py37"

This reverts commit d29afd9959786b77759b0b8fa4e6b4335b952015.
2020-03-25 11:10:20 -04:00
ccbe839ee0 Added BioBERT-NLI model card (#3421) 2020-03-24 21:15:55 -04:00
3d76df3a12 BART for summarization training with CNN/DM using pytorch-lightning 2020-03-24 21:00:24 -04:00
eaabaaf750 [run_language_modeling] Fix: initialize a new model from a config object 2020-03-24 17:56:40 -04:00
f8823bad9a Expose missing mappings (see #3415) 2020-03-24 17:46:25 -04:00
d0c36a7b72 [ci] Partial revert of 18eec3a9847 due to fbc5bf10cfe 2020-03-24 12:10:43 -04:00
fbc5bf10cf v2.6.0 release: isort un-pinned 2020-03-24 11:52:02 -04:00
b88bda6af3 Add right model and tokenizer path in example 2020-03-24 11:30:12 -04:00
b31ef225cf [model_cards] 🇹🇷 Add new (uncased, 128k) BERTurk model 2020-03-24 11:29:06 -04:00
b4009cb001 [model_cards] 🇹🇷 Add new (cased, 128k) BERTurk model 2020-03-24 11:29:06 -04:00
d3283490ef [model_cards] 🇹🇷 Add new (uncased) BERTurk model 2020-03-24 11:29:06 -04:00
e279a312d6 Model cards for CS224n SQuAD2.0 models (#3406)
* Model cards for CS224n SQuAD2.0 models

* consistent spacing
2020-03-24 11:28:33 -04:00
7372e62b2c Added precisions in SciBERT-NLI model card (#3410) 2020-03-24 11:01:56 -04:00
471cce24b3 Release: v2.6.0 2020-03-24 10:37:32 -04:00
e392ba6938 Add camembert integration tests (#3375)
* add integration tests for camembert

* use jplu/tf-camembert for the moment

* make style
2020-03-24 10:18:37 +01:00
a8e3336a85 [examples] Use AutoModels in more examples 2020-03-23 20:11:14 -04:00
ec6766a363 [deps] scikit-learn's transient issue was fixed 2020-03-23 18:38:09 -04:00
f7dcf8fcea [BertAbs] Move files around for more consistent naming 2020-03-23 13:58:49 -04:00
e25c4f4027 [ALBERT] move things around for more consistent naming
see #3359

cc @lysandrejik
2020-03-23 13:58:21 -04:00
85b324bee5 Add comparison table with older brother in family 2020-03-23 12:11:20 -04:00
b7aa077a63 Create card for the model 2020-03-23 12:10:41 -04:00
f740177c87 Add comparison table with new models 2020-03-23 12:10:23 -04:00
e52482909b Correct order for dev/quality dependencies
cc @julien-c
2020-03-23 12:01:23 -04:00
28424906c2 Added scibert-nli model card 2020-03-23 11:55:41 -04:00
18eec3a984 [ci] simpler way to load correct version of isort
hat/tip @bramvanroy
2020-03-23 10:03:22 -04:00
cf72479bf1 One last reorder of {scheduler,optimizer}.step() 2020-03-20 18:05:50 -04:00
634bf6cf7e fixes lr_scheduler warning
For more details, see https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
2020-03-20 18:03:50 -04:00
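A sketch of the order the examples settle on (dummy model and hyper-parameters illustrative): since PyTorch 1.1, optimizer.step() must precede scheduler.step(), otherwise this warning is raised.

```python
import torch
from transformers import get_linear_schedule_with_warmup

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=100)

for step in range(100):
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    optimizer.step()       # update weights first...
    scheduler.step()       # ...then advance the learning-rate schedule
    optimizer.zero_grad()
```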
265709f5cd New model, new model cards 2020-03-20 18:01:01 -04:00
115abd2166 Handle pinned version of isort
The CONTRIBUTING file pins to a specific version of isort, so we might as well install that in `dev`. This makes it easier for contributors so they don't have to manually install the specific commit.
2020-03-20 18:00:04 -04:00
95e00d0808 Clean special token init in modeling_....py (#3264)
* make style

* fix conflicts
2020-03-20 21:41:04 +01:00
8becb73293 removing torch.cuda.empty_cache() from TF function (#3267)
torch.cuda.empty_cache() was being called from a TF function (even when torch is unavailable)
not sure any replacement is needed if TF OOMs
2020-03-19 23:25:30 +01:00
ecfd336318 Simpler Error message when loading config/model with .from_pretrained() (#3341) 2020-03-19 23:23:03 +01:00
8eeefcb576 Update 01-training-tokenizers.ipynb (typo issue) (#3343)
I found there are two grammar errors or typo issues in the explanation of the encoding properties.

The original sentences:
If your was made of multiple \"parts\" such as (question, context), then this would be a vector with for each token the segment it belongs to
If your has been truncated into multiple subparts because of a length limit (for BERT for example the sequence length is limited to 512), this will contain all the remaining overflowing parts.

I think "input" should be inserted after the phrase "If your".
2020-03-19 23:21:49 +01:00
bbf26c4e61 Support T5 Generation (#3228)
* fix conflicts

* update bart max length test

* correct spelling mistakes

* implemented model specific encode function

* fix merge conflicts

* better naming

* save intermediate state -> need to rethink structure a bit

* leave tf problem as it is for now

* current version

* add layers.pop

* remove ipdb

* make style

* clean return cut decoding

* remove ipdbs

* Fix restoring layers in the decoder that don't exist.

* push good intermediate solution for now

* fix conflicts

* always good to refuse to merge conflicts when rebasing

* fix small bug

* improve function calls

* remove unused file

* add correct scope behavior for t5_generate

Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-03-19 23:18:23 +01:00
656e1386a2 Fix #3305: run_ner only possible on ModelForTokenClassification models 2020-03-19 16:41:28 -04:00
0c44b11917 add bert bahasa readme 2020-03-19 15:08:19 -04:00
e99af3b17b Create model card for bert-small-finetuned-squadv2 2020-03-19 15:07:55 -04:00
39db055268 Merge pull request #3348 from mrm8488/patch-28
Create card for BERT-Mini finetuned on SQuAD v2
2020-03-19 15:07:39 -04:00
dedc7a8fdb Create card for BERT-Tiny fine-tuned on SQuAD v2
- Only 17MB of Model weights!!
2020-03-19 15:07:22 -04:00
676adf8625 Created card for spanbert-finetuned-squadv1 2020-03-19 15:06:35 -04:00
11d8bcc9d7 Add model cards for FinBERT. (#3331)
* Add a model card for FinBERT

This is a copy of https://github.com/TurkuNLP/FinBERT/blob/master/README.md.

* Added a file for uncased.

* Add metadata for cased.

* Added metadata for uncased.
2020-03-19 15:06:01 -04:00
f049be7ad4 Export ALBERT main layer in TensorFlow (#3354) 2020-03-19 13:53:05 -04:00
3bedfd3347 Fix wrong link for the notebook file (#3344)
For the tutorial of "How to generate text", the URL link was wrong (it was linked to the tutorial of "How to train a language model").

I fixed the URL.
2020-03-19 17:22:47 +01:00
b2c2c31c60 Minor Bug Fix for Running Roberta on Glue (#3240)
* added return_token_type_ids argument for tokenizers which do not generate return_type_ids by default

* fixed styling

* Style

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2020-03-19 12:08:31 -04:00
4e4403c9b4 [BART] torch 1.0 compatibility (#3322)
* config.activation_function
2020-03-19 11:56:54 -04:00
c44a17db1b [FIX] not training when epoch is small (#3006)
* solving bug where for small epochs and large gradient_accumulation_steps we never train

* black formatting

* no need to change these files
2020-03-19 11:21:21 -04:00
ad7233fc01 [BART] cleanup: remove redundant kwargs, improve docstrings (#3319) 2020-03-19 11:16:51 -04:00
cd21d8bc00 Typo in warning message (#3219)
`T5Tokenizer` instead of `XLNetTokenizer`
2020-03-19 09:49:25 -04:00
8d3e218ea6 fix typo in docstring demonstrating usage (#3213) 2020-03-19 09:47:54 -04:00
cec3cdda15 Fix input ids can be none attn mask (#3345)
* fix issue 3289

* fix attention mask if input_ids None behavior
2020-03-19 09:55:17 +01:00
f6d813aaaa Create README.md 2020-03-18 23:45:02 -04:00
939328111b Create README.md
roberta_chinese_base card
2020-03-18 23:44:12 -04:00
29442d2edf Create README.md
albert_chinese_tiny card
2020-03-18 23:43:49 -04:00
20139b7c8d Added model cards for SciBERT models uploaded under AllenAI org (#3330)
* Create README.md

* model card

* add model card for cased
2020-03-18 15:45:11 -04:00
cae334c43c Improve fill-mask pipeline example in 03-pipelines notebook.
Remove hardcoded mask_token and use the value provided by the tokenizer.
2020-03-18 17:11:42 +01:00
4b1970bb4c Create README.md 2020-03-18 11:37:17 -04:00
d6afbd323d XLM-R Tokenizer now passes common tests + Integration tests (#3198)
* XLM-R now passes common tests + Integration tests

* Correct mask index

* Model input names

* Style

* Remove text preprocessing

* Unneccessary import
2020-03-18 09:52:49 -04:00
292186a3e7 Adding LM Head to Transfo-XL and first step to fixing problem with Adaptive Embeddings in TransfoXL (#3286)
* first commit

* work in progress

* make language generation task pass

* update to working version for LM

* delete print

* remove dead code

* make style
2020-03-18 09:24:27 -04:00
efdb46b6e2 add link to blog post (#3326) 2020-03-18 13:24:28 +01:00
ddb10c6447 improve docstring (#3327) 2020-03-18 13:24:09 +01:00
d7f98cd3ef Init card for model 2020-03-18 07:55:27 -04:00
38a555a83c Add Summarization to Pipelines (#3128)
* passing

* Undo stupid chg

* docs

* undo rename

* delete-cruft

* only import if you have torch

* Dont rely on dict ordering

* Fix dict ordering upstream

* docstring link

* docstring link

* remove trailing comma for 3.5 compat

* new name

* delegate kwarging

* Update kwargs
2020-03-17 18:04:21 -04:00
2b60a26b46 Update examples/ner/run_ner.py to use AutoModel (#3305)
* Update examples/ner/run_ner.py to use AutoModel

* Fix missing code and apply `make style` command
2020-03-17 12:30:10 -04:00
e41212c715 Create model card for CodeBERTaPy (#3309) 2020-03-17 12:29:11 -04:00
0f1bc0d68e [model_cards] Add google thumbnail 2020-03-17 12:02:51 -04:00
930c9412b4 [WIP] Lightning glue example (#3290)
*  Alter base pl transformer to use automodels

* 🐛 Add batch size env variable to function call

* 💄 Apply black code style from Makefile

* 🚚 Move lightning base out of ner directory

*  Add lightning glue example

* 💄 self

* move _feature_file to base class

*  Move eval logging to custom callback

* 💄 Apply black code style

* 🐛 Add parent to pythonpath, remove copy command

* 🐛 Add missing max_length kwarg
2020-03-17 11:46:42 -04:00
e8f44af5bf [generate] do_sample default back to False (#3298)
* change do_samples back

* None better default as boolean

* adapt do_sample to True in test example

* make style
2020-03-17 10:52:37 -04:00
2187c49f5c CPU/GPU memory benchmarking utilities - Remove support for python 3.5 (now only 3.6+) (#3186)
* memory benchmark rss

* have both forward pass and line-by-line mem tracing

* cleaned up tracing

* refactored and cleaning up API

* no f-strings yet...

* add GPU mem logging

* fix GPU memory monitoring

* style and quality

* clean up and doc

* update with comments

* Switching to python 3.6+

* fix quality
2020-03-17 10:17:11 -04:00
bd3feddf67 Create README.md (#3306)
* Create README.md

* Updated README.md
2020-03-17 09:05:11 -04:00
68ef0a111f [model_cards] Symlink all Google AI's BERT Miniatures to source model card 2020-03-16 23:37:42 -04:00
b2c1a447fe [BART] Delete redundant unit test (#3302) 2020-03-16 23:09:10 -04:00
b2028cc26b Add model card for Google AI's BERT Miniatures (#3301)
This model card is intended to be shared among all models under google/bert_uncased_*
(We'll need some support from HuggingFace to get this card cross-linked from all models)
2020-03-16 21:51:46 -04:00
4759176313 add camembert for Question answering for examples 2020-03-16 14:42:11 -04:00
11573231c6 [BART] generation_mode as a kwarg not a class attribute (#3278) 2020-03-16 12:47:53 -04:00
de697935a2 Create model card for spanbert-finetuned-squadv2 2020-03-16 12:32:46 -04:00
3ddd2029bc Create CodeBERTaJS model card 2020-03-16 12:23:01 -04:00
879e1d3234 Add TF2 version of FlauBERT (#2700)
* Add TF2 version of FlauBERT

* Add TF2 version of FlauBERT

* Add documentation

* Apply style and quality

* Apply style once again

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-03-16 09:29:21 -04:00
af471ce5e8 Improved Error message when loading config/model with .from_pretrained() (#3247)
* better error message

* better error message

* update to model identifier instead of url

* update to model identifier instead of ur
2020-03-16 09:48:30 +01:00
5ea8ba67b4 [BART] Remove unused kwargs (#3279)
* Remove unused kwargs
* dont call forward in tests
2020-03-15 23:00:44 -04:00
3814e167d9 Merge pull request #3225 from patrickvonplaten/finalize_merge_bart_generate_into_default_generate
Complete merge Seq-2-Seq generation into default generation
2020-03-14 15:08:59 +01:00
2bd79e23de [BART] FP16 testing fixes (#3266) 2020-03-13 19:48:26 -04:00
8320feec09 [model_cards] CodeBERTa 2020-03-13 18:28:09 -04:00
ab756f713c add gpt2-xl for tf 2020-03-13 16:40:35 -04:00
4f75d380a4 make style 2020-03-13 16:35:52 +01:00
c2ee3840ae update file to new starting token logic 2020-03-13 16:34:44 +01:00
cc4c37952a Create camembert-base-README.md 2020-03-13 09:35:53 -04:00
afea70c01c Bump psutil from 5.6.3 to 5.6.6 in /examples/distillation
Bumps [psutil](https://github.com/giampaolo/psutil) from 5.6.3 to 5.6.6.
- [Release notes](https://github.com/giampaolo/psutil/releases)
- [Changelog](https://github.com/giampaolo/psutil/blob/master/HISTORY.rst)
- [Commits](https://github.com/giampaolo/psutil/compare/release-5.6.3...release-5.6.6)

Signed-off-by: dependabot[bot] <support@github.com>
2020-03-12 21:14:56 -04:00
087465b943 add BART to README (#3255) 2020-03-12 19:38:05 -04:00
6a82f774f2 fix typo 2020-03-12 21:10:51 +01:00
f1c71da115 fix eos_token_ids in test 2020-03-12 21:00:54 +01:00
6047f46b19 re-add eos token to get good bart results 2020-03-12 20:17:50 +01:00
c11160114a small clean-up 2020-03-12 20:02:35 +01:00
2e81b9d8d7 Bart: update example for #3140 compatibility (#3233)
* Update bart example docs
2020-03-12 10:36:37 -04:00
72768b6b9c [model_cards] polbert: simplify usage example with pipelines
Co-Authored-By: Darek Kłeczek <darek.kleczek@gmail.com>
2020-03-12 10:05:40 -04:00
a4c75f1492 [ci] last resort 2020-03-11 19:11:19 -04:00
824e320d96 [ci] Fixup c6cf925 2020-03-11 18:52:10 -04:00
c6cf925ff8 [ci] last resort
while looking for fix to https://twitter.com/julien_c/status/1237864185821708291
2020-03-11 18:49:19 -04:00
14e455b716 [model_cards] 🇹🇷 Add new (cased) DistilBERTurk model 2020-03-11 18:40:38 -04:00
f65f74bbce Create README.md (#3230) 2020-03-11 12:37:00 -04:00
324292cfc7 Add Bio+ Clinical BERT model card (#3229)
* Create README.md

* Update README.md
2020-03-11 12:36:33 -04:00
e43afb1bb8 [model_cards] DialoGPT: How to use + thumbnail + conversational tag
cc @dreasysnail

Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
2020-03-11 11:36:47 -04:00
5085df995f [model_cards] PolBERT tweaks 2020-03-11 09:29:22 -04:00
19a63d8245 Create Readme.md model card (#3221) 2020-03-11 09:12:48 -04:00
dc848c2994 Create README.md 2020-03-11 09:10:57 -04:00
6ad221daf3 Create README.md 2020-03-11 09:10:23 -04:00
735180aa14 Create README.md 2020-03-11 09:10:13 -04:00
6c61c0801e Create README.md 2020-03-11 09:03:30 -04:00
235616686a Update README.md
- Update title
- Remove metrics
2020-03-11 09:03:20 -04:00
5bb00c817f Update README.md
Change title to clarify the model description
2020-03-11 09:03:07 -04:00
601e424750 Update README.md 2020-03-11 09:02:56 -04:00
1b9e765b21 Update README.md
- Remove metrics until tested on other xquad benchmarks
2020-03-11 09:02:18 -04:00
db29ffc978 Merge pull request #3140 from patrickvonplaten/merge_bart_generate_into_default_generate
Merge bart generate into default generate
2020-03-11 13:21:53 +01:00
ac303eae46 fix problem with half 2020-03-11 12:24:30 +01:00
bc9d5d917c make all tensors half precision 2020-03-11 12:15:38 +01:00
a332cc9f7f finalize generation merge 2020-03-11 11:53:36 +01:00
1ba21f96ca fix bug in tf no_repeat_ngram_size 2020-03-11 11:06:56 +01:00
d997ac7810 fix typo 2020-03-11 11:06:56 +01:00
7351a8dbaf re-add scoring filtering 2020-03-11 11:06:56 +01:00
9b8ee8cea0 delete print and make style 2020-03-11 11:06:56 +01:00
ca1330f0b2 do not mess with the negative sign 2020-03-11 11:06:56 +01:00
10989715d0 rename variable 2020-03-11 11:06:56 +01:00
cf06290565 remove ipdb 2020-03-11 11:06:56 +01:00
374deef48d fixed typo 2020-03-11 11:06:56 +01:00
a2c8e516c2 fix torch to tf translation 2020-03-11 11:06:56 +01:00
ca2047bc35 refactor variable naming and improve tf generate in line with torch generate 2020-03-11 11:06:56 +01:00
41b437ea3a add draft version of proposed changes for ROUGE score 2020-03-11 11:06:56 +01:00
a5751f7578 fix bug with attention_mask as optional input argument 2020-03-11 11:06:56 +01:00
629aac92ec do not allow do_sample and weird force bos token things 2020-03-11 11:06:56 +01:00
d880a5fbde finalized PR 2020-03-11 11:06:56 +01:00
2acfe63964 best current version and make style 2020-03-11 11:06:56 +01:00
c62444da39 fix conflicts 2020-03-11 11:06:56 +01:00
77e6775065 add current changes 2020-03-11 11:06:56 +01:00
333affcb81 add current changes 2020-03-11 11:06:56 +01:00
421216997b comment out stuff 2020-03-11 11:06:56 +01:00
7a11e925cf work in progress 2020-03-11 11:06:56 +01:00
5b3000d933 renamed min_len to min_length 2020-03-11 11:06:56 +01:00
aceb3fbaf4 only do output_past=True for language generation in bart 2020-03-11 11:06:56 +01:00
7cba11fb9b better naming 2020-03-11 11:06:56 +01:00
ff648221bd fix conflicts 2020-03-11 11:06:56 +01:00
c0d9dd3ba9 refactored code a bit and made more generic 2020-03-11 11:06:56 +01:00
d8e2b3c547 fix conflicts 2020-03-11 11:06:56 +01:00
d6de6423ba [doc] --organization tweak
Co-Authored-By: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-03-10 16:52:44 -04:00
0e56dc3078 [doc] Document the new --organization flag of CLI 2020-03-10 16:42:01 -04:00
270dfa1c8e [dialogpt] conversion script
Reference: https://github.com/huggingface/transformers/pull/1778#issuecomment-567675530

cc @patrickvonplaten and @dreasysnail
2020-03-10 15:09:29 -04:00
2661d80687 Update README.md
- Clarify that the model is not trained on the evaluation dataset
2020-03-10 10:59:34 -04:00
6a13448ad2 Update README.md
- Fix path of tokenizer
- Clarify that the model is not trained on the evaluation set
2020-03-10 10:59:16 -04:00
e57533cca5 Create README.md 2020-03-10 10:58:13 -04:00
31f2437f07 Merge pull request #3191 from patrickvonplaten/add_integration_tests_lm_generate_torch_tf
Add integration tests lm generate torch tf
2020-03-10 11:29:17 +01:00
5ca356a464 NER - pl example (#3180)
* 1. seqeval required by ner pl example. install from examples/requirements. 2. unrecognized arguments: save_steps

* pl checkpoint callback filenotfound error: make directory and pass

* #3159 pl checkpoint path difference

* 1. Updated Readme for pl 2. pl script now also correctly displays logs 3. pass gpu ids compared to number of gpus

* Updated results in readme

* 1. updated readme 2. removing deprecated pl methods 3. finalizing scripts

* comment length check

* using deprecated validation_end for stable results

* style related changes
2020-03-09 20:43:38 -04:00
f51ba059b9 Model card for albert-base-v2-squad2 2020-03-09 19:37:15 -04:00
cbf8f5d32b [model upload] Support for organizations 2020-03-09 17:33:57 -04:00
525b6b1c54 TFQA pipeline marked as slow test 2020-03-09 16:52:30 -04:00
3aca02efb3 Bart example: model.to(device) (#3194) 2020-03-09 15:09:35 -04:00
5164ea91a7 Skipping outputs (#3116)
* Minimal example

* Proposal 2

* Proposal 2 for fast tokenizers

* Typings

* Docs

* Revert "Docs" for easier review

This reverts commit eaf0f97062e809887704a542144c537f769d5223.

* Remove unnecessary assignments

* Tests

* Fix faulty type

* Remove prints

* return_outputs -> model_input_names

* Revert "Revert "Docs" for easier review"

This reverts commit 6fdc69408102bf695797f2dfddbb6350c6b9e722.

* code quality
2020-03-09 13:48:58 -04:00
49debe62fd Merge pull request #3190 from patrickvonplaten/fix_repetition_penalty_in_tf_generate
fix repetition penalty mask in tf
2020-03-09 16:29:57 +01:00
847d370301 fix typo 2020-03-09 16:18:29 +01:00
eb3e6cb04f cased -> uncased in BERT SQuAD example
closes #3183
2020-03-09 10:54:18 -04:00
9050ffe035 delete w! -> need to be more careful with vim 2020-03-09 15:43:12 +01:00
efb619235c add print statement to avoid code quality problem 2020-03-09 15:31:21 +01:00
b12541c4dc test ctrl 2020-03-09 13:58:01 +00:00
3e624c64ca fix repetition penalty mask in tf 2020-03-09 14:55:11 +01:00
b73dd1a0e4 fix typo in test xlm tf 2020-03-09 11:34:31 +01:00
4620caa864 fix if use lang embeddings in tf xlm 2020-03-09 11:18:54 +01:00
fbd02d4693 fixed all tests, still need to check ctrl tf and pt and xlm tf 2020-03-08 21:45:55 +01:00
b4a3a64744 fix xlnet & transfotests 2020-03-08 16:25:03 +01:00
b29fed790b Updated Tokenw ise in print statement to Token wise 2020-03-08 10:55:30 -04:00
66c827656f fix typo in test gpt2 2020-03-08 15:35:08 +01:00
314bdc7c14 fix typo in test 2020-03-08 15:34:20 +01:00
575976144a updated all tests 2020-03-08 15:29:10 +01:00
e03129ad44 [model_cards] Small formatting fix
cc @mrm8488
2020-03-06 18:07:44 -05:00
08a70fb392 [model_cards] Fixup d6df9a8f 2020-03-06 17:50:38 -05:00
0ae91c80aa Change back pipeline signatures (#3105)
* Change back pipeline signatures

* String types for non-imported objects
2020-03-06 17:26:18 -05:00
d6df9a8ffe [model_cards]Add albert chinese model(tiny,small,base,large,xlarge,xxlarge) 2020-03-06 17:23:31 -05:00
c52716d46c Create README.md 2020-03-06 17:20:19 -05:00
73a0c25376 remove excess line breaks in DeepPavlov model cards 2020-03-06 17:19:35 -05:00
ed37f9fa4f [Bart] _prepare_decoder_inputs should use large negative (#3158) 2020-03-06 16:06:36 -05:00
0416d437fb Merge pull request #3148 from patrickvonplaten/refactoring_beam_search_for_tf_2
refactored beam search according to torch implementation
2020-03-06 22:01:46 +01:00
db9279dedb Fix QA models binding for Flaubert, XLNet and XLM. (#3100)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Format & quality

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Again.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-06 13:04:29 -05:00
e58b3ec5df add imports to examples (#3160) 2020-03-06 11:15:33 -05:00
6ffe03a0a1 Merge pull request #3137 from tomhosking/bart-refactor
Refactor BartModel so that input checks are handled within enc/dec
2020-03-06 13:06:34 +01:00
3e5da38dae Merge pull request #3132 from huggingface/hf_api_model_list
[hf_api] Get the public list of all the models on huggingface
2020-03-06 13:05:52 +01:00
9499a3778e Merge pull request #3103 from gthb/keras-serialization
Support keras JSON/HDF5 serialization of main layers
2020-03-06 12:59:13 +01:00
9362eb4a07 refactored beam search according to torch implementation 2020-03-06 00:46:29 +01:00
c8035e11e8 Merge pull request #3149 from patrickvonplaten/fix_renaming_error
fix missed BartForMaskedLM renaming
2020-03-06 00:45:08 +01:00
58fc8f97a3 fix renaming problem 2020-03-06 00:35:47 +01:00
857e0a0d3b Rename BartForMaskedLM -> BartForConditionalGeneration (#3114)
* improved documentation
2020-03-05 17:41:18 -05:00
fa2aa699da Merge pull request #3011 from patrickvonplaten/add_models_special_tokens_to_specific_configs
Add models special tokens to its pretrained configs
2020-03-05 17:26:48 -05:00
146c521235 Merge branch 'master' into add_models_special_tokens_to_specific_configs 2020-03-05 17:24:42 -05:00
b623ddc000 Pass kwargs to configuration (#3147)
* Pass kwargs to configuration

* Setter

* test
2020-03-05 17:16:57 -05:00
0001d05686 Correct missing keys + test (#3143) 2020-03-05 17:01:54 -05:00
1741d740f2 Merge pull request #3145 from sshleifer/bartfp16
[Bart] FP16 Support
2020-03-05 22:14:35 +01:00
bbabbc1613 Merge pull request #3135 from patrickvonplaten/refactor_beam_search_generate
Refactoring and bug fixing beam search generate
2020-03-05 22:12:56 +01:00
14d40584b2 remove newline 2020-03-05 13:06:35 -05:00
1360dacaa3 cleanup deltas 2020-03-05 12:57:42 -05:00
810079de1f no ipdb 2020-03-05 12:48:14 -05:00
c203509d5b undo chg 2020-03-05 12:34:08 -05:00
c36fdc88d4 tests pass 2020-03-05 12:33:08 -05:00
7ac47bfe69 Updated notebook dependencies for Colab.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 16:07:51 +01:00
be02176a4b Fixing sentiment pipeline in 03-pipelines notebook.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 16:07:51 +01:00
8a2d9bc9ef Add model cards for DeepPavlov models (#3138)
* add empty model cards for every current DeepPavlov model

* fix: replace cyrillic `с` with `c`

* docs: add model cards for current DeepPavlov BERT models

* docs: add links for arXiv preprints
2020-03-05 09:34:43 -05:00
012cbdb0f5 Updating colab links in notebooks README.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 15:34:15 +01:00
31acb8dc52 Remove rogue .DS_Store 2020-03-05 13:51:30 +00:00
06a6cb6f36 Refactor BartModel so that input checks are handled within BartEncoder and BartDecoder 2020-03-05 13:45:41 +00:00
e33ed12c3b uncomment expression 2020-03-05 13:41:04 +01:00
4220fd52b9 remove ipdb 2020-03-05 13:36:21 +01:00
c47394b0c9 refactoring and bug fixing beam search generate 2020-03-05 13:12:50 +01:00
4c91a3af94 Document keras_serializable decorator 2020-03-05 11:48:10 +00:00
4be01e5cbf Use name transformers_config in Keras serialization
Be explicit that this is config for the transformers package (as these
layers may coexist with other custom stuff in a Keras model, plus the
Keras container itself is called config, and config["config"] is not
great)

Add explicit error handling for initializer calls that have neither
the `config` nor the `transformers_config` argument, or have both.
2020-03-05 11:47:35 +00:00
a355f4f0fc Add functools.wraps for wrapper initializer
Preserve the original initializer function's metadata. See
https://docs.python.org/3/library/functools.html#functools.update_wrapper
2020-03-05 11:18:50 +00:00
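The commit above leans on `functools.wraps` to keep the wrapped initializer introspectable. A minimal sketch of that pattern (helper names here are illustrative, not the actual repo code):

```python
import functools

def wrap_init(original_init):
    # Without functools.wraps, the wrapper would shadow the original
    # initializer's __name__, __doc__, and signature metadata.
    @functools.wraps(original_init)
    def wrapper(self, *args, **kwargs):
        # ... intercept or normalize arguments here ...
        original_init(self, *args, **kwargs)
    return wrapper
```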
d262a5d48e fix: remove unused import 2020-03-05 11:05:29 +00:00
30624f7056 Fix Colab links + install dependencies first.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-05 11:40:15 +01:00
3f067f4409 [hf_api] slightly more doc 2020-03-04 23:55:46 -05:00
f564f93c84 [hf_api] Get the public list of all the models on huggingface 2020-03-04 23:33:09 -05:00
ff9e79ba3a make style 2020-03-04 20:18:07 -05:00
07a79db505 Fix failing doc samples 2020-03-04 19:11:31 -05:00
4f338ed407 Explicit config_class instead of module inspection 2020-03-04 23:45:29 +00:00
6fe1cc0874 fix: clean up inadvertent change in tf_t5
This was the beginnings of an attempt to address the test failure on
this layer, and instead I backed out of making this layer
keras-serializable at all ... so it was a mistake to commit this.
2020-03-04 23:24:15 +00:00
bdd3d0c76d Merge pull request #3118 from patrickvonplaten/add_beam_search_to_generation_tf_2_0
Add beam search to generation tf 2 0
2020-03-04 23:28:00 +01:00
c440030e99 [model_cards] Tag AR model languages 2020-03-04 16:33:10 -05:00
3b7f95a506 Merge pull request #3115 from gthb/fix-bogus-param-to-layer-init
fix: passing config as Layer trainable param
2020-03-04 21:59:09 +01:00
1bca97ec7f Update notebook link and fix few working issues.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-03-04 21:19:33 +01:00
189113d891 Create README.md 2020-03-04 13:57:23 -05:00
76111a3d3a [model_cards] Add card by @lvwerra
(the current way to submit a model card to have it displayed on the website is to open a PR on the `transformers` repo itself)

Thanks for sharing!
2020-03-04 12:55:20 -05:00
a43c388abb [model_cards] Add card by @djstrong
(the current way to submit a model card to have it displayed on the website is to open a PR on the `transformers` repo itself)

Thanks for sharing!
2020-03-04 12:53:02 -05:00
ec60e0ae7a Create README.md 2020-03-04 12:06:05 -05:00
6a143bf282 model cards for both aubmindlab/bert-base-arabert models (#3113)
* Added readme for AraBERTv0.1

* Added readme to AraBERT
2020-03-04 12:04:39 -05:00
932eab943d include tf gpt2 tests for attn mask and past variable (#3122) 2020-03-04 12:03:46 -05:00
256cbbc4a2 [doc] Fix link to how-to-train Colab 2020-03-04 12:01:45 -05:00
006097f8ad rename variables named 'word' to 'token' in generate fn (#3119)
* fix conflicts

* fixed naming bug

* make style
2020-03-04 12:01:17 -05:00
18f4b9274f fix: work with Tensorflow < 2.1.0
tf.keras.utils.register_keras_serializable was added in TF 2.1.0, so
don't rely on it being there; just decorate the class with it if it
exists.
2020-03-04 16:57:29 +00:00
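A minimal sketch of the compatibility guard described above, assuming only that `tf.keras.utils.register_keras_serializable` may be absent before TF 2.1.0 (the helper name is hypothetical):

```python
import tensorflow as tf

def maybe_keras_serializable(cls):
    # register_keras_serializable appeared in TF 2.1.0; on older
    # versions, simply return the class undecorated.
    if hasattr(tf.keras.utils, "register_keras_serializable"):
        cls = tf.keras.utils.register_keras_serializable()(cls)
    return cls
```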
71c8711970 Adding Docker images for transformers + notebooks (#3051)
* Added transformers-pytorch-cpu and gpu Docker images

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added automatic jupyter launch for Docker image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Move image from alpine to Ubuntu to align with NVidia container images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added TRANSFORMERS_VERSION argument to Dockerfile.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Pytorch-GPU based Docker image

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Tensorflow images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use python 3.7 as Tensorflow doesn't provide a 3.8-compatible wheel.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove double FROM instructions on transformers-pytorch-cpu image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added transformers-tensorflow-gpu Docker image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* use the correct ubuntu version for tensorflow-gpu

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added pipelines example notebook

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added transformers-cpu and transformers-gpu (including both PyTorch and TensorFlow) images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Docker images don't start jupyter notebook by default.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Tokenizers notebook

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update images links

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update Docker images to python 3.7.6 and transformers 2.5.1

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added 02-transformers notebook.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Trying to realign 02-transformers notebook?

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Transformer image schema

* Some tweaks on tokenizers notebook

* Removed old notebooks.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Attempt to provide table of content for each notebooks

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Second attempt.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reintroduce transformer image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Keep trying

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* It's going to fly!

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remaining of the Table of Content

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix inlined elements for the table of content

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed anaconda dependencies for Docker images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removing notebooks ToC

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added LABEL to each docker image.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed old Dockerfile

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Directly use the context and include transformers from here.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reduce overall size of compiled Docker images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Install jupyter by default and use CMD for easier launching of the images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reduce number of layers in the images.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added README.md for notebooks.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix notebooks link in README

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix some wording issues.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added blog notebooks too.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing spelling errors in review comments.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
2020-03-04 11:45:57 -05:00
7a89a3e493 correct beam search sampling 2020-03-04 17:27:47 +01:00
c4c4c9998a make GPT2 and CTRL shape consistent between torch and TF 2020-03-04 17:27:47 +01:00
2529b2d37e set redorder past sort dimension to its default 2020-03-04 17:27:47 +01:00
61fef6e957 added beam_search generation for tf 2.0 2020-03-04 17:27:47 +01:00
34de670dbe fix sklearn release circle ci [temporary] (#3123) 2020-03-04 11:25:23 -05:00
6701fb7859 fix beam_search behavior when sampling (#3106)
* fix beam_search behavior when sampling

* delete print

* make correct style
2020-03-04 09:30:51 -05:00
b1116fd673 fix: passing config as Layer trainable param
Lurking bugs discovered while working on other stuff.
2020-03-03 23:05:40 +00:00
96c4990165 fix unused imports and style 2020-03-03 22:57:05 +00:00
470753bcf5 Put @keras_serializable only on layers it works on
And only run the test on TF*MainLayer classes so marked.
2020-03-03 22:44:45 +00:00
0c716ede8c Use class decorator instead of superclass
When supplied by Keras deserialization, the config parameter to initializers
will be a dict. So intercept it and convert to PretrainedConfig object (and
store in instance attribute for get_config to get at it) before passing to the
actual initializer. To accomplish this, and repeat as little code as possible,
use a class decorator on TF*MainLayer classes.
2020-03-03 22:31:42 +00:00
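A condensed sketch of the decorator idea this commit describes, assuming a `config_class` attribute on each TF*MainLayer class (per the "Explicit config_class" commit above); attribute names are illustrative, not the exact repo implementation:

```python
import functools

def keras_serializable(cls):
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, config, *args, **kwargs):
        # Keras deserialization hands us a plain dict: rebuild the
        # config object and keep it around for get_config().
        if isinstance(config, dict):
            config = cls.config_class.from_dict(config)
        self._transformers_config = config
        original_init(self, config, *args, **kwargs)

    cls.__init__ = wrapped_init
    return cls
```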
e9e6efdc45 BartForSequenceClassification: fix num_labels, add test (#3110) 2020-03-03 15:54:29 -05:00
f631e01d2c [ci] Re-run integration ground truth from fairseq
Adopted best practice set by @patrickvonplaten of commenting lines run on fairseq, for easy comparison

also see #3020
2020-03-03 15:31:40 -05:00
5b396457e5 Summarization Examples: add Bart CNN Evaluation (#3082)
* Rename and improve example

* Add test

* slightly faster test

* style

* This breaks remy prolly

* shorter test string

* no slow

* newdir structure

* New tree

* Style

* shorter

* docs

* clean

* Attempt future import

* more import hax
2020-03-03 15:29:59 -05:00
5c5af879b6 [Bart] dont call .forward (#3094) 2020-03-03 15:14:12 -05:00
b8da16f390 Add (failing) tests for Keras save/load 2020-03-03 15:22:34 +00:00
ba28170717 Support keras JSON/HDF5 serialization of main layers
Fixes #3101
2020-03-03 15:21:41 +00:00
a088d75e51 [model_cards] Fix incorrect path 2020-03-03 09:52:32 -05:00
4134100363 Add generate() functionality to TF 2.0 (#3063)
* add first copy past test to tf 2 generate

* add tf top_k_top_p_filter fn

* add generate function for TF

* add generate function for TF

* implemented generate for all models except transfoXL

* implemented generate for all models except transfoXL

* implemented generate for all models except transfoXL

* make style

* change permission of test file to correct ones

* delete ipdb

* delete ipdb

* fix bug and finish simple gpt2 integration test

* clean test file

* clean test file

* make style

* make style

* make style

* make style

* change import style

* change import style

* make style

* make style

* add decorators

* add decorators

* fix tf ctrl bug dim => axis in TF

* make style

* make style

* refactored test file

* refactored test file

* take out test_torch_tf_conversion if nothing is defined

* take out test_torch_tf_conversion if nothing is defined

* remove useless files

* remove useless files

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* solve conflicts

* solve conflicts

* fix conflicts

* fix conflicts

* merge conflicts

* delete ipdb

* exposed top_k_top_p_filtering fns

* delete weirdly created w! file

* add comment to test tf common modeling

* fix conflicts

* fix conflicts

* make style

* merge conflicts

* make style

* change tf.tensor.shape to shape_list(tensor)
2020-03-03 09:42:15 -05:00
b31f715019 bert-base-arabic model card 2020-03-03 09:29:28 -05:00
c0c7ec3458 Don't crash if fine-tuned model doesn't end with a number (#3099)
That's the same fix applied in https://github.com/huggingface/transformers/issues/2258, but for the GLUE example
2020-03-03 08:59:47 -05:00
eec5ec8071 [BART] to each its own config + make BART compatible w/ Pipelines
cc @sshleifer
2020-03-02 18:56:17 -05:00
6b1558bad8 add models cards for camembert-base-fquad camembert-base-squad (#3089)
* add models cards for camembert-base-fquad camembert-base-squad

* typo fix
2020-03-02 17:07:13 -05:00
f169957d0c TF GPU CI (#3085)
* debug env

* Restrict TF GPU memory

* Fixup

* One more test

* rm debug logs

* Fixup
2020-03-02 15:45:25 -05:00
d3eb7d23a4 Pipeline doc (#3055)
* Pipeline doc initial commit

* pipeline abstraction

* Remove modelcard argument from pipeline

* Task-specific pipelines can be instantiated with no model or tokenizer

* All pipelines doc
2020-03-02 14:07:10 -05:00
2c7749784c Update README.md
- Add example of usage
- Update metrics
2020-03-02 13:35:34 -05:00
0e56b37e80 rm bogus file
cc @patrickvonplaten
2020-03-02 12:27:12 -05:00
2fdc7f6ce8 correct greedy generation when doing beam search (#3078)
* correct greedy generation when doing beam search

* improve comment
2020-03-02 12:00:09 -05:00
13afb71208 [ci] Ensure that TF does not preempt all GPU memory for itself
see https://www.tensorflow.org/guide/gpu#limiting_gpu_memory_growth

Co-Authored-By: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-03-02 11:56:45 -05:00
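What the linked TF guide boils down to, as a sketch using the standard TF 2.x API (not the repo's exact CI code):

```python
import tensorflow as tf

# Allocate GPU memory on demand instead of letting TF grab it all up
# front, so concurrent test processes can share the same GPU.
for gpu in tf.config.experimental.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```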
c0135194eb Force pad_token_id to be set before padding for standard tokenizer (#3035)
* force pad_token_id to be set before padding

* fix tests and forbid padding without having a padding_token_id set
2020-03-02 10:53:55 -05:00
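A sketch of the guard this commit describes; `pad_sequences` is a hypothetical helper, and only `pad_token_id` is the real tokenizer attribute:

```python
def pad_sequences(tokenizer, batch_ids, max_length):
    # Hypothetical helper: refuse to pad when no padding token is set,
    # instead of silently padding with an arbitrary id.
    if tokenizer.pad_token_id is None:
        raise ValueError(
            "Cannot pad: this tokenizer has no pad_token_id. Assign one "
            "first, e.g. tokenizer.pad_token = tokenizer.eos_token."
        )
    return [ids + [tokenizer.pad_token_id] * (max_length - len(ids))
            for ids in batch_ids]
```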
b54ef78d0c Bart-CNN (#3059)
`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.
2020-03-02 10:35:53 -05:00
6b1ff25084 fix n_gpu count when no_cuda flag is activated (#3077)
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
2020-03-02 10:20:21 -05:00
298bed16a8 make style 2020-03-01 14:08:01 -05:00
852e032ca6 include roberta in run_squad_w_distillation - cc @graviraja 2020-03-01 01:56:50 +00:00
b5509abb36 --do_lower_case will always trick me... 2020-03-01 01:39:24 +00:00
d6ef587a10 [ci] Fixup e36bd94345af6045108a391f9ac7f4dc557548de 2020-02-28 23:19:17 -05:00
e36bd94345 [ci] Run all tests on (self-hosted) GPU (#3020)
* Create self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* Update self-hosted.yml

* do not run slow tests, for now

* [ci] For comparison with circleci, let's also run CPU-tests

* [ci] reorganize

* clearer filenames

* [ci] Final tweaks before merging

* rm slow tests on circle ci

* Trigger CI

* On GPU this concurrency was way too high
2020-02-28 21:11:08 -05:00
908fa43b54 Changes to NER examples for PLT and TPU (#3053)
* changes to allow for tpu training

* black

* tpu

* tpu
2020-02-27 16:45:32 -05:00
8bcb37bfb8 NER support for Albert in run_ner.py and NerPipeline (#2983)
* Added support for Albert when fine-tuning for NER

* Added support for Albert in NER pipeline

* Added command-line options to examples/ner/run_ner.py to better control tokenization

* Added class AlbertForTokenClassification

* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens

* Added ,

* Now passes style guide enforcement

* Changes from reviews.

* Code now passes style enforcement

* Added test for AlbertForTokenClassification

* Added test for AlbertForTokenClassification
2020-02-27 10:22:55 -05:00
6a37588041 spelling: strictly (#3042) 2020-02-27 10:22:35 -05:00
f4ff44a6d9 Fix batch_encode_plus (#3041) 2020-02-27 09:56:47 -05:00
f71157529e Added test for AlbertForTokenClassification 2020-02-27 12:24:20 +01:00
aceb6a0907 Added test for AlbertForTokenClassification 2020-02-27 11:52:46 +01:00
d762d4289c Code now passes style enforcement 2020-02-26 23:50:40 +01:00
9495d38b0d Changes from reviews. 2020-02-26 23:36:39 +01:00
b370cc7e99 [gpu] Fixup fdd61b19928e87a5354c36923182e801bfedb31b 2020-02-26 21:48:49 +00:00
f5516805c2 Fix bart slow test 2020-02-26 20:47:49 +00:00
5bc99e7f33 fix several typos in Distil* readme (#3034) 2020-02-26 12:39:54 -05:00
fdd61b1992 Fix attn mask gpt2 when using past (#3033)
* fix issue and add some tests

* fix issue and add some tests

* updated doc string gpt2
2020-02-26 12:04:37 -05:00
9cda3620b6 Fix (non-slow) tests on GPU (torch) (#3024)
* Fix tests on GPU (torch)

* Fix bart slow tests

Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-02-26 11:59:25 -05:00
9df74b8bc4 Delete all mentions of Model2Model (#3019) 2020-02-26 11:36:27 -05:00
bb7c468520 Documentation (#2989)
* All Tokenizers

BertTokenizer + few fixes
RobertaTokenizer
OpenAIGPTTokenizer + Fixes
GPT2Tokenizer + fixes
TransfoXLTokenizer
Correct rst for TransformerXL
XLMTokenizer + fixes
XLNet Tokenizer + Style
DistilBERT + Fix XLNet RST
CTRLTokenizer
CamemBERT Tokenizer
FlaubertTokenizer
XLMRobertaTokenizer
cleanup

* cleanup
2020-02-25 18:43:36 -05:00
c913eb9c38 Add integration tests for xlm roberta modelling and xlm roberta tokenzier (#3014)
* add first files

* add xlm roberta integration tests

* make style

* flake 8 issues solved
2020-02-25 16:51:25 -05:00
e8ce63ff21 Change masking to direct labeling for TPU support. (#2982)
* change masking to direct labelings

* fix black

* switch to ignore index

* .

* fix black
2020-02-25 14:47:43 -05:00
7a7ee28cb9 missing ner link (#2967) 2020-02-25 14:06:57 -05:00
65e7c90a77 Adding usage examples for common tasks (#2850)
* Usage: Sequence Classification & Question Answering

* Pipeline example

* Language modeling

* TensorFlow code for Sequence classification

* Custom TF/PT toggler in docs

* QA + LM for TensorFlow

* Finish Usage for both PyTorch and TensorFlow

* Addressing Julien's comments

* More assertive

* cleanup

* Favicon
- added favicon option in conf.py along with the favicon image
- updated 🤗 logo. slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)

Co-authored-by: joshchagani <joshua@joshuachagani.com>
2020-02-25 13:48:24 -05:00
f5b50c6b8e make style 2020-02-25 16:41:54 +01:00
ec16142ee5 add special tokens to pretrain configs of respective lm head models 2020-02-25 16:37:59 +01:00
e645dcbb70 add special tokens to pretrain configs of respective lm head models 2020-02-25 16:37:56 +01:00
e693cd1e87 [ci] Run slow tests every day 2020-02-24 19:54:47 -05:00
4fc63151af [ci] Attempt to fix #2844 2020-02-24 19:51:34 -05:00
b90745c590 Test correct tokenizers after default switch (#3003) 2020-02-24 18:45:53 -05:00
3716c3d8af False by default (#3002) 2020-02-24 18:30:57 -05:00
f9ec5ca90b Release: v2.5.1 2020-02-24 18:22:54 -05:00
4cd9c0971c Fix for fast tokenizers save_pretrained compatibility with Python. (#2933)
* Renamed file generated by tokenizers when calling save_pretrained to match python.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added save_vocabulary tests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove python quick and dirty fix for clean Rust impl.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizers dependency to 0.5.1

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* TransfoXLTokenizerFast uses a json vocabulary file + warning about incompatibility between Python and Rust

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added some save_pretrained / from_pretrained unittests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update tokenizers to 0.5.2

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Quality and format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* flake8

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Making sure there is really a bug in unittest

* Fix TransfoXL constructor vocab_file / pretrained_vocab_file mixin.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-24 18:20:42 -05:00
ee60840ee6 fix _update_memory fn call in transformer-xl (#2971) 2020-02-24 17:50:24 -05:00
6a50d501ec add explaining example to XLNet LM modeling (#2997)
* add explaining example to XLNet LM modeling

* improve docstring for xlnet
2020-02-24 15:42:38 -05:00
65d74c4965 Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punction to <unk> (#2987)
* add preprocessing to add space before punctuation for transfo_xl

* improve warning messages

* make style

* compile regex at instantiation of tokenizer object
2020-02-24 15:11:10 -05:00
a143d9479e Add local_files_only parameter to pretrained items (#2930)
* Add disable_outgoing to pretrained items

Setting disable_outgoing=True disables outgoing traffic:
- etags are not looked up
- models are not downloaded

* parameter name change

* Remove forgotten print
2020-02-24 14:58:15 -05:00
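Usage of the renamed parameter, assuming the model is already in the local cache:

```python
from transformers import AutoModel, AutoTokenizer

# local_files_only=True skips etag lookups and downloads entirely;
# from_pretrained raises if the files are not cached locally yet.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)
```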
286d1ec746 Create README.md 2020-02-24 14:33:49 -05:00
7984a70ee4 kwargs are passed to both model and configuration in AutoModels (#2998) 2020-02-24 14:19:39 -05:00
21d8b6a33e Testing that batch_encode_plus is the same as encode_plus (#2973)
* Testing that encode_plus and batch_encode_plus behave the same way

Spoiler alert: they don't

* Testing rest of arguments in batch_encode_plus

* Test tensor return in batch_encode_plus

* Addressing Sam's comments

* flake8

* Simplified with `num_added_tokens`
2020-02-24 12:09:46 -05:00
17c45c39ed Add slow generate tests for pretrained lm models (#2909)
* add slow generate lm_model tests

* fix conflicts

* merge conflicts

* fix conflicts

* add slow generate lm_model tests

* make style

* delete unused variable

* fix conflicts

* fix conflicts

* fix conflicts

* delete unused variable

* fix conflicts

* finished hard coded tests
2020-02-24 11:51:57 -05:00
8194df8e0c Warning on add_special_tokens (#2966)
Warning on `add_special_tokens` when passed to `encode`, `encode_plus` and `batch_encode_plus`
2020-02-24 08:42:54 -05:00
38f5fe9e02 add_ctags_to_git_ignore (#2984) 2020-02-23 16:55:32 -05:00
105dcb4162 Now passes style guide enforcement 2020-02-23 21:47:59 +01:00
33eb8a165d Added , 2020-02-23 21:43:31 +01:00
869b66f6b3 * Added support for Albert when fine-tuning for NER
* Added support for Albert in NER pipeline

* Added command-line options to examples/ner/run_ner.py to better control tokenization

* Added class AlbertForTokenClassification

* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
2020-02-23 21:13:03 +01:00
129f0604ac Delete untested, broken Model2LSTM (#2968) 2020-02-23 11:28:48 -05:00
0e84559d64 Correct special_tokens_mask when add_special_tokens=False (#2965)
Don't know of a use case where that would be useful, but this is more consistent
2020-02-23 09:50:39 -05:00
92487a1dc0 Bart: fix layerdrop and cached decoder_input_ids for generation (#2969) 2020-02-22 16:25:04 -05:00
c36416e53c Add standardized get_vocab method to tokenizers 2020-02-22 12:09:01 -05:00
cafc4dfc7c fix hardcoded path in examples readme 2020-02-22 11:12:38 -05:00
34b4b5a9ed Update modelcard of bert-base-german-cased
Add image
2020-02-22 11:08:42 -05:00
7df12d7bf8 Update README.md
- I added an example using the model with pipelines to show that we have set ```{"use_fast": False}``` in the tokenizer.
- I added a Colab to play with the model and pipelines
- I added a Colab to discover Huggingface pipelines at the end of the document
2020-02-22 11:06:41 -05:00
cc6775cdf5 Fix max_length not taken into account when using pad_to_max_length on fast tokenizers (#2961)
* enable_padding should pad up to max_length if set.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added more testing on padding.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-22 09:27:47 -05:00
94ff2d6ee8 Remove double bias (#2958) 2020-02-21 17:10:18 -05:00
b5b3445c4f Only use F.gelu for torch >=1.4.0 (#2955)
* Only use F.gelu for torch >=1.4.0

* Use F.gelu for newer torch
2020-02-21 16:10:21 -05:00
fc38d4c86f Improve special_token_id logic in run_generation.py and add tests (#2885)
* improving generation

* finalized special token behaviour for no_beam_search generation

* solved modeling_utils merge conflict

* solve merge conflicts in modeling_utils.py

* add run_generation improvements from PR #2749

* adapted language generation to not use hardcoded -1 if no padding token is available

* remove the -1 removal as hard-coded -1s are not necessary anymore

* add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown

* add slow language generation tests for pretrained models using hardcoded output with pytorch seed

* delete ipdb

* check that all generated tokens are valid

* renaming

* renaming Generation -> Generate

* make style

* updated so that generate_beam_search has the same token behavior as generate_no_beam_search

* consistent return format for run_generation.py

* deleted pretrain lm generate tests -> will be added in another PR

* cleaning of unused if statements and renaming

* run_generate will always return an iterable

* make style

* consistent renaming

* improve naming, make sure generate function always returns the same tensor, add docstring

* add slow tests for all lmhead models

* make style and improve example comments modeling_utils

* better naming and refactoring in modeling_utils

* changed fast random lm generation testing design to more general one

* delete old testing design in gpt2

* correct old variable name

* temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed

* adapted all fast random generate tests to new design

* better warning description in modeling_utils

* better comment

* better comment and error message

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
2020-02-21 12:09:59 -05:00
c749a543fa Added CamembertForQuestionAnswering (#2746)
* Added CamembertForQuestionAnswering

* fixed camembert tokenizer case
2020-02-21 12:01:02 -05:00
5211d333bb Update modeling_tf_utils.py (#2924)
Tensorflow does not use .eval() vs .train().

closes https://github.com/huggingface/transformers/issues/2906
2020-02-21 11:28:32 -05:00
3e98f27e4a Create README.md for xlnet_large_squad (#2942) 2020-02-21 08:54:41 -05:00
4452b44b90 Labels are now added to model config under id2label and label2id (#2945) 2020-02-21 08:53:05 -05:00
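A short example of the mapping this commit adds, with made-up label names:

```python
from transformers import AutoConfig

# id2label / label2id live on the config so downstream code (e.g.
# pipelines) can report readable labels instead of LABEL_0, LABEL_1.
config = AutoConfig.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)
```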
53ce3854a1 New BartModel (#2745)
* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs
2020-02-20 18:11:13 -05:00
564fd75d65 Removed unused fields in DistilBert TransformerBlock (#2710)
* Removed unused fields in DistilBert TransformerBlock
2020-02-20 16:08:21 -05:00
889d3bfdbb default arg fix (#2937) 2020-02-20 15:31:17 -05:00
197d74f988 Add get_vocab method to PretrainedTokenizer 2020-02-20 15:26:49 -05:00
ea8eba35e2 Fix InputExample docstring (#2891) 2020-02-20 15:25:15 -05:00
e2a6445ebb Tokenizer fast warnings (#2922)
* Remove warning when pad_to_max_length is not set.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Move RoBERTa warning to RoBERTa and not GPT2 base tokenizer.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-20 11:55:03 -05:00
9b3093311f Expose all constructor parameter for BertTokenizerFast (#2921)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-20 11:53:32 -05:00
b662f0e625 Support for torch-lightning in NER examples (#2890)
* initial pytorch lightning commit

* tested multigpu

* Fix learning rate schedule

* black formatting

* fix flake8

* isort

* isort

* .

Co-authored-by: Check your git settings! <chris@chris-laptop>
2020-02-20 11:50:05 -05:00
ab1238393c Update to include example of LM
The model files have been updated in order to include the classification layers, based on https://github.com/huggingface/transformers/issues/2901, and can now also be used as a LM.
2020-02-20 10:57:59 -05:00
976e9afece Add syntax highlighting to the BibTeX in README 2020-02-20 10:06:15 -05:00
cbc5705541 Fix spell: EsperBERTo, not EspertBERTo 2020-02-20 10:02:07 -05:00
d490b5d500 Fast Tokenizers save pretrained should return the list of generated file paths. (#2918)
* Correctly return the tuple of generated file(s) when calling save_pretrained

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Quality and format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-20 00:58:04 +01:00
2708b44ee9 Patch ALBERT with heads in TensorFlow 2020-02-19 18:46:25 -05:00
1abd53b1aa Patch ALBERT with heads in TensorFlow 2020-02-19 18:24:40 -05:00
e676764241 Override build_inputs_with_special_tokens for fast tokenizers (#2912)
* Override build_inputs_with_special_tokens for fast impl + unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Quality + format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-19 16:09:51 -05:00
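What the overridden method computes, shown for a BERT-style tokenizer (a sketch of the public API, not the fast-tokenizer internals):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("hello world"))

# For BERT this wraps the ids as [CLS] ids [SEP]
# (and appends "ids_b [SEP]" when a second sequence is passed).
with_special = tokenizer.build_inputs_with_special_tokens(ids)
```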
59c23ad9c9 README link + better instructions for release 2020-02-19 11:57:17 -05:00
22b2b5790e Documentation v2.5.0 2020-02-19 11:53:30 -05:00
fb560dcb07 Release: v2.5.0
Welcome Rust Tokenizers
2020-02-19 11:46:19 -05:00
3f3fa7f7da Integrate fast tokenizers library inside transformers (#2674)
* Implemented fast version of tokenizers

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bumped tokenizers version requirements to latest 0.2.1

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added matching tests

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Matching OpenAI GPT tokenization !

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Matching GPT2 on tokenizers

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Expose add_prefix_space as constructor parameter for GPT2

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Matching Roberta tokenization !

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed fast implementation of CTRL.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Binding TransformerXL tokenizers to Rust.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Updating tests accordingly.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added tokenizers as top-level modules.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Black & isort.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Rename LookupTable to WordLevel to match Rust side.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Black.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use "fast" suffix instead of "ru" for rust tokenizers implementations.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Introduce tokenize() method on fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* encode_plus dispatches to batch_encode_plus

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* batch_encode_plus now dispatches to encode if there is only one input element.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bind all the encode_plus parameters to the forwarded batch_encode_plus call.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizers dependency to 0.3.0

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Formatting.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix tokenization_auto with support for new (python, fast) mapping schema.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Give correct fixtures path in test_tokenization_fast.py for the CLI.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Expose max_len_ properties on BertTokenizerFast

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Move max_len_ properties to PreTrainedTokenizerFast and override in specific subclasses.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* _convert_encoding should keep the batch axis tensor if only one sample in the batch.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add warning message for RobertaTokenizerFast if used for MLM.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added use_fast (bool) parameter on AutoTokenizer.from_pretrained().

This makes it easy to enable/disable Rust-based tokenizer instantiation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Let's tokenizers handle all the truncation and padding stuff.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Allow providing tokenizer arguments during pipeline creation.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update test_fill_mask pipeline to not use fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix too much parameters for convert_encoding.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* When enabling padding, max_length should be set to None.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Avoid returning nested tensors of length 1 when calling encode_plus

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure output is padded when return_tensor is not None.

Tensor creation requires the initial list input to be of the exact same size.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Disable transfoxl unittest if pytorch is not available (required to load the model)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* encode_plus should not remove the leading batch axis if return_tensor is set

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Temporarily disable fast tokenizers on QA pipelines.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix formatting issues.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Update tokenizers to 0.4.0

* Update style

* Enable truncation + stride unit test on fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add unittest ensuring special_tokens set match between Python and Rust.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure special_tokens are correctly set during construction.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Give more warning feedback to the user in case of padding without pad_token.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* quality & format.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added possibility to add a single token as str

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added unittest for add_tokens and add_special_tokens on fast tokenizers.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix rebase mismatch on pipelines qa default model.

QA requires cased input while the tokenizers would be uncased.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Using offset mapping relative to the original string + unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: save_vocabulary requires folder and file name

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Simplify import for Bert.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: truncate_and_pad disables padding according to the same heuristic as the one enabling padding.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Remove private member access in tokenize()

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Bump tokenizers dependency to 0.4.2

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* format & quality.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Use named arguments when applicable.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Add Github link to Roberta/GPT2 space issue on masked input.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Move max_len_single_sentence / max_len_sentences_pair to PreTrainedTokenizerFast + tests.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Relax type checking to include tuple and list object.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Addressing review comment: Document the truncate_and_pad manager behavior.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Raise an exception if return_offsets_mapping is not available with the current tokenizer.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure padding is set on the tokenizers before setting any padding strategy + unittest.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* On pytorch we need to stack tensors to get a proper new axis.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Generalize tests to different frameworks, removing hard-coded return_tensors="..."

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bump tokenizer dependency for num_special_tokens_to_add

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Overflowing tokens in batch_encode_plus are now stacked over the batch axis.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improved error message for padding strategy without pad token.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Bumping tokenizers dependency to 0.5.0 for release.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Optimizing convert_encoding around 4x improvement. 🚀

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* expose pad_to_max_length in encode_plus to avoid duplicating the parameters in kwargs

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Generate a proper overflow_to_sampling_mapping when return_overflowing_tokens is True.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix unittests for overflow_to_sampling_mapping not being returned as tensor.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Format & quality.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Remove perfect alignment constraint for Roberta (allowing 1% difference max)

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Triggering final CI

Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
2020-02-19 11:35:40 -05:00
ffb93ec0cc Create README.md 2020-02-19 10:51:16 -05:00
20fc18fbda Skip flaky test_tf_question_answering (#2845)
* Skip flaky test

* Style
2020-02-18 16:14:50 -05:00
2ae98336d1 fix vocab size in binarized_data (distil): int16 vs int32 2020-02-18 16:17:35 +00:00
0dbddba6d2 fix typo in hans example call 2020-02-17 20:19:57 +00:00
29ab4b7f40 Create README.md 2020-02-17 10:58:43 -05:00
c88ed74ccf [model_cards] 🇹🇷 Add new (cased) BERTurk model 2020-02-17 09:54:46 -05:00
5b2d4f2657 Merge pull request #2881 from patrickvonplaten/add_vim_swp_to_gitignore
update .gitignore to ignore .swp files created when using vim
2020-02-17 14:36:49 +01:00
fb4d8d0832 update .gitignore to ignore .swp files created when using vim 2020-02-17 14:26:32 +01:00
6083c1566e Update README.md
I trained the model for more epochs, which improved the results. This commit updates the results of the model and adds a gif showing it used with **transformers/pipelines**
2020-02-16 10:09:34 -05:00
73028c5df0 [model_cards] EsperBERTo 2020-02-14 15:16:33 -05:00
81fb8d3251 Update model card: new performance chart (#2864)
* Update model performance for correct German conll03 dataset

* Adjust text

* Adjust line spacing
2020-02-14 13:39:23 -05:00
4e69104a1f [model_cards] Also use the thumbnail as meta
Co-Authored-By: Ilias Chalkidis <ihalk@di.uoa.gr>
2020-02-14 10:27:11 -05:00
73d79d42b4 [model_cards] nlptown/bert-base-multilingual-uncased-sentiment
cc @yvespeirsman

Co-Authored-By: Yves Peirsman <yvespeirsman@users.noreply.github.com>
2020-02-14 09:51:11 -05:00
47b735f994 Added model card for bert-base-multilingual-uncased-sentiment (#2859)
* Created model card for nlptown/bert-base-multilingual-sentiment

* Delete model card

* Created model card for bert-base-multilingual-uncased-sentiment as README
2020-02-14 09:31:15 -05:00
7d22fefd37 [pipeline] Alias NerPipeline as TokenClassificationPipeline 2020-02-14 09:18:10 -05:00
61a2b7dc9d Fix typo 2020-02-14 09:13:07 -05:00
6e261d3a22 Fix typos 2020-02-14 09:11:07 -05:00
4e597c8e4d Fix typo 2020-02-14 09:07:42 -05:00
925a13ced1 [model_cards] mv README.md 2020-02-13 23:07:29 -05:00
575a3b7aa1 Create distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es.md 2020-02-13 23:04:52 -05:00
4d36472b96 [run_ner] Don't crash if fine-tuning local model that doesn't end with digit 2020-02-14 03:25:29 +00:00
8514018300 Update with additional information
Added a "Pre-training details" section
2020-02-13 21:54:42 -05:00
1eec69a900 Create README.md 2020-02-13 19:27:22 -05:00
8744402f1e add model_card flaubert-base-uncased-squad (#2833)
* add model_card

* Add tag

cc @fmikaelian

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-13 17:19:13 -05:00
7f98edd7e3 Model card: Literary German BERT (#2843)
* feat: create model card

* chore: add description

* feat: stats plot

* Delete prosa-jahre.svg

* feat: years plot (again)

* chore: add more details

* fix: typos

* feat: kfold plot

* feat: kfold plot

* Rename model_cards/severinsimmler/literary-german-bert.md to model_cards/severinsimmler/literary-german-bert/README.md

* Support for linked images + add tags

cc @severinsimmler

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-13 15:43:44 -05:00
f1e8a51f08 Preserve spaces in GPT-2 tokenizers (#2778)
* Preserve spaces in GPT-2 tokenizers

Preserves spaces after special tokens in GPT-2 and inherited (RoBERTa)
tokenizers, enabling correct BPE encoding. Automatically inserts a space
in front of the first token in the encode function when adding special tokens.

* Add tokenization preprocessing method

* Add framework argument to pipeline factory

Also fixes pipeline test issue. Each test input now treated as a
distinct sequence.
2020-02-13 13:29:43 -05:00
0ed630f139 Attempt to increase timeout for circleci slow tests (#2844) 2020-02-13 09:11:03 -05:00
ef74b0f07a get_activation('relu') provides a simple mapping from strings i… (#2807)
* activations.py contains a mapping from string to activation function
* resolves some `gelu` vs `gelu_new` ambiguity
2020-02-13 08:28:33 -05:00
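A short usage example of the mapping this commit introduces (activation names as exposed by `transformers.activations`):

```python
import torch
from transformers.activations import get_activation

gelu = get_activation("gelu")          # exact, erf-based GELU
gelu_new = get_activation("gelu_new")  # tanh approximation (GPT-2 style)

x = torch.linspace(-3, 3, steps=7)
print((gelu(x) - gelu_new(x)).abs().max())  # small but nonzero difference
```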
f54a5bd37f Raise error when using an mlm flag for a clm model + correct TextDataset 2020-02-12 13:23:14 -05:00
569897ce2c Fix a few issues regarding the language modeling script 2020-02-12 13:23:14 -05:00
21da895013 [model_cards] Better image for social sharing 2020-02-11 20:30:08 -05:00
9a70910d47 [model_cards] Tweak @mrm8488's model card 2020-02-11 20:20:39 -05:00
9274734a0d [model_cards] mv to correct location + tweak tag 2020-02-11 20:13:57 -05:00
69f948461f Create bert-base-spanish-wwm-cased-finetuned-spa-squad2-es.md 2020-02-11 20:07:15 -05:00
e0b6247cf7 [model_cards] Change formatting slightly as we updated our markdown engine
cc @tholor @loretoparisi @simonefrancia
2020-02-11 18:25:21 -05:00
5f2dd71d1b Smaller diff 2020-02-11 17:20:09 -05:00
31158af57c formatting 2020-02-11 17:20:09 -05:00
5dd61fb9a9 Add more specific testing advice to Contributing.md 2020-02-11 17:20:09 -05:00
ee5de0ba44 BERT decoder: Fix causal mask dtype.
PyTorch < 1.3 requires multiplication operands to be of the same type.
This was violated when using the default attention mask (i.e.,
attention_mask=None in the arguments) with BERT in decoder mode.

In particular, this was breaking Model2Model and made the quickstart
tutorial fail.
2020-02-11 15:19:22 -05:00
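A sketch of the dtype pitfall being fixed, assuming nothing beyond plain PyTorch:

```python
import torch

seq_len = 4
scores = torch.randn(1, seq_len, seq_len)  # attention scores

# Build the lower-triangular causal mask in the same floating dtype as
# the scores; a uint8/bool mask times a float tensor fails on PyTorch < 1.3.
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=scores.dtype))
masked_scores = scores * causal_mask
```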
bed38d3afe Fix typo in src/transformers/data/processors/squad.py 2020-02-11 11:22:24 -05:00
498d06e914 [model_cards] Add new German Europeana BERT models (#2805)
* [model_cards] New German Europeana BERT models from dbmdz

* [model_cards] Update German Europeana BERT models from dbmdz
2020-02-11 10:49:39 -05:00
3e3a9e2c01 Merge pull request #2793 from huggingface/tensorflow-210-circleci-fix
Fix circleci cuInit error on Tensorflow >= 2.1.0.
2020-02-11 10:48:42 +00:00
1f5db9a13c [model_cards] Rm extraneous tag 2020-02-10 17:45:13 -05:00
95bac8dabb [model_cards] Add language metadata to existing model cards
This will enable filtering on language (amongst other tags) on the website

cc @loretoparisi, @stefan-it, @HenrykBorzymowski, @marma
2020-02-10 17:42:42 -05:00
ba498eac38 Create README.md (#2785)
* Create README.md

* Update README.md

* Update README.md

* Update README.md

* [model_cards] Use code fences for consistency

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-02-10 17:27:59 -05:00
68ccc04ee6 Add model readme for deepset/roberta-base-squad2 (#2797)
* Add readme for deepset/roberta-base-squad2

* update model readme
2020-02-10 15:21:48 -05:00
539f601be7 intermediate_size > hidden_dim in distilbert config docstrings 2020-02-10 13:45:57 -05:00
cfb7d108bd FlauBERT lang embeddings only when n_langs > 1 2020-02-10 13:24:04 -05:00
b4691a438d [model_cards] BERT-of-Theseus: use the visual as thumbnail
cc @jetrunner

Co-Authored-By: Kevin Canwen Xu <canwenxu@outlook.com>
2020-02-10 11:27:08 -05:00
fc325e97cd [model_cards] Showcase model tag syntax 2020-02-10 11:27:08 -05:00
fd639e5be3 Correct quickstart example when using the past 2020-02-10 11:25:56 -05:00
63a5399bc4 [model_cards] Specify language meta + thumbnail
cc @tholor

see #2799
2020-02-10 11:20:05 -05:00
125a75a121 Correctly compute tokens when padding on the left 2020-02-10 10:47:42 -05:00
9c64d1da35 Add model readme for bert-base-german-cased (#2799)
* add readme for bert-base-german-cased

* update readme
2020-02-10 10:27:29 -05:00
bf99014c46 Create BERT-of-Theseus model card 2020-02-10 09:58:40 -05:00
92e974196f Merge pull request #2765 from huggingface/extract-cached-archives
Add option to `cached_path` to automatically extract archives
2020-02-10 14:05:16 +01:00
6aa7973aec Fix circleci cuInit error on Tensorflow >= 2.1.0.
Tensorflow 2.1.0 introduces a new dependency model where pip install tensorflow installs tf with GPU support.
Before, it would install with CPU support only; thus CircleCI looks for the NVidia driver version when initializing the
tensorflow-related tests but fails as there is no NVidia driver running.

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-02-10 13:24:37 +01:00
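The fix itself lives in the CI configuration; purely as an illustration, hiding CUDA devices before TensorFlow loads avoids the cuInit call on a CPU-only host.

```python
import os

# Hide all CUDA devices so TensorFlow never attempts cuInit on a CPU-only
# machine. This must run before `import tensorflow`.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf  # noqa: E402

print(tf.config.list_physical_devices("GPU"))  # -> []
```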
520e7f2119 Correct docstring for xlnet 2020-02-07 16:42:35 -05:00
dd28830327 Update RoBERTa tips 2020-02-07 16:42:35 -05:00
db97930122 Update XLM-R tips 2020-02-07 16:42:35 -05:00
7046de2991 E231 2020-02-07 15:28:13 -05:00
0d3aa3c04c styling 2020-02-07 15:28:13 -05:00
d8b43600fd omission 2020-02-07 15:28:13 -05:00
ee5a6856ca distilbert-base-cased weights + Readmes + omissions 2020-02-07 15:28:13 -05:00
73368963b2 Fix importing unofficial TF models with extra optimizer weights 2020-02-07 10:25:31 -05:00
d7dabfeff5 Fix documentation in ProjectedAdaptiveLogSoftmax 2020-02-07 10:14:58 -05:00
42f08e596f [examples] rename run_lm_finetuning to run_language_modeling 2020-02-07 09:15:28 -05:00
4f7bdb0958 [examples] Fix broken markdown 2020-02-07 09:15:28 -05:00
c6c5c3fd4e style and quality 2020-02-07 08:58:06 +01:00
961c69776f @julien-c proposal for TF/PT compat in hf_buckets 2020-02-07 08:53:17 +01:00
d311f87bca cleanup 2020-02-07 00:05:28 +01:00
7d99e05f76 file_cache has options to extract archives 2020-02-07 00:03:12 +01:00
2c12464a20 Changed vocabulary save function. Variable name was inconsistent, causing an error to be thrown when passing a file name instead of a directory. 2020-02-06 16:40:07 -05:00
6fc3d34abd Fix multi-gpu evaluation in run_glue.py 2020-02-06 16:38:55 -05:00
7748cbbe7d Oopsie 2020-02-06 15:30:02 -05:00
432c12521e [docs] Add menu w/ links to other pages on hf.co 2020-02-06 15:30:02 -05:00
c069932f5d Add contributors snapshot
powered by https://github.com/sourcerer-io/hall-of-fame
2020-02-06 15:25:47 -05:00
33d3072e1c Arxiv README (#2747)
* Arxiv README

* ArXiv-NLP readme
2020-02-05 15:26:28 -05:00
eae8ee0389 [doc] model sharing: mention README.md + tweaks
cc @lysandrejik @thomwolf
2020-02-05 14:20:03 -05:00
6bb6a01765 Fix GPT2 config set to trainable
The config being set to trainable prevented the model from being saved, and who knows
what else.
2020-02-05 13:55:41 -05:00
ada24def22 [run_lm_finetuning] Tweak fix for non-long tensor, close #2728
see 1ebfeb79469d544a2bd817aa32c77e0514485ff9 and #2728

Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2020-02-05 12:49:18 -05:00
2184f87003 RoBERTa TensorFlow Tests 2020-02-04 18:05:35 -05:00
e615269cb8 Correct slow test 2020-02-04 18:05:35 -05:00
5f96ebc0be Style 2020-02-04 18:05:35 -05:00
950c6a4f09 Flaubert PyTorch tests 2020-02-04 18:05:35 -05:00
d28b81dc29 RoBERTa Pytorch tests 2020-02-04 18:05:35 -05:00
d1ab1fab1b pass langs parameter to certain XLM models (#2734)
* pass langs parameter to certain XLM models

Adding an argument that specifies the language the SQuAD dataset is in so language-sensitive XLMs (e.g. `xlm-mlm-tlm-xnli15-1024`) don't default to language `0`.
Allows resolution of issue #1799.

* fixing from `make style`

* fixing style (again)
2020-02-04 17:12:42 -05:00
9e5b549b4d fix default getattr 2020-02-04 16:38:52 -05:00
25848a6094 double quotes 2020-02-04 16:38:52 -05:00
cbcb83f21d minor cleanup of test_attention_outputs 2020-02-04 16:38:52 -05:00
3bf5417258 Revert erroneous fix 2020-02-04 16:31:07 -05:00
1ebfeb7946 Cast to long when masking tokens 2020-02-04 15:56:16 -05:00
9c67196b83 Update quickstart 2020-02-04 11:11:37 -05:00
90ab15cb7a Remove redundant hidden states 2020-02-04 10:59:32 -05:00
9a50828b5c Pipelines: fix crash when modelcard is None
cc @mfuntowicz does this seem correct?
2020-02-03 17:53:39 -05:00
6c1b23554f Sample instead of greedy decoding by default in generate 2020-02-03 17:23:53 -05:00
239dd23f64 [Follow up 213]
Masked indices should have -1 and not -100. Updating documentation + scripts that were forgotten
2020-02-03 16:08:05 -05:00
522c5b5533 Added README.md to Swedish BERT models from National Library of Sweden 2020-02-03 09:09:34 -05:00
9329e59700 Add READMEs to Tensorflow versions of CamemBERT and XLM-RoBERTa 2020-02-03 09:04:34 -05:00
2ba147ecff Fix typo in examples/utils_ner.py
"%s-%d".format() -> "{}-{}".format()
2020-02-01 11:10:57 -05:00
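The bug is easy to demonstrate: `str.format()` ignores %-style placeholders, so the old code produced the literal template instead of the label.

```python
# Buggy: .format() leaves %-style placeholders untouched.
assert "%s-%d".format("dev", 1) == "%s-%d"

# Fixed: {}-style placeholders work with .format().
assert "{}-{}".format("dev", 1) == "dev-1"
```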
9773e5e0d9 CLI script to gather environment info (#2699)
* add "info" command to CLI

As a convenience, add the info directive to CLI. Running `python transformers-cli info` will return a string containing the transformers version, platform, python version, PT/TF version and GPU support

* Swap f-strings for .format

Still supporting 3.5 so can't use f-strings (sad face)

* Add reference in issue to CLI

* Add the expected fields to issue template

This way, people can still add the information manually if they want. (Though I fear they'll just ignore it.)

* Remove heading from output

* black-ify

* order of imports

Should ensure isort test passes

* use is_X_available over import..pass

* style

* fix copy-paste bug

* Rename command info -> env

Also adds the command to CONTRIBUTING.md in "Did you find a bug" section
2020-02-01 10:38:14 -05:00
ddb6f9476b [model_cards] dbmdz models
Co-Authored-By: Stefan Schweter <stefan-it@users.noreply.github.com>
2020-01-31 18:39:09 -05:00
6636826f04 [model_cards] Multilingual + Dutch SQuAD2.0
Co-Authored-By: HenrykBorzymowski <henrykborzymowski@users.noreply.github.com>
2020-01-31 18:39:09 -05:00
98dadc98e1 [model_cards] UmBERTo
Co-Authored-By: Loreto Parisi <loretoparisi@gmail.com>
Co-Authored-By: Simone Francia <francia.simone1@gmail.com>
2020-01-31 18:39:09 -05:00
d6fc34b459 [model_cards] add mine 2020-01-31 18:39:09 -05:00
d426b58b9e Patch: v2.4.1 2020-01-31 14:55:33 -05:00
1e82cd8457 Flaubert auto tokenizer + tests
cc @julien-c
2020-01-31 14:16:52 -05:00
d18d47be67 run_generation style 2020-01-31 12:05:48 -05:00
ff6f1492e8 FlauBERT load in AutoModel
The FlauBERT configuration class inherits from XLMConfig, so it is recognized as an XLM configuration when loading from AutoModels, because XLMConfig is checked before FlaubertConfig.

Changing the order solves this problem, but a test should be added.
2020-01-31 12:05:15 -05:00
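A minimal sketch of the ordering problem, with simplified stand-in classes rather than the real dispatch code:

```python
class XLMConfig: ...
class FlaubertConfig(XLMConfig): ...

def model_name_for(config) -> str:
    # The subclass must be tested before its parent: a FlaubertConfig
    # is-a XLMConfig, so the reverse order would always match XLM first.
    if isinstance(config, FlaubertConfig):
        return "FlaubertModel"
    if isinstance(config, XLMConfig):
        return "XLMModel"
    raise ValueError("unknown config")

assert model_name_for(FlaubertConfig()) == "FlaubertModel"
```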
7365f01d43 do_sample should be set to True in run_generation.py 2020-01-31 11:49:32 -05:00
3a21d6da6b Typo on markdown link in README.md 2020-01-31 10:58:49 -05:00
0aa40e9569 v2.4.0 documentation 2020-01-31 09:55:34 -05:00
8036ceb7c5 Update commands for pypi test 2020-01-31 09:48:15 -05:00
6664ea943d Release: v2.4.0 2020-01-31 09:40:32 -05:00
5a6b138b00 [Umberto] model shortcuts (#2661)
* [Umberto] model shortcuts

cc @loretoparisi @simonefrancia

see #2485

* Ensure that tokenizers will be correctly configured
2020-01-30 21:05:53 -05:00
7fe294bf07 Hotfix: same handling of non-existent files as for config 2020-01-30 20:05:04 -05:00
b85c59f997 config.architectures 2020-01-30 19:26:59 -05:00
f9bc3f5771 style tweak 2020-01-30 19:26:59 -05:00
0b13fb822a No need for a model_type here
cc @lysandrejik
2020-01-30 19:26:59 -05:00
71a382319f Correct documentation 2020-01-30 18:41:24 -05:00
01a14ebd8d Add FlauBERT to automodels 2020-01-30 18:40:22 -05:00
9fa836a73f fill_mask helper (#2576)
* fill_mask helper

* [poc] FillMaskPipeline

* Revert "[poc] FillMaskPipeline"

This reverts commit 67eeea55b0f97b46c2b828de0f4ee97d87338335.

* Revert "fill_mask helper"

This reverts commit cacc17b884e14bb6b07989110ffe884ad9e36eaa.

* README: clarify that Pipelines can also do text-classification

cf. question at the AI&ML meetup last week, @mfuntowicz

* Fix test: test feature-extraction pipeline

* Test tweaks

* Slight refactor of existing pipeline (in preparation of new FillMaskPipeline)

* Extraneous doc

* More robust way of doing this

@mfuntowicz as we don't rely on the model name anymore (see AutoConfig)

* Also add RobertaConfig as a quickfix for wrong token_type_ids

* cs

* [BIG] FillMaskPipeline
2020-01-30 18:15:42 -05:00
b43cb09aaa Add layerdrop 2020-01-30 12:05:01 -05:00
df27648bd9 Rename test_examples to test_doc_samples 2020-01-30 10:07:22 -05:00
93dccf527b Pretrained models 2020-01-30 10:04:18 -05:00
90787fed81 Style 2020-01-30 10:04:18 -05:00
73306d028b FlauBERT documentation 2020-01-30 10:04:18 -05:00
ce2f4227ab Fix failing FlauBERT test 2020-01-30 10:04:18 -05:00
f0a4fc6cd6 Add Flaubert 2020-01-30 10:04:18 -05:00
a5381495e6 Added classifier dropout rate in ALBERT 2020-01-30 09:52:34 -05:00
83446a88d9 Use _pad_token instead of pad_token_id
Requesting pad_token_id would cause an error message when it is None. Use private _pad_token instead.
2020-01-29 17:44:58 -05:00
9fde13a3ac Add check to verify existence of pad_token_id
In batch_encode_plus we have to ensure that the tokenizer has a pad_token_id so that, when padding, no None values are added as padding. That would happen with gpt2, openai, transfoxl.

closes https://github.com/huggingface/transformers/issues/2640
2020-01-29 17:44:58 -05:00
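A sketch of the guard these two commits describe (hypothetical helper name; the real check lives in batch_encode_plus):

```python
def ensure_pad_token(tokenizer) -> None:
    # GPT-2, OpenAI GPT and Transfo-XL ship without a pad token, so padding
    # would otherwise insert None ids silently. The private _pad_token is
    # checked because pad_token_id itself logs an error when it is None.
    if tokenizer._pad_token is None:
        raise ValueError(
            "Tokenizer has no padding token; add one, e.g. "
            "tokenizer.add_special_tokens({'pad_token': '[PAD]'})."
        )
```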
e63a81dd25 Style 2020-01-29 16:29:20 -05:00
217349016a Copy object instead of passing the reference 2020-01-29 16:15:39 -05:00
adb8c93134 Remove lines causing a KeyError 2020-01-29 14:01:16 -05:00
c69b082601 Update documentation 2020-01-29 12:06:13 -05:00
ca1d66734d Apply quality and style requirements once again 2020-01-29 12:06:13 -05:00
5e3c72842d bugfix on model name 2020-01-29 12:06:13 -05:00
0731fa1587 Apply quality and style requirements 2020-01-29 12:06:13 -05:00
a3998e76ae Add TF2 CamemBERT model 2020-01-29 12:06:13 -05:00
b5625f131d Style 2020-01-29 11:47:49 -05:00
44a5b4bbe7 Update documentation 2020-01-29 11:47:49 -05:00
7fc628d98e Apply style 2020-01-29 11:47:49 -05:00
64ca855617 Add TF2 XLM-RoBERTa model 2020-01-29 11:47:49 -05:00
9d87eafd11 Streamlining
- mostly stylistic streamlining
- removed 'additional context' sections. They seem to be rarely used and might cause confusion. If more details are needed, users can add them to the 'details' section
2020-01-28 10:41:10 -05:00
a3b3638f6f phrasing 2020-01-28 10:41:10 -05:00
c96ca70f25 Update ---new-benchmark.md 2020-01-28 10:41:10 -05:00
7b5eda32bb Update --new-model-addition.md
Motivate users to @-tag authors of models to increase visibility and expand the community
2020-01-28 10:41:10 -05:00
c63d91dd1c Update bug-report.md
- change references to pytorch-transformers to transformers
- link to code formatting guidelines
2020-01-28 10:41:10 -05:00
b2907cd06e Update feature-request.md
- add 'your contribution' section
- add code formatting link to 'additional context'
2020-01-28 10:41:10 -05:00
2fec88ee02 Update question-help.md
Prefer that general questions are asked on Stack Overflow
2020-01-28 10:41:10 -05:00
7e03d2bd7c update migration guide
Streamlines usages of pytorch-transformers and pytorch-pretrained-bert. Add link to the README for the migration guide.
2020-01-28 10:41:10 -05:00
335dd5e68a Default save steps 50 to 500 in all scripts 2020-01-28 09:42:11 -05:00
ea2600bd5f Absolute definitive HeisenDistilBug solve
cc @julien-c @thomwolf
2020-01-27 21:58:36 -05:00
5c3d441ee1 Fix formatting 2020-01-27 21:00:34 -05:00
f5a236c3ca Add Dutch pre-trained BERT model 2020-01-27 21:00:34 -05:00
6b4c3ee234 [run_lm_finetuning] GPT2 tokenizer doesn't have a pad_token
ping @lysandrejik
2020-01-27 20:14:02 -05:00
79815bf666 [serving] Fix typo 2020-01-27 19:58:25 -05:00
5004d5af42 [serving] Update dependencies 2020-01-27 19:58:00 -05:00
9ca21c838b Style 2020-01-27 14:49:12 -05:00
e0849a66ac adding in the doc 2020-01-27 14:27:07 -05:00
6b081f04e6 style and quality 2020-01-27 14:27:07 -05:00
0e31e06a75 Add AutoModelForPreTraining 2020-01-27 14:27:07 -05:00
ea56d305be make style 2020-01-27 12:13:32 -05:00
d440e21f5b add mapping of roberta for QA 2020-01-27 12:12:46 -05:00
875c4ae48f Definitive HeisenDistilBug fix
cc @julien-c @thomwolf
2020-01-27 12:09:58 -05:00
f09f42d4d3 Input Embeddings should be assigned
cc @julien-c
2020-01-27 11:46:00 -05:00
bac51fba3a Fix token_type_ids for XLM-R 2020-01-27 11:08:31 -05:00
babd41e7fa Code quality 2020-01-24 17:06:55 -05:00
974d083c7b Accurate model for configuration 2020-01-24 16:46:03 -05:00
983fef469c AutoModels doc 2020-01-24 16:37:30 -05:00
009fcb0ec1 Configuration utils 2020-01-24 16:37:30 -05:00
11b13e94a3 Add type to help my IDE out 2020-01-24 14:00:57 -05:00
1ce3fb5cc7 update correct eval metrics (distilbert & co) 2020-01-24 11:45:22 -05:00
62f5804608 Update the doc string for T5WithLMHeadModel
T5WithLMHeadModel's doc string claims that indices of -1 are
ignored while computing the cross-entropy loss in the forward
pass; however, indices of -1 throw an error while indices of -100
are ignored. This commit updates the doc string to be consistent
with the class's behavior.
2020-01-24 10:28:20 -05:00
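For reference, a toy check of the behavior the updated docstring describes (PyTorch's cross-entropy defaults to ignore_index=-100, which is why -1 raises instead of being skipped):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)             # (batch, vocab)
labels = torch.tensor([1, 3, -100, 5])  # the -100 entry is ignored
loss = F.cross_entropy(logits, labels)  # ignore_index defaults to -100
```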
908230d261 Pickle CamemBERT tokenizer 2020-01-24 10:08:59 -05:00
24d5ad1dcc Run the examples in slow 2020-01-23 09:38:45 -05:00
9ddf60b694 Tips + whitespaces 2020-01-23 09:38:45 -05:00
0e9899f451 Fixes 2020-01-23 09:38:45 -05:00
48ac24020d TF CTRL 2020-01-23 09:38:45 -05:00
7511f3dd89 PyTorch CTRL + Style 2020-01-23 09:38:45 -05:00
980211a63a XLM-RoBERTa 2020-01-23 09:38:45 -05:00
6bc966793a TF DistilBERT 2020-01-23 09:38:45 -05:00
db1a7f27a1 PyTorch DistilBERT 2020-01-23 09:38:45 -05:00
b28020f590 TF RoBERTa 2020-01-23 09:38:45 -05:00
3e1bc27e1b Pytorch RoBERTa 2020-01-23 09:38:45 -05:00
f44ff574d3 Camembert 2020-01-23 09:38:45 -05:00
264eb23912 TF XLM 2020-01-23 09:38:45 -05:00
ccebcae75f PyTorch XLM 2020-01-23 09:38:45 -05:00
92b3cb786d TF XLNet 2020-01-23 09:38:45 -05:00
cd656fb21a PyTorch XLNet 2020-01-23 09:38:45 -05:00
83fa8d9fb5 TF Transformer-XL 2020-01-23 09:38:45 -05:00
98edad418e PyTorch Transformer-XL 2020-01-23 09:38:45 -05:00
96d21ad06b TF OpenAI GPT 2020-01-23 09:38:45 -05:00
850795c487 Pytorch GPT 2020-01-23 09:38:45 -05:00
1487b840d3 TF GPT2 2020-01-23 09:38:45 -05:00
bd0d3fd76e GPT-2 PyTorch models + better tips for BERT 2020-01-23 09:38:45 -05:00
dbeb7fb4e6 BERT TensorFlow 2020-01-23 09:38:45 -05:00
cd77c750c5 BERT PyTorch models 2020-01-23 09:38:45 -05:00
3922a2497e TF ALBERT + TF Utilities + Fix warnings 2020-01-23 09:38:45 -05:00
00df3d4de0 ALBERT Modeling + required changes to utilities 2020-01-23 09:38:45 -05:00
f81b6c95f2 Flake8 violation 2020-01-23 09:38:45 -05:00
632675ea88 Can test examples spread over multiple blocks 2020-01-23 09:38:45 -05:00
eaa6b9afc6 Require Torch when testing examples 2020-01-23 09:38:45 -05:00
9bab9b83d2 Glossary 2020-01-23 09:38:45 -05:00
64abd3e0aa Multi-line examples can be tested + ALBERT patch for CircleCI
All tests should now work fine.
2020-01-23 09:38:45 -05:00
837577256b Automatic testing of examples
The CircleCI test should fail.
2020-01-23 09:38:45 -05:00
90b7df444f Upload CLI: on win32, use slashes, not os.sep 2020-01-22 22:41:21 -05:00
119dc50e2a Doc tweak on model sharing 2020-01-22 22:40:38 -05:00
34a3c25a30 Fix for XLMRobertaConfig inherits from RobertaConfig
hat/tip @stefan-it
2020-01-22 17:50:24 -05:00
1a8e87be4e Line-by-line text dataset (including padding) 2020-01-21 16:57:38 -05:00
b94cf7faac change order 2020-01-21 16:57:38 -05:00
2eaa8b6e56 Easier to not support this, as it could be confusing
cc @lysandrejik
2020-01-21 16:57:38 -05:00
801aaa5508 make style 2020-01-21 16:57:38 -05:00
56d4ba8ddb [run_lm_finetuning] Train from scratch 2020-01-21 16:57:38 -05:00
c7f79815e7 Cleanup unused variables 2020-01-21 11:40:24 -05:00
15579e2d55 [SQuAD v2] Code quality 2020-01-21 11:36:46 -05:00
088fa7b759 Correct segment ID for XLNet single sequence 2020-01-21 11:33:45 -05:00
073219b43f Manage impossible examples SQuAD v2 2020-01-21 11:24:43 -05:00
983c484fa2 add __getstate__ and __setstate__ to XLMRobertaTokenizer 2020-01-21 10:18:24 -05:00
cefd51c50c Fix glue processor failing on tf datasets 2020-01-20 11:46:43 -05:00
ca6ce3040d Fix style 2020-01-20 10:56:23 -05:00
908cd5ea27 Make forward asynchronous to avoid long computations timing out.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-20 10:56:23 -05:00
6e6c8c52ed Fix bad handling of env variable USE_TF / USE_TORCH leading to invalid framework being used.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-20 10:56:23 -05:00
23c6998bf4 Add lower bound to tqdm for tqdm.auto
- It appears that `tqdm` only introduced `tqdm.auto` in 4.27.
- See https://github.com/tqdm/tqdm/releases/tag/v4.27.0.
- Without the lower bound I received the following stack trace in an environment where I already had tqdm installed:
```
  File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/__init__.py", line 20, in <module>
    from .file_utils import (TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE,
  File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/file_utils.py", line 24, in <module>
    from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
2020-01-17 18:29:11 -05:00
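The commit adds the lower bound (tqdm >= 4.27); purely as an illustration, a guarded import would be another way to tolerate older tqdm:

```python
try:
    from tqdm.auto import tqdm  # tqdm.auto appeared in tqdm 4.27
except ImportError:
    from tqdm import tqdm  # older tqdm: fall back to the plain progress bar
```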
65a89a8976 Fix BasicTokenizer to respect never_split parameters (#2557)
* add failing test

* fix call to _run_split_on_punc

* format with black
2020-01-17 14:57:56 -05:00
6d5049a24d Fix typo in examples/run_squad.py
Rul -> Run
2020-01-17 11:22:51 -05:00
23a2cea8cb Tokenizer.from_pretrained: fetch all possible files remotely 2020-01-16 16:47:19 -05:00
99f9243de5 same here, try to not serialize too much if unneeded 2020-01-16 16:47:19 -05:00
9d8fd2d40e tokenizer.save_pretrained: only save file if non-empty 2020-01-16 16:47:19 -05:00
6e2c28a14a Run SQuAD warning when the doc stride may be too high 2020-01-16 13:59:26 -05:00
b8f43cb273 Merge pull request #2239 from ns-moosavi/HANS-evaluation-example
HANS evaluation
2020-01-16 13:28:25 +01:00
258ed2eaa8 adding details in readme 2020-01-16 13:21:30 +01:00
50ee59578d update formating - make flake8 happy 2020-01-16 13:21:30 +01:00
1c9333584a formating 2020-01-16 13:21:30 +01:00
e25b6fe354 updating readme 2020-01-16 13:21:30 +01:00
27c7b99015 adding details in readme - moving file 2020-01-16 13:21:30 +01:00
99d4515572 HANS evaluation 2020-01-16 13:21:30 +01:00
dc17f2a111 Merge pull request #2538 from huggingface/py3_super
💄 super
2020-01-16 13:17:15 +01:00
880854846b Merge pull request #2540 from huggingface/torch14_fix
[PyTorch 1.4] Fix failing torchscript test for xlnet
2020-01-16 13:16:59 +01:00
d9fa1bad72 Fix failing torchscript test for xlnet
model.parameters() order is apparently not stable (only for xlnet, for some reason)
2020-01-15 20:22:21 -05:00
a98b2ca8c0 Style + fixup BertJapaneseTokenizer 2020-01-15 19:05:51 -05:00
83a41d39b3 💄 super 2020-01-15 18:33:50 -05:00
cd51893d37 Merge branch 'Rexhaif-patch-1' 2020-01-15 18:25:15 -05:00
248aeaa842 Merge branch 'patch-1' of https://github.com/Rexhaif/transformers into Rexhaif-patch-1 2020-01-15 18:22:01 -05:00
c76c3cebed Add check for token_type_ids before tensorizing
Fix an issue where `prepare_for_model()` gives a `KeyError` when
`return_token_type_ids` is set to `False` and `return_tensors` is
enabled.
2020-01-15 12:31:43 -05:00
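A sketch of the guard (simplified; the real code lives in prepare_for_model):

```python
import torch

encoded = {"input_ids": [101, 2023, 2003, 102]}  # token_type_ids not requested

# Only tensorize keys that are actually present, instead of indexing
# encoded["token_type_ids"] unconditionally, which raised the KeyError.
for key in ("input_ids", "token_type_ids", "attention_mask"):
    if key in encoded:
        encoded[key] = torch.tensor([encoded[key]])
```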
eb59e9f705 Graduate sst-2 to a canonical one 2020-01-15 16:28:50 +00:00
e184ad13cf Close #2392 2020-01-15 15:43:44 +00:00
dfe012ad9d Fix misleading RoBERTa token type ids 2020-01-14 17:47:28 -05:00
c024ab98df Improve padding side documentation 2020-01-14 17:44:23 -05:00
9aeb0b9b8a Improve padding side documentation 2020-01-14 17:43:00 -05:00
715fa638a7 Merge branch 'master' into from_scratch_training 2020-01-14 18:58:21 +00:00
100e3b6f21 Bias should be resized with the weights
Created a link between the linear layer bias and the model attribute bias. This does not change anything for the user nor for the conversion scripts, but allows the `resize_token_embeddings` method to resize the bias as well as the weights of the decoder.

Added a test.
2020-01-14 13:43:45 -05:00
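A minimal sketch of the linking trick (illustrative class, not the actual model code):

```python
import torch
from torch import nn

class LMHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        # Link the standalone bias to the linear layer, so resizing the
        # decoder's output embedding also resizes the bias.
        self.decoder.bias = self.bias
```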
6c32d8bb95 Size > Dimensionality + Remove final TODOs 2020-01-14 14:09:09 +01:00
760164d63b RoBERTa example 2020-01-14 14:09:09 +01:00
387217bd3e Added example usage 2020-01-14 14:09:09 +01:00
7d1bb7f256 Add missing XLNet and XLM models 2020-01-14 14:09:09 +01:00
a1cb100460 Wrap up configurations 2020-01-14 14:09:09 +01:00
c11b6fd393 Update links in all configurations 2020-01-14 14:09:09 +01:00
632682726f Updated Configurations 2020-01-14 14:09:09 +01:00
2b566c182e Merge pull request #2384 from dimagalat/master
Releasing file lock
2020-01-14 13:19:01 +01:00
764f836d52 Update test_tokenization_auto.py 2020-01-13 22:50:34 -05:00
d5831acb07 Update test_tokenization_auto.py 2020-01-13 22:47:33 -05:00
ed6cd597cc Update test_tokenization_auto.py 2020-01-13 22:46:35 -05:00
5cb463a714 Update test_tokenization_auto.py 2020-01-13 22:38:29 -05:00
afc24ea5d4 In a parallel setup this could fail 2020-01-13 23:44:08 +00:00
894812c652 Fixup mapping 2020-01-13 23:34:19 +00:00
b20f11d4ca 🔫 Python35 2020-01-13 23:20:44 +00:00
0304628590 Map configs to models and tokenizers 2020-01-13 23:11:44 +00:00
1fc855e456 [tests] Safety checks on CONFIG_MAPPING 2020-01-13 21:52:55 +00:00
3c86b6f3c5 Py35 doesn't like inline variable types 2020-01-13 20:44:33 +00:00
b803b067bf Config to Model mapping 2020-01-13 20:05:20 +00:00
896a0eb1fd Merge pull request #2459 from Perseus14/patch-4
Update pipelines.py
2020-01-13 16:02:54 +01:00
0d6c17fc1b black formatting 2020-01-13 11:18:27 +01:00
a3085020ed Added repetition penalty to PPLM example (#2436)
* Added repetition penalty

* Default PPLM repetition_penalty to neutral

* Minor modifications to comply with reviewer's suggestions. (j -> token_idx)

* Formatted code with `make style`
2020-01-10 23:00:07 -05:00
cf8a70bf68 More AutoConfig tests 2020-01-11 03:43:57 +00:00
6bb3edc300 Serialize model_type if exists 2020-01-11 03:18:56 +00:00
c6f682c1eb flake 2020-01-11 03:18:31 +00:00
4d1c98c012 AutoConfig + other Auto classes honor model_type 2020-01-11 02:46:17 +00:00
2f32dfd33b Convention: name mixins mixins 2020-01-11 01:24:29 +00:00
e83d9f1c1d cleaning - change ' to " (black requirements) 2020-01-10 19:34:25 -05:00
ebba9e929d minor spring cleaning - missing configs + processing 2020-01-10 19:14:58 -05:00
055e80cfad rm old ConfigTester 2020-01-10 21:36:18 +00:00
b1e1a9f9b2 Merge pull request #2495 from mschrimpf/patch-1
T5: move rp_bucket to relative_attention_bias' device
2020-01-10 22:18:54 +01:00
fd8423321f keep list sorted 2020-01-10 20:36:46 +00:00
0cd81fb99f [isort] declare more third-parties in case no tf install 2020-01-10 20:35:45 +00:00
90d3b787f6 move rp_bucket to relative_attention_bias' device
otherwise, `rp_bucket` will always be on cpu and fail if `self.relative_attention_bias` is on cuda
2020-01-10 15:09:10 -05:00
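The fix, sketched as a stand-alone helper rather than the in-class code:

```python
import torch
from torch import nn

def relative_bias(rp_bucket: torch.Tensor, bias_embedding: nn.Embedding) -> torch.Tensor:
    # rp_bucket is computed on CPU; move it to the embedding's device first,
    # otherwise the lookup fails whenever the model lives on CUDA.
    rp_bucket = rp_bucket.to(bias_embedding.weight.device)
    return bias_embedding(rp_bucket)
```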
84c0aa1868 num_parameters helper 2020-01-10 17:40:02 +00:00
331065e62d missing import 2020-01-10 11:42:53 +01:00
414e9e7122 indents test 2020-01-10 11:42:53 +01:00
3cdb38a7c0 indents 2020-01-10 11:42:53 +01:00
ebd45980a0 Align with run_squad + fix some errors 2020-01-10 11:42:53 +01:00
45634f87f8 fix Sampler in distributed training - evaluation 2020-01-10 11:42:53 +01:00
af1ee9e648 Move torch.nn.utils.clip_grad_norm_ 2020-01-10 11:42:53 +01:00
164c794eb3 New SQuAD API for distillation script 2020-01-10 11:42:53 +01:00
801f2ac8c7 Add PRETRAINED_INIT_CONFIGURATION to DistilBERT tokenizer 2020-01-10 11:42:21 +01:00
bfec203d4e modified: src/transformers/tokenization_utils.py 2020-01-09 12:54:28 +01:00
f599623a99 PreTrainedTokenizerFast: hotfix _convert_encoding
cc @n1t0
2020-01-08 15:46:37 -05:00
f26a353057 Update pipelines.py
Modified the QA pipeline to consider all features for each example before generating top-k answers.
The current pipeline takes only one SquadExample, one SquadFeature, one start logit list and one end logit list to retrieve the answer. This is not correct, as one SquadExample can produce multiple SquadFeatures.
2020-01-08 21:12:34 +05:30
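A sketch of the aggregation idea, with simplified span scoring and hypothetical names (start_logits and end_logits hold one list per feature):

```python
def best_answers(start_logits, end_logits, topk=3, max_len=30):
    # Score spans across *all* features of one example, not just the first;
    # a long context is split into several overlapping features.
    candidates = []
    for feat_idx, (starts, ends) in enumerate(zip(start_logits, end_logits)):
        for s, s_logit in enumerate(starts):
            for e in range(s, min(s + max_len, len(ends))):
                candidates.append((s_logit + ends[e], feat_idx, s, e))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:topk]
```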
16ce15ed4b DistilBERT token type ids removed from inputs in run_squad 2020-01-08 13:18:30 +01:00
f24232cd1b Fix error with global step in run_squad.py 2020-01-08 11:39:00 +01:00
1b59b57b57 ignore_index equal -100 in T5 model 2020-01-08 09:52:10 +01:00
569da80ced Make doc regarding masked indices more clear.
Signed-off-by: Romain Keramitas <r.keramitas@gmail.com>
2020-01-07 17:37:27 +01:00
43114b89ba spelling correction (#2434) 2020-01-07 17:25:25 +01:00
d6a677b14b Fix typograpical errors (#2438) 2020-01-07 17:21:23 +01:00
27c1b656cc Fix error with global step in run_lm_finetuning.py 2020-01-07 16:16:12 +01:00
24df44d9c7 Black version python 3.5 2020-01-07 15:53:42 +01:00
73be60c47b Quotes 2020-01-07 15:34:23 +01:00
6806f8204e fix #2410 2020-01-07 15:20:45 +01:00
176d3b3079 Add support for Albert and XLMRoberta for the Glue example (#2403)
* Add support for Albert and XLMRoberta for the Glue example
2020-01-07 14:55:55 +01:00
9261c7f771 Remove f-string device creation on PyTorch GPU pipelines.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-07 11:46:44 +01:00
91d33c798b Fix issue on pipelines where pytorch's tensors are not copied on the user-specified GPU device.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-07 11:12:31 +01:00
2926852f14 fixed formatting 2020-01-07 11:56:03 +11:00
e2810edc8f removing redundant .flush 2020-01-07 11:47:25 +11:00
c301faa92b Distributed or parallel setup 2020-01-06 18:41:08 -05:00
81d6841b4b GPU text generation: Moved the encoded_prompt to the correct device 2020-01-06 15:11:12 +01:00
dd4df80f0b Moved the encoded_prompts to correct device 2020-01-06 15:11:12 +01:00
1efc208ff3 Complete DataProcessor class 2020-01-06 15:02:25 +01:00
c45d0cf60f Improve logging message in the single sentence classification processor 2020-01-06 14:54:36 +01:00
bf89be77b9 Improve logging message in the single sentence classification processor 2020-01-06 14:54:36 +01:00
bf8d4bc674 Improve logging message in glue feature conversion 2020-01-06 14:54:36 +01:00
74755c89b9 Example snippet for BertForQuestionAnswering 2020-01-06 14:41:53 +01:00
0ffc8eaf53 Enforce target version for black.
This should stabilize formatting.
2020-01-05 12:52:14 -05:00
f01b3e6680 fix #2399 an ImportError in official example (#2400)
* fix #2399 an ImportError in official example

* style

Co-authored-by: Julien Chaumond <chaumond@gmail.com>
2020-01-05 12:50:20 -05:00
78528742f1 Fix syntax + link to community page 2020-01-05 12:43:39 -05:00
12e0aa4368 Proposition to include community models in readme 2020-01-05 12:37:11 -05:00
80faf22b4a Updating documentation for converting a TensorFlow model to reflect the new CLI convert format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
2020-01-04 13:41:18 +01:00
d0e594f9db Releasing file lock 2020-01-02 09:45:48 +11:00
629b22adcf [run_lm_finetuning] mask_tokens: document types 2020-01-01 12:55:10 -05:00
594ca6dead [debug] Debug Heisenbug, the old school way. 2019-12-29 10:07:21 -05:00
0df4e62da0 [http] Tweak http user-agent (#2353) 2019-12-29 10:06:50 -05:00
f75bf05ce6 Merge pull request #2352 from huggingface/cli_tweaks
Cli tweaks
2019-12-28 15:40:00 +01:00
0d467fd6de Typo 2019-12-27 23:06:48 -05:00
d8293e84f3 [cli] upload: max number of files at the same time 2019-12-27 23:02:53 -05:00
4d6c93e923 Kill __main__ 2019-12-27 22:55:22 -05:00
9b2badf3c9 [cli] Update doc 2019-12-27 22:54:29 -05:00
f78ebc22ad [cli] Add ability to delete remote object 2019-12-27 22:53:49 -05:00
bfe870be65 Hotfix tokenizers version for sdist installs 2019-12-27 11:05:52 -05:00
74ea432847 Merge pull request #2286 from adelevie/patch-2
Typo in tokenization_utils.py
2019-12-27 10:50:47 +01:00
492bea9aa0 Merge pull request #2292 from patrickvonplaten/add_cached_past_for_language_generation
Add cached past for language generation
2019-12-27 10:33:27 +01:00
e213900fa2 Merge pull request #2290 from patrickvonplaten/fix_typo_in_doc_for_language_generation
duplicated line for repeating_words_penalty_for_language_generation
2019-12-27 10:29:06 +01:00
9f5f646442 Merge pull request #2211 from huggingface/fast-tokenizers
Fast tokenizers
2019-12-27 10:24:29 +01:00
9024b19994 Auto-format (fixes previous commit). 2019-12-27 10:13:52 +01:00
3233b58ad4 Quote square brackets in shell commands.
This ensures compatibility with zsh.

Fix #2316.
2019-12-27 08:50:25 +01:00
e6ec24fa88 Better added_tokens handling 2019-12-26 16:49:48 -05:00
599db139f9 Code style update 2019-12-26 15:13:30 -05:00
835b76a46f Handle unk_token
As we discussed, this is handled here directly 
cc @thomwolf
2019-12-26 14:42:55 -05:00
7ead04ce14 FastPreTrainedTokenizer => PreTrainedTokenizerFast 2019-12-26 14:39:39 -05:00
1f82a5d910 Update for changes in tokenizers API 2019-12-26 14:37:55 -05:00
8c67b529f6 Merge pull request #2324 from kashif/patch-1
Typo in serving.py
2019-12-26 12:38:06 +01:00
7211541ade Typo in serving.py 2019-12-26 12:21:40 +01:00
0f6017bee3 improve comments for examples 2019-12-26 00:35:11 +01:00
87c8fca9bc add example for ctrl text generation in docs 2019-12-26 00:29:19 +01:00
88def24c45 merge conflicts - renamed to previous_token singular 2019-12-26 00:27:16 +01:00
822f725a07 duplicated line for repeating_words_penalty_for_language_generation 2019-12-26 00:25:29 +01:00
fc84bd5254 adapt style to predefined style layout 2019-12-25 23:32:44 +01:00
deff792bb6 add prepare inputs for transfo_xl and xlnet 2019-12-25 23:17:24 +01:00
9398058e19 add easy tensor shape match test 2019-12-25 23:17:24 +01:00
90cda45e9e add past re-ordering for beam search 2019-12-25 23:17:24 +01:00
6bca56fdb0 check for self.config.mem_len instead of self.mem_len in _do_output_past 2019-12-25 23:17:24 +01:00
365ccd0af2 make if statements cleaner for prepare_inputs_for_generation 2019-12-25 23:17:24 +01:00
d039c679d2 better naming for if statement 2019-12-25 23:17:24 +01:00
7e0c5c731a changed do_output_past function to check for self.config.output_past instead of self.output_past 2019-12-25 23:17:24 +01:00
eeaa402cd4 rename comments 2019-12-25 23:17:24 +01:00
7bb4271291 remove ipdb debugging statements 2019-12-25 23:17:24 +01:00
267587c258 add and improve comments 2019-12-25 23:17:24 +01:00
d891fd0ae0 add past hidden key states for more efficient language generation & add prepare_inputs for gpt2 and ctrl model 2019-12-25 23:17:24 +01:00
aeef4823ab Merge pull request #2303 from patrickvonplaten/fix_error_with_repetition_penalty
fix repetition penalty error in modeling_utils.py
2019-12-25 22:39:20 +01:00
0412f3d929 Merge pull request #2291 from aaugustin/fix-flake8-F841
Fix F841 flake8 warning
2019-12-25 22:37:42 +01:00
8742c95461 Merge pull request #2289 from patrickvonplaten/fix_effective_batch_size_lang_gen_xlm
fix bug in prepare inputs for language generation for xlm for effective batch_size > 1
2019-12-25 22:30:46 +01:00
1240be3ed9 Merge pull request #2312 from vitaliyradchenko/fix_special_and_add_tokens_loading
Correct tokenization for special and added tokens
2019-12-25 20:52:30 +01:00
b262577d17 add special tokens to unique_added_tokens_encoder 2019-12-25 18:31:35 +02:00
83a2347952 fixed lack of added and special tokens 2019-12-25 18:03:19 +02:00
cea04a2443 Merge pull request #2310 from ShnitzelKiller/scatter-unfix
revert erroneous fix #2276
2019-12-25 12:43:22 +01:00
e1844d9a45 use positional arguments due to inconsistent API 2019-12-25 01:34:02 -08:00
9fb7addd4d revert erroneous fix 2019-12-24 22:26:09 -08:00
734d29b03d tokenizers is now a real dependency 2019-12-24 13:32:41 -05:00
2818e50569 Add tests for fast tokenizers 2019-12-24 13:29:01 -05:00
31c56f2e0b Fix style 2019-12-24 12:43:27 -05:00
951ae99bea BertTokenizerFast 2019-12-24 12:24:24 -05:00
041eac2d6d GPT2TokenizerFast 2019-12-24 12:24:14 -05:00
3471ff0d35 FastPreTrainedTokenizer 2019-12-24 12:23:30 -05:00
18e5bdbec5 fix repetition penalty error in modeling_utils.py 2019-12-24 17:18:05 +01:00
f18ac4c28e fix sequence length for prepare_inputs for xlnet 2019-12-24 16:43:24 +01:00
359dc43837 fix effective batch_size error in prepare_inputs also for xlnet 2019-12-24 16:33:20 +01:00
d98a384cb0 fix bug in prepare inputs for language generation for xlm for effective batch_size > 1 2019-12-24 16:29:54 +01:00
3e0cf49514 adding back last dropout in TF 2.0 T5 2019-12-24 11:30:56 +01:00
35d32308de adding back final dropout in T5 2019-12-24 11:29:49 +01:00
81db12c3ba Merge pull request #2271 from aaugustin/improve-setup-and-requirements
Improve setup and requirements
2019-12-24 11:21:20 +01:00
10724a8123 Run the slow tests every Monday morning. 2019-12-24 09:09:43 +01:00
a8d34e534e Remove [--editable] in install instructions.
Use -e only in docs targeted at contributors.

If a user copy-pastes a command line with [--editable], they will hit
an error. If they don't know the --editable option, we're giving them
a choice to make before they can move forward, but this isn't a choice
they need to make right now.
2019-12-24 08:46:08 +01:00
e74c73a85d Enable F841 warning in flake8. 2019-12-23 22:38:23 +01:00
e6c0019c80 Remove unused variables in tests. 2019-12-23 22:38:18 +01:00
495580dad1 Remove unused variables in templates. 2019-12-23 22:38:18 +01:00
71f94a8a1c Remove unused variables in src. 2019-12-23 22:38:09 +01:00
81422c4e6d Remove unused variables in examples. 2019-12-23 22:29:02 +01:00
072750f4dc Merge pull request #2288 from aaugustin/better-handle-optional-imports
Improve handling of optional imports
2019-12-23 22:28:47 +01:00
4621ad6f9d Use the same pattern as everywhere else.
This is really just for consistency.
2019-12-23 21:30:04 +01:00
a31d4a2971 Reraise ImportError when sentencepiece isn't installed.
Otherwise, the next line fails with a confusing exception because the spm
variable isn't defined.
2019-12-23 21:27:42 +01:00
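The pattern in question, sketched:

```python
import logging

logger = logging.getLogger(__name__)

try:
    import sentencepiece as spm
except ImportError:
    logger.warning("You need to install SentencePiece to use this tokenizer.")
    raise  # re-raise; otherwise the next use of `spm` dies with a NameError
```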
c8b0c1e551 Improve exception type.
ImportError isn't really appropriate when there's no import involved.
2019-12-23 21:27:38 +01:00
4c09a96096 Simplify re-raising exceptions.
Most modules use the simpler `raise` version. Normalize those that don't.
2019-12-23 21:20:54 +01:00
5565dcdd35 Remove warning when scikit-learn isn't available.
Most users don't need it.
2019-12-23 21:16:26 +01:00
8a6881822a Run some tests on Python 3.7.
This will improve version coverage.
2019-12-23 21:06:23 +01:00
7a865821d9 Remove stray egg-info directory automatically.
If a user or contributor ran `pip install -e .` on transformers < 3.0,
pip created a transformers.egg-info directory next to the transformers
directory at the root of the repository.

In transformers 3.0, the source is in a `src` subdirectory.
`pip install -e .` creates a transformers.egg-info directory there.
However, pip will still pick transformers.egg-info from the previous
location. This is a bug: https://github.com/pypa/pip/issues/5466

Users and contributors are likely to hit this problem because the
documentation for transformers 3.0 relies heavily on extra_requires
which didn't exist in earlier versions, so aren't defined in a stale
transformers.egg-info directory.

If such a directory exists, remove it. It's autogenerated, gitignored
and not supposed to contain anything of value.
2019-12-23 21:06:23 +01:00
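A sketch of such a guard at the top of setup.py (illustrative, not the exact code):

```python
import shutil
from pathlib import Path

# Remove a stale transformers.egg-info left over from the pre-`src` layout;
# pip would otherwise pick up its outdated metadata (pypa/pip#5466).
stale_egg_info = Path(__file__).parent / "transformers.egg-info"
if stale_egg_info.exists():
    shutil.rmtree(stale_egg_info)
```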
70373a5f7c Update contribution instructions.
Also provide shortcuts in a Makefile.
2019-12-23 21:05:30 +01:00
c3783399db Remove redundant requirements with transformers. 2019-12-23 19:17:27 +01:00
d79e9c9a9a Remove docs/requirements.txt.
It's superseded by the "docs" extras.
2019-12-23 19:17:07 +01:00
d73eb552e8 Remove requirements.txt.
It's redundant with setup.py and, also, incomplete (e.g. numpy).
2019-12-23 19:15:08 +01:00
9fcc532df6 Remove requirements-dev.txt.
It was generated once, likely in a non-reproducible way (pip freeze
in a contributor's local environment), and never updated.
2019-12-23 19:14:36 +01:00
76a1417f2a Include all optional dependencies in extras.
Take advantage of this to simplify the Circle CI configuration.

Don't bother with tensorboardX: it's a fallback for PyTorch < 1.1.0.
2019-12-23 19:14:31 +01:00
9fc8dcb2a0 Standardize import.
Every other file uses this pattern.
2019-12-23 18:45:42 +01:00
f2522869ea Review and update setup.py. 2019-12-23 18:45:42 +01:00
7cef764ec0 Typo in tokenization_utils.py
avoir -> avoid
2019-12-23 12:14:50 -05:00
23dad8447c Install deps from setup.py for building docs.
requirements.txt isn't up to date.
2019-12-23 17:06:32 +01:00
d8e33dbd67 Fix path to source code in docs config.
This should fix API docs, which went AWOL with yesterday's changes.
2019-12-23 16:49:35 +01:00
59b123bc50 fix tqdm logging level 2019-12-23 16:47:24 +01:00
ba2378ced5 Merge pull request #2264 from upura/fix-doclink
Fix doc link in README
2019-12-23 12:31:00 +01:00
e4e2a666c9 Merge pull request #2276 from ShnitzelKiller/scatterfix
fix error due to wrong argument name to Tensor.scatter()
2019-12-23 12:19:48 +01:00
398bb03f98 fix out-of-place call to scatter, whose named argument is source, not src 2019-12-22 23:30:52 -08:00
ce50305e5b Merge pull request #2270 from aaugustin/remove-python-2
Remove support for Python 2
2019-12-22 23:04:37 +01:00
1a948d7020 Switch from comments to annotations for types. 2019-12-22 18:56:01 +01:00
1c62e87b34 Use built-in open().
On Python 3, `open is io.open`.
2019-12-22 18:38:56 +01:00
d6eaf4e6d2 Update comments mentioning Python 2. 2019-12-22 18:38:56 +01:00
45841eaf7b Remove references to Python 2 in documentation. 2019-12-22 18:38:56 +01:00
0dddc1494d Remove py3 marker. 2019-12-22 18:38:56 +01:00
75a23d24af Remove import fallbacks. 2019-12-22 18:38:56 +01:00
798b3b3899 Remove sys.version_info[0] == 2 or 3. 2019-12-22 18:38:42 +01:00
8af25b1664 Remove six. 2019-12-22 17:56:09 +01:00
6b2200fc88 Remove u-prefixes. 2019-12-22 17:47:54 +01:00
c824d15aa1 Remove __future__ imports. 2019-12-22 17:47:54 +01:00
b6ea0f43ae Remove duplicate -v flag. 2019-12-22 17:47:27 +01:00
5daca95ddd Merge pull request #2268 from aaugustin/improve-repository-structure
Improve repository structure
2019-12-22 16:41:53 +01:00
54abc67aec Merge pull request #2255 from aaugustin/implement-best-practices
Implement some Python best practices
2019-12-22 16:31:11 +01:00
00204f2b4c Replace CommonTestCases for tokenizers with a mixin.
This is the same change as for (TF)CommonTestCases for modeling.
2019-12-22 15:35:25 +01:00
a3c5883f2c Rename file for consistency. 2019-12-22 15:35:25 +01:00
daf8bebcdd Remove unused GPTModelTester.
It isn't imported anywhere.
2019-12-22 15:35:25 +01:00
345c23a60f Replace (TF)CommonTestCases for modeling with a mixin.
I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.

I solved this by replacing the abstract base class with a mixin.

Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
2019-12-22 15:35:18 +01:00
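The mixin pattern in miniature (toy classes, not the real testers):

```python
import unittest

class ModelTesterMixin:
    # A plain mixin, not a TestCase: unittest discovery won't collect it,
    # so the shared tests only run on the concrete subclasses.
    def test_model_builds(self):
        self.assertIsNotNone(self.build_model())

class BertModelTest(ModelTesterMixin, unittest.TestCase):
    def build_model(self):
        return object()  # stand-in for a real model
```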
7e98e211f0 Remove unittest.main() in test modules.
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
2019-12-22 14:42:03 +01:00
6be7cdda66 Move source code inside a src subdirectory.
This prevents transformers from being importable simply because the CWD
is the root of the git repository, while not being importable from other
directories. That led to inconsistent behavior, especially in examples.

Once you fetch this commit, in your dev environment, you must run:

    $ pip uninstall transformers
    $ pip install -e .
2019-12-22 14:15:13 +01:00
ced0a94204 Switch test files to the standard test_*.py scheme. 2019-12-22 14:15:13 +01:00
067395d5c5 Move tests outside of library. 2019-12-22 13:47:17 +01:00
698f9e3d7a Remove trailing whitespace in README. 2019-12-22 13:29:58 +01:00
c11b3e2926 Sort imports for optional third-party libraries.
These libraries aren't always installed in the virtual environment where
isort is running. Declaring them properly avoids mixing these
third-party imports with local imports.
2019-12-22 11:19:13 +01:00
2a34d5b71b Stabilize import order for packaging.
I don't want to consider it a dependency of transformers, but it's
usually there in local development and usually not there in CI.
2019-12-22 11:07:31 +01:00
c9270086ea Disable flake8 F841 in CI to get a passing run.
I'll fix it later.
2019-12-22 11:00:06 +01:00
577a03664d Enforce flake8 in CI. 2019-12-22 11:00:04 +01:00
7c6812645a Restore proper import for HTTPError. 2019-12-22 10:59:08 +01:00
939148b050 Fix F401 flake8 warning (x28).
Do manually what autoflake couldn't manage.
2019-12-22 10:59:08 +01:00
783a616999 Fix F401 flake8 warning (x88 / 116).
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
2019-12-22 10:59:08 +01:00
80327a13ea Fix F401 flake8 warning (x152 / 268).
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
2019-12-22 10:59:08 +01:00
654e051e2a Ignore F401 flake8 warning (x326 / 594). 2019-12-22 10:59:08 +01:00
fa2ccbc081 Fix E266 flake8 warning (x90). 2019-12-22 10:59:08 +01:00
2ab78325f0 Fix F821 flake8 warning (x47).
Ignore warnings related to Python 2, because it's going away soon.
2019-12-22 10:59:07 +01:00
631be27078 Fix E722 flake8 warnings (x26). 2019-12-22 10:59:07 +01:00
b0f7db73cd Fix E741 flake8 warning (x14). 2019-12-22 10:59:07 +01:00
ea89bec185 Fix E231 flake8 warning (x9). 2019-12-22 10:59:07 +01:00
fd2f17a7a1 Fix E714 flake8 warning (x8). 2019-12-22 10:59:07 +01:00
5eab3cf6bc Fix W605 flake8 warning (x5). 2019-12-22 10:59:07 +01:00
7dce8dc7ac Fix E731 flake8 warning (x3). 2019-12-22 10:59:07 +01:00
eed46f38b7 Fix E302 flake8 warning (x3). 2019-12-22 10:59:07 +01:00
b1de7ae08a Fix F811 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
357db7098c Fix E712 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
f9c5317db2 Fix E265 flake8 warning (x1). 2019-12-22 10:59:07 +01:00
28e608a2c2 Remove trailing whitespace from all Python files.
Fixes flake8 warning W291 (x224).
2019-12-22 10:59:07 +01:00
1efa0a7552 Add black-compatible flake8 configuration. 2019-12-22 10:59:07 +01:00
d0c9fe277a Fix circular import in transformers.pipelines.
Submodules shouldn't import from their parent in general.
2019-12-22 10:59:07 +01:00
5ca054757f Update "make style" to sort imports with isort. 2019-12-22 10:59:07 +01:00
9e80fc7b2f Enforce isort in CI.
We need https://github.com/timothycrosley/isort/pull/1000 but there's no
release with this fix yet, so we'll install from GitHub.
2019-12-22 10:59:00 +01:00
158e82e061 Sort imports with isort.
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
2019-12-22 10:57:46 +01:00
9d00f78f16 fix doc link 2019-12-22 16:07:05 +09:00
b668a740ca Fixing incorrect link in model docstring
The docstring contains a link to the Salesforce/CTRL repo, while the model itself is Facebookresearch/mmbt. It is probably a copy/paste mistake.
2019-12-22 00:01:14 +03:00
bc1715c1e0 Add black-compatible isort configuration.
lines_after_imports = 2 is a matter of taste; I like it.
2019-12-21 17:53:18 +01:00
36883c1192 Add "make style" to format code with black. 2019-12-21 17:53:18 +01:00
6e5291a915 Enforce black in CI. 2019-12-21 17:53:18 +01:00
fa84ae26d6 Reformat source code with black.
This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.
2019-12-21 17:52:29 +01:00
63e3827c6b Remove empty file.
Likely it was added by accident.
2019-12-21 15:38:08 +01:00
645713e2cb Merge pull request #2254 from huggingface/fix-tfroberta
adding positional embeds masking to TFRoBERTa
2019-12-21 15:33:22 +01:00
73f6e9817c Merge pull request #2115 from suvrat96/add_mmbt_model
[WIP] Add MMBT Model to Transformers Repo
2019-12-21 15:26:08 +01:00
77676c27d2 adding positional embeds masking to TFRoBERTa 2019-12-21 15:24:48 +01:00
344126fe58 move example to mm-imdb folder 2019-12-21 15:06:52 +01:00
5b7fb6a4a1 Merge pull request #2134 from bkkaggle/saving-and-resuming
closes #1960 Add saving and resuming functionality for remaining examples
2019-12-21 15:03:53 +01:00
6f68d559ab Merge pull request #2130 from huggingface/ignored-index-coherence
[BREAKING CHANGE] Setting all ignored index to the PyTorch standard
2019-12-21 14:55:40 +01:00
1ab25c49d3 Merge branch 'master' into pr/2115 2019-12-21 14:54:30 +01:00
b03872aae0 fix merge 2019-12-21 14:49:54 +01:00
518ba748e0 Merge branch 'master' into saving-and-resuming 2019-12-21 14:41:39 +01:00
18601c3b6e Merge pull request #2173 from erenup/master
run_squad with roberta
2019-12-21 14:33:16 +01:00
6e7102cfb3 Merge pull request #2203 from gthb/patch-1
fix: wrong architecture count in README
2019-12-21 14:31:44 +01:00
deceb00161 Merge pull request #2177 from mandubian/issue-2106
:zip: #2106 tokenizer.tokenize speed improvement (3-8x) by caching added_tokens in a Set
2019-12-21 14:31:20 +01:00
eeb70cdd77 Merge branch 'master' into saving-and-resuming 2019-12-21 14:29:59 +01:00
ed9b84816e Merge pull request #1840 from huggingface/generation_sampler
[WIP] Sampling sequence generator for transformers
2019-12-21 14:27:35 +01:00
f86ed23189 update doc 2019-12-21 14:13:06 +01:00
cfa0380515 Merge branch 'master' into generation_sampler 2019-12-21 14:12:52 +01:00
300ec3003c fixing run_generation example - using torch.no_grad 2019-12-21 14:02:19 +01:00
1c37746892 fixing run_generation 2019-12-21 13:52:49 +01:00
7e17f09fb5 Merge pull request #1803 from importpandas/fix-xlnet-squad2.0
fix run_squad.py during fine-tuning xlnet on squad2.0
2019-12-21 13:38:48 +01:00
8a2be93b4e fix merge 2019-12-21 13:31:28 +01:00
562f864038 Merge branch 'master' into fix-xlnet-squad2.0 2019-12-21 12:48:10 +01:00
8618bf15d6 Merge pull request #1736 from huggingface/fix-tf-xlnet
Fix TFXLNet
2019-12-21 12:42:05 +01:00
2fa8737c44 Merge pull request #1586 from enzoampil/include_special_tokens_in_bert_examples
Add special tokens to documentation for bert examples to resolve issue: #1561
2019-12-21 12:36:11 +01:00
f15f087143 Merge pull request #1764 from DomHudson/bug-fix-1761
Bug-fix: Roberta Embeddings Not Masked
2019-12-21 12:13:27 +01:00
fae4d1c266 Merge pull request #2217 from aaugustin/test-parallelization
Support running tests in parallel
2019-12-21 11:54:23 +01:00
b8e924e10d Restore test.
This looks like debug code accidentally committed in b18509c2.

Refs #2250.
2019-12-21 08:50:15 +01:00
767bc3ca68 Fix typo in model name.
This looks like a copy/paste mistake. Probably this test was never run.

Refs #2250.
2019-12-21 08:46:26 +01:00
343c094f21 Run examples separately from tests.
This optimizes the total run time of the Circle CI test suite.
2019-12-21 08:43:19 +01:00
80caf79d07 Prevent excessive parallelism in PyTorch.
We're already using as many processes in parallel as we have CPU cores.
Furthermore, the number of cores may be incorrectly calculated as 36
(we've seen this in pytest-xdist), which compounds the problem.

PyTorch performance craters without this.
2019-12-21 08:43:19 +01:00
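One way to cap PyTorch's intra-op parallelism per worker (illustrative; the actual change lives in the CI/test setup):

```python
import torch

# With N pytest-xdist workers, give each worker a single intra-op thread;
# otherwise N workers x N threads oversubscribes the CPUs.
torch.set_num_threads(1)
```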
bb3bfa2d29 Distribute tests from the same file to the same worker.
This should prevent two issues:

- hitting API rate limits for tests that hit the HF API
- multiplying the cost of expensive test setups
2019-12-21 08:43:19 +01:00
29cbab98f0 Parallelize tests on Circle CI.
Set the number of CPUs manually based on the Circle CI resource class,
or else we're getting 36 CPUs, which is far too much (perhaps that's
the underlying hardware and not what Circle CI allocates to us).

Don't parallelize the custom tokenizers tests because they take less
than one second to run and parallelization actually makes them slower.
2019-12-21 08:43:19 +01:00
a4c9338b83 Prevent parallel downloads of the same file with a lock.
Since the file is written to the filesystem, a filesystem lock is the
way to go here. Add a dependency on the third-party filelock library to
get cross-platform functionality.
2019-12-21 08:43:19 +01:00
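A usage sketch of the filesystem lock (`fetch` is a hypothetical download helper):

```python
import os
from filelock import FileLock

def download_once(url: str, cache_path: str) -> str:
    # The lock file lives next to the cached file; concurrent processes
    # serialize on it instead of clobbering each other's downloads.
    with FileLock(cache_path + ".lock"):
        if not os.path.exists(cache_path):
            fetch(url, cache_path)  # hypothetical helper that writes the file
    return cache_path
```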
b670c26684 Take advantage of the cache when running tests.
Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.

Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.

Fix #2222.
2019-12-21 08:43:19 +01:00
b67fa1a8d2 Download models directly to cache_dir.
This allows moving the file instead of copying it, which is more
reliable. Also it avoids writing large amounts of data to /tmp,
which may not be large enough to accommodate it.

Refs #2222.
2019-12-21 08:43:19 +01:00
286d5bb6b7 Use a random temp dir for writing pruned models in tests. 2019-12-21 08:43:19 +01:00
478e456e83 Use a random temp dir for writing file in tests. 2019-12-21 08:43:19 +01:00
12726f8556 Remove redundant torch.jit.trace in tests.
This looks like it could be expensive, so don't run it twice.
2019-12-21 08:43:19 +01:00
ac1b449cc9 [doc] move distilroberta to more appropriate place
cc @lysandrejik
2019-12-21 00:09:01 -05:00
3e52915fa7 [RoBERTa] Embeddings: fix dimensionality bug 2019-12-20 19:01:27 -05:00
228f52867c Bug fix: 1764 2019-12-20 18:27:35 -05:00
a80778f40e small refactoring (only esthetic, not functional) 2019-12-20 17:21:24 -05:00
3df1d2d144 - Create the output directory (whose name is passed by the user in the "save_directory" parameter) where the encoder and decoder will be saved, if it does not exist.
- Empty the output directory if it contains any files or subdirectories.
- Create the "encoder" directory inside "save_directory", if it does not exist.
- Create the "decoder" directory inside "save_directory", if it does not exist.
- Save the encoder and the decoder in the previous two directories, respectively (see the sketch below).
2019-12-20 17:21:24 -05:00
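The steps above, sketched as a helper with illustrative names:

```python
import os
import shutil

def prepare_save_directory(save_directory: str) -> None:
    # Recreate a clean save_directory/{encoder,decoder} layout.
    if os.path.isdir(save_directory):
        shutil.rmtree(save_directory)  # empty it if anything is already there
    os.makedirs(os.path.join(save_directory, "encoder"))
    os.makedirs(os.path.join(save_directory, "decoder"))
```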
a436574bfd Release: v2.3.0 2019-12-20 16:22:20 -05:00
d0f8b9a978 Merge pull request #2244 from huggingface/fix-tok-pipe
Fix Camembert and XLM-R `decode` method- Fix NER pipeline alignement
2019-12-20 22:10:39 +01:00
a557836a70 Merge pull request #2191 from huggingface/fix_sp_np
Numpy compatibility for sentence piece
2019-12-20 22:08:08 +01:00
655fd06853 clean up 2019-12-20 21:57:49 +01:00
e5812462fc clean up debug and less verbose tqdm 2019-12-20 21:51:48 +01:00
4775ec354b add overwrite - fix ner decoding 2019-12-20 21:47:15 +01:00
cb6d54bfda Numpy compatibility for sentence piece
convert to int earlier
2019-12-20 15:06:28 -05:00
f79a7dc661 fix NER pipeline 2019-12-20 20:57:45 +01:00
a241011057 fix pipeline NER 2019-12-20 20:43:48 +01:00
e37ca8e11a fix camembert and XLM-R tokenizer 2019-12-20 20:43:42 +01:00
ceae85ad60 fix mc loading 2019-12-20 19:52:24 +01:00
71883b6ddc update link in readme 2019-12-20 19:40:23 +01:00
8d5a47c79b Merge pull request #2243 from huggingface/fix-xlm-roberta
fixing xlm-roberta tokenizer max_length and automodels
2019-12-20 19:34:08 +01:00
79e4a6a25c update serving API 2019-12-20 19:33:12 +01:00
bbaaec046c fixing CLI pipeline 2019-12-20 19:19:20 +01:00
1c12ee0e55 fixing xlm-roberta tokenizer max_length and automodels 2019-12-20 18:28:27 +01:00
65c75fc587 Clean special tokens test 2019-12-20 11:34:16 -05:00
fb393ad994 Added test for all special tokens 2019-12-20 11:29:58 -05:00
90debb9ff2 Keep even the first of the special tokens intact while lowercasing. 2019-12-20 11:29:43 -05:00
b98ff88544 Added pipelines quick tour in README 2019-12-20 15:52:50 +01:00
3a2c4e6f63 Merge pull request #1548 from huggingface/cli
[2.2] - Command-line interface - Pipeline class
2019-12-20 15:28:29 +01:00
4e3f745ba4 add example for Model2Model in quickstart 2019-12-20 09:12:31 -05:00
db0795b5d0 defaults models for tf and pt - update tests 2019-12-20 15:07:00 +01:00
7f74084528 Fix leading axis added when saving through the command run 2019-12-20 14:47:04 +01:00
c37815f130 clean up PT <=> TF 2.0 conversion and config loading 2019-12-20 14:35:40 +01:00
73fcebf7ec update serving command 2019-12-20 13:47:35 +01:00
59941c5d1f Merge pull request #2189 from stefan-it/xlmr
Add support for XLM-RoBERTa
2019-12-20 13:26:38 +01:00
15dda5ea32 remove python 2 tests for circle-ci cc @aaugustin @julien-c @LysandreJik 2019-12-20 13:20:41 +01:00
01ffc65e9b update tests to remove unittest.patch 2019-12-20 13:16:23 +01:00
825697cad4 fix tests 2019-12-20 12:51:10 +01:00
1fa93ca1ea Clean up framework handling 2019-12-20 12:34:19 +01:00
ca6bdb28f6 fix pipelines and rename model_card => modelcard 2019-12-20 12:10:40 +01:00
61d9ee45e3 All tests are green. 2019-12-20 11:47:56 +01:00
ff36e6d8d7 Merge pull request #2231 from huggingface/requests_user_agent
[http] customizable requests user-agent
2019-12-20 10:28:10 +01:00
e516a34a15 Use BasicTokenizer to split over whitespaces. 2019-12-20 09:38:08 +01:00
9d0d1cd339 Filter out entity for NER task. 2019-12-20 09:30:37 +01:00
15d897ff4a [http] customizable requests user-agent 2019-12-19 18:29:22 -05:00
f25e9b6f77 [hf_bucket_url] support for cloudfront urls 2019-12-19 18:28:17 -05:00
a5a06a851e [doc] Param name consistency 2019-12-19 16:24:20 -05:00
1718fb9e74 Minor/basic text fixes (#2229)
* Small clarification

Matches line 431 to line 435 for additional clarity and consistency.

* Fixed minor typo

The letter "s" was previously omitted from the word "docstrings".
2019-12-19 16:23:18 -05:00
9a399ead25 Revert incorrect #1778 2019-12-19 15:45:48 -05:00
3376adc051 configuration/modeling/tokenization: add various fine-tuned XLM-RoBERTa models for English, German, Spanish and Dutch (CoNLL datasets) 2019-12-19 21:30:23 +01:00
e4baa68ddb tick-tock cc @julien-c 2019-12-19 20:37:26 +01:00
149dc376aa fix tests 2019-12-19 20:34:28 +01:00
407093b3fa Merge branch 'cli' of https://github.com/huggingface/transformers into cli 2019-12-19 20:26:51 +01:00
c7be096c39 Merge branch 'master' into cli 2019-12-19 20:26:08 +01:00
a305067f2d Removed __main__ 2019-12-19 19:41:48 +01:00
3492a6ec17 Addressing Thom's comments. 2019-12-19 19:06:44 +01:00
33adab2b91 Fix albert example 2019-12-19 12:40:43 -05:00
a1f1dce0ae Correct max position for SQUAD and TFDS 2019-12-19 12:25:55 -05:00
62c1fc3c1e Removed duplicate XLMConfig, XLMForQuestionAnswering and XLMTokenizer from import statement of run_squad.py script 2019-12-19 09:50:56 -05:00
284572efc0 Fixed typo in the link
Updated the documentation due to a typo.
2019-12-19 09:36:43 -05:00
ed6ba93912 corrected typo in example for t5 model input argument 2019-12-19 09:34:55 -05:00
81a911cce5 Doc, doc, ... doc. 2019-12-19 15:12:06 +01:00
faef6f6191 Fix logic order for USE_TF/USE_TORCH 2019-12-19 12:28:17 +01:00
5664327c24 Hide train command for now. 2019-12-19 12:27:54 +01:00
3b29322d4c Expose all the pipeline argument on serve command. 2019-12-19 12:24:17 +01:00
fc624716aa Renaming framework env variables flags from NO_ to USE_ 2019-12-19 11:49:06 +01:00
f516cf3956 Allow pipeline to write output in binary format 2019-12-19 11:42:33 +01:00
d72fa2a0f6 Fix inputs_for_model call in QuestionAnsweringPipeline accessing __dict__ on list. 2019-12-19 10:54:10 +01:00
bcc99fd92e Fix wrong automatic config allocation through AutoConfig 2019-12-19 10:32:21 +01:00
a26ce4dee1 examples: add XLM-RoBERTa to glue script 2019-12-19 02:23:01 +01:00
ec5d6c6a70 Addressing issue with NER task omitting first and last word. 2019-12-19 00:12:10 +01:00
fe9aab1055 tokenization: use S3 location for XLM-RoBERTa model 2019-12-18 23:47:48 +01:00
5c5f67a256 modeling: use S3 location for XLM-RoBERTa model 2019-12-18 23:47:00 +01:00
db90e12114 configuration: use S3 location for XLM-RoBERTa model 2019-12-18 23:46:33 +01:00
d0724d0794 Add PipedPipelineDataFormat 2019-12-18 23:27:26 +01:00
7711403bbd Expose config through the cli arguments 2019-12-18 22:59:51 +01:00
8bb166db5d Expose more information in the output of TextClassificationPipeline 2019-12-18 22:53:19 +01:00
f09d999641 docs: fix numbering 😅 2019-12-18 19:49:33 +01:00
dd7a958fd6 docs: add XLM-RoBERTa to pretrained model list (incl. all parameters) 2019-12-18 19:45:46 +01:00
d35405b7a3 docs: add XLM-RoBERTa to index page 2019-12-18 19:45:10 +01:00
3e89fca543 readme: add XLM-RoBERTa to model architecture list 2019-12-18 19:44:23 +01:00
128cfdee9b tokenization add XLM-RoBERTa base model 2019-12-18 19:28:16 +01:00
e778dd854d modeling: add XLM-RoBERTa base model 2019-12-18 19:27:34 +01:00
04b602f96f Put module import on top of the module. 2019-12-18 18:28:39 +01:00
64a971a915 auto: add XLM-RoBERTa to auto tokenization 2019-12-18 18:24:32 +01:00
036831e279 auto: add XLM-RoBERTa to auto modeling 2019-12-18 18:23:42 +01:00
41a13a6375 auto: add XLMRoBERTa to auto configuration 2019-12-18 18:20:27 +01:00
0c88c856d5 Unnest QuestionAnsweringArgumentHandler 2019-12-18 18:18:16 +01:00
8efc6dd544 fix #2214 2019-12-18 10:47:59 -05:00
a2978465a2 Merge branch 'master' into patch-1 2019-12-18 14:54:46 +00:00
01b68be34f converter: remove XLM-RoBERTa specific script (can be done with the script for RoBERTa now) 2019-12-18 12:24:46 +01:00
3d2096f516 further cleanup 2019-12-18 11:50:54 +01:00
ca31abc6d6 tokenization: *align* fairseq and spm vocab to fix some tokenization errors 2019-12-18 11:36:54 +01:00
8e5587fb79 few fixes on sampling 2019-12-18 11:32:37 +01:00
cce3089b65 Merge remote-tracking branch 'upstream/master' into xlmr 2019-12-18 11:05:16 +01:00
641a8decdc clean up code and add arbitrary number of return sequences 2019-12-18 10:43:48 +01:00
e347725d8c More fine-grained control over pipeline creation with config argument. 2019-12-18 10:41:24 +01:00
94c99db34c [FinBERT] fix incorrect url 2019-12-17 20:35:25 -05:00
7ffa817390 [s3] mv files and update links 2019-12-17 20:35:25 -05:00
c5f35e61db Uploaded files to AWS. 2019-12-17 20:35:25 -05:00
abc43ffbff Add pretrained model documentation for FinBERT. 2019-12-17 20:35:25 -05:00
8ac840ff87 Adding Finnish BERT. 2019-12-17 20:35:25 -05:00
a0d386455b Fix outdated tokenizer doc 2019-12-17 20:07:39 -05:00
ea636440d1 [roberta.conversion] Do not hardcode vocab size
and support for fairseq 0.9+
2019-12-17 18:12:22 -05:00
a4df2e0113 update roberta conversion
- update to fix conversion for the updated fairseq model
- create save directory if not exist
2019-12-17 18:12:22 -05:00
77d397202b clean up dead code 2019-12-17 23:28:46 +01:00
bbc0c86f9b beam search + single beam decoding 2019-12-17 23:27:02 +01:00
5e289f69bc regex 2019.12.17 install fails with Python 2 2019-12-17 15:54:05 -05:00
2cff4bd8f3 Fix segmentation fault 2019-12-17 15:54:05 -05:00
55397dfb9b CsvPipelineDataFormat: Fix for single-column 2019-12-17 13:10:51 -05:00
b6938916ac adding beam search 2019-12-17 17:23:36 +01:00
d303f84e7b fix: wrong architecture count in README
Just say “the following” so that this intro doesn't so easily fall out of date :)
2019-12-17 16:18:00 +00:00
2fde5a2489 Initial bunch of documentation. 2019-12-17 12:16:07 +01:00
2f1c745cde update conversion script 2019-12-17 11:47:54 +01:00
83bc5235cf Merge branch 'master' into pr/2189 2019-12-17 11:47:32 +01:00
d7c62661a3 Provide serving dependencies for tensorflow and pytorch (serving-tf, serving-torch) 2019-12-17 11:23:39 +01:00
f349826a57 model: fix cls and sep token for XLM-RoBERTa documentation 2019-12-17 10:36:04 +01:00
f061606277 Merge pull request #2164 from huggingface/cleanup-configs
[SMALL BREAKING CHANGE] Cleaning up configuration classes - Adding Model Cards
2019-12-17 09:10:16 +01:00
805c21aeba tried to fix the failed checks 2019-12-17 11:36:00 +08:00
d000195ee6 add comment for example_index and unique_id in single process 2019-12-17 11:28:34 +08:00
3c6efd0ca3 updated usage example in modeling_roberta for question and answering 2019-12-17 11:18:12 +08:00
3f5ccb183e [doc] Clarify uploads
cf 855ff0e91d (commitcomment-36452545)
2019-12-16 18:20:29 -05:00
3cb51299c3 Fix #2109 2019-12-16 16:58:44 -05:00
18a879f475 fix #2180 2019-12-16 16:44:29 -05:00
d803409215 Fix run squad evaluate during training 2019-12-16 16:31:38 -05:00
a468870fd2 refactoring generation 2019-12-16 22:22:30 +01:00
855ff0e91d [doc] Model upload and sharing
ping @lysandrejik @thomwolf

Is this clear enough? Anything we should add?
2019-12-16 12:42:22 -05:00
d064009b72 converter: fix vocab size 2019-12-16 17:23:25 +01:00
a701a0cee1 configuration: fix model name for large XLM-RoBERTa model 2019-12-16 17:17:56 +01:00
59a1aefb1c tokenization: add support for new XLM-RoBERTa model. Add wrapper around fairseq tokenization logic 2019-12-16 17:00:55 +01:00
69f4f058fa model: add support for new XLM-RoBERTa model 2019-12-16 17:00:12 +01:00
a648ff738c configuration: add support for XLM-RoBERTa model 2019-12-16 16:47:39 +01:00
9ed09cb4a3 converter: add conversion script for original XLM-RoBERTa weights to Transformers-compatible weights 2019-12-16 16:46:58 +01:00
d3549b66af module: add support for XLM-RoBERTa (__init__) 2019-12-16 16:38:39 +01:00
a096e2a88b WIP serving through HTTP internally using pipelines. 2019-12-16 16:38:02 +01:00
71b4750517 examples: add support for XLM-RoBERTa to run_ner script 2019-12-16 16:37:27 +01:00
43a4e1bbe4 Addressing issue in varargs handling for question answering. 2019-12-16 16:00:41 +01:00
46ccbb42fc Make CLI run command use integer mapping for device argument. 2019-12-16 15:49:41 +01:00
bbc707cf39 Fix non-keyworded varargs handling in DefaultArgumentHandler for pipeline. 2019-12-16 15:49:09 +01:00
9c391277cc Allow tensors placement on specific device through CLI and pipeline. 2019-12-16 15:19:13 +01:00
1bbdbacd5b update __init__ and saving 2019-12-16 14:38:20 +01:00
955d7ecb57 Refactored Pipeline with dedicated argument handler. 2019-12-16 14:34:54 +01:00
031ad4eb37 improving JSON error messages (for model card and configurations) 2019-12-16 14:20:57 +01:00
db0a9ee6e0 adding albert to TF auto models cc @LysandreJik 2019-12-16 14:08:08 +01:00
a4d07b983a dict of all config and model files cc @LysandreJik 2019-12-16 14:00:32 +01:00
d3418a94ff update tests 2019-12-16 13:52:41 +01:00
56e98ba81a add model cards cc @mfuntowicz 2019-12-16 11:07:27 +01:00
8669598abd update t5 tf 2019-12-16 09:59:36 +01:00
1b8613acb3 updating t5 config class 2019-12-16 09:51:42 +01:00
8e3b1c860f Added FeatureExtraction pipeline. 2019-12-15 01:37:52 +01:00
f1971bf303 Binding pipelines to the cli. 2019-12-15 01:37:16 +01:00
cc0135134b :zip: #2106 basic tokenizer.tokenize global speed improvement (3-8x) by simply caching added_tokens in a Set 2019-12-14 15:25:13 +01:00
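A minimal sketch of the caching idea behind this speed-up, with assumed names rather than the library's actual internals: membership checks against the added tokens happen for every piece of text, so a cached set gives O(1) lookups where a list scan is O(n).

```python
# Illustrative sketch (assumed names): build the added-token set once and
# reuse it, so each tokenize() call does O(1) membership checks instead of
# scanning a list for every word.
added_tokens = ["<special-1>", "<special-2>"]
added_tokens_set = set(added_tokens)  # the cached Set

def tokenize(text):
    # keep added tokens whole; lowercasing stands in for the base tokenizer
    return [w if w in added_tokens_set else w.lower() for w in text.split()]

print(tokenize("Hello <special-1> World"))  # ['hello', '<special-1>', 'world']
```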
dc667ce1a7 double check cc @LysandreJik 2019-12-14 09:56:27 +01:00
7140363e09 update bertabs 2019-12-14 09:44:53 +01:00
a52d56c8d9 Merge branch 'master' into cleanup-configs 2019-12-14 09:43:07 +01:00
e92bcb7eb6 Merge pull request #1739 from huggingface/t5
[WIP] Adding Google T5 model
2019-12-14 09:40:43 +01:00
cbb368ca06 distilbert tests 2019-12-14 09:31:18 +01:00
b6d4284b26 [cli] Uploads: fix + test edge case 2019-12-13 22:44:57 -05:00
a1faaf9962 deleted useless file 2019-12-14 08:57:13 +08:00
c7780700f5 Merge branch 'refs/heads/squad_roberta'
# Conflicts:
#	transformers/data/processors/squad.py
2019-12-14 08:53:59 +08:00
76f0d99f02 Merge remote-tracking branch 'refs/remotes/huggingface/master' 2019-12-14 08:45:17 +08:00
8e9526b4b5 add multiple processing 2019-12-14 08:43:58 +08:00
7bd11dda6f Release: v2.2.2 2019-12-13 16:45:30 -05:00
c3248cf122 Tests for all tokenizers 2019-12-13 16:41:44 -05:00
f2ac50cb55 better for python2.x 2019-12-13 16:41:44 -05:00
4cbdc7d910 missed space 2019-12-13 16:41:44 -05:00
dd2add9f6e more tests 2019-12-13 16:41:44 -05:00
df160af736 🐛 #2096 in tokenizer.decode, spaces are joined before added tokens rather than between all subtexts 2019-12-13 16:41:44 -05:00
5b7b78e088 🐛 #2096 in tokenizer.decode, adds a space after special tokens to return a correctly formatted string 2019-12-13 16:41:44 -05:00
866d73ca26 [cli] Upload is now compatible with folders 2019-12-13 16:39:08 -05:00
d461472948 return for SQuAD [BLACKED] 2019-12-13 15:31:52 -05:00
f24a228a93 Speed up tokenization process 2019-12-13 14:50:35 -05:00
c8ed1c82c8 [SQUAD] Load checkpoint when evaluating without training 2019-12-13 12:13:48 -05:00
5c00e344c1 update model doc - switch 3B/11B to 3b/11b 2019-12-13 16:33:29 +01:00
0b51532ce9 Reintroducing the batch_encode_plus method 2019-12-13 16:22:50 +01:00
110394b2ba Merge branch 'master' into t5 2019-12-13 16:03:32 +01:00
5a5c4349e8 Fix summarization to_cpu doc 2019-12-13 10:02:33 -05:00
8ade204098 fix tf 2019-12-13 14:48:47 +01:00
47f0e3cfb7 cleaning up configuration classes 2019-12-13 14:33:24 +01:00
8938b546bf Removed from_config 2019-12-13 14:27:04 +01:00
1ca52567a4 Allow model conversion in the pipeline allocator. 2019-12-13 14:13:14 +01:00
28e64ad5a4 Raise an exception if the pipeline allocator can't determine the tokenizer from the model. 2019-12-13 14:12:54 +01:00
be5bf7b81b Added NER pipeline. 2019-12-13 14:12:17 +01:00
80eacb8f16 Adding labels mapping for classification models in their respective config. 2019-12-13 14:10:22 +01:00
33e72b08d5 fix inner dimensions for 3B/11B models 2019-12-13 11:33:05 +01:00
9b312f9d41 initial version for roberta squad 2019-12-13 14:51:40 +08:00
40ed717232 Merge remote-tracking branch 'refs/remotes/huggingface/master' 2019-12-13 09:10:17 +08:00
7296f1010b Cleanup squad and add allow train_file and predict_file usage 2019-12-12 13:01:04 -05:00
5d67aa21ae [doc] Replicate doc from #2144 2019-12-12 12:39:41 -05:00
3fd71c4431 Update example scripts 2019-12-12 12:08:54 -05:00
fe92755b99 Fix special tokens mask in encode 2019-12-12 11:37:19 -05:00
fbf5455a86 Fix typo in examples/run_glue.py args declaration.
deay -> decay
2019-12-12 11:16:19 -05:00
f19dad61c7 fixing XLM conversion tests with dummy input 2019-12-12 14:46:30 +01:00
f69dbecc38 Expose classification labels mapping (and reverse) in model config. 2019-12-12 10:25:36 +01:00
90df44f0aa Merge pull request #2063 from guillaume-be/special_tokens_mask_value_not_used
special_tokens_mask value was unused and calculated twice
2019-12-12 08:21:46 +01:00
707f9e9241 Merge pull request #2081 from pglock/patch-1
handle string with only whitespaces as empty
2019-12-12 08:20:43 +01:00
137e20a846 Merge pull request #2075 from huggingface/check-link-validity
Check link validity
2019-12-12 08:09:12 +01:00
d5712f7cac Merge branch 'master' into check-link-validity 2019-12-12 08:00:51 +01:00
9c58b236ef Merge pull request #2144 from huggingface/from-pretrained-from-url
Allowing from_pretrained to load from url directly
2019-12-12 07:43:40 +01:00
413f41921b fix merge 2019-12-12 07:34:42 +01:00
386a93f0f8 Merge branch 'master' into from-pretrained-from-url 2019-12-12 07:31:05 +01:00
2d103546ef Merge pull request #2148 from huggingface/fix_encode_plus
Fix encode plus
2019-12-12 07:24:47 +01:00
1748fdf657 [doc] Fix rst table 2019-12-11 18:32:27 -05:00
36fc52a3b4 Update links to weights 2019-12-11 18:32:27 -05:00
371c5ddfad Py2 tests for Lysandre 2019-12-11 18:32:27 -05:00
5505cf7014 Run tests on Py2 too, for Lysandre 2019-12-11 18:32:27 -05:00
9cb97c0c0f Actually run the tests 2019-12-11 18:32:27 -05:00
95854c4a2f Actually run the tests 2019-12-11 18:32:27 -05:00
d2100428d3 Update to new test infra and only run conditionally 2019-12-11 18:32:27 -05:00
597ba7feb3 Support testing Japanese BERT tokenizers 2019-12-11 18:32:27 -05:00
6a43dc9d7d Support Python 2 2019-12-11 18:32:27 -05:00
a09da4eeb0 Add a test for Japanese BERT tokenizers 2019-12-11 18:32:27 -05:00
57b5cb3eaa Fix loading BertJapaneseTokenizer 2019-12-11 18:32:27 -05:00
c03c0dfd23 Add support for Japanese BERT models by cl-tohoku 2019-12-11 18:32:27 -05:00
4f15e5a267 Add tests.
Maybe not the best possible place for the tests, lmk.
2019-12-11 17:41:51 -05:00
18e1f751f1 TF support 2019-12-11 17:07:46 -05:00
31e5b5ff22 Fix tests + first example of doc 2019-12-11 15:22:02 -05:00
3d57c51111 Fix encode plus 2019-12-11 15:10:17 -05:00
c999a3e505 Allow from_pretrained to take a remote identifier 2019-12-11 12:29:58 -05:00
030faccb8d doc: fix pretrained models table 2019-12-11 12:19:21 -05:00
6709739a05 allowing from_pretrained to load from url directly 2019-12-11 18:15:45 +01:00
29570db25b allowing from_pretrained to load from url directly 2019-12-11 17:19:18 +01:00
2e2f9fed55 rm duplicate imports 2019-12-11 11:11:56 -05:00
c28273793e Add missing DistilBert and Roberta to AutoModelForTokenClassification 2019-12-11 15:31:45 +01:00
4c12860f7a Remove misleading documentation 2019-12-11 09:22:37 -05:00
b040bff6df Added supported model to AutoModelForTokenClassification 2019-12-11 14:13:58 +01:00
fafd4c86ec fix TF 2.0 version of T5 - update conversion script 2019-12-11 13:47:27 +01:00
6aa919469d Update run_xnli to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
89896fe04f Update run_ner to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
fdc05cd68f Update run_squad to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:31:22 -06:00
854ec5784e Update run_glue to save optimizer and scheduler states, then resume training from a checkpoint 2019-12-10 19:30:36 -06:00
9a24e0cf76 Refactored qa pipeline argument handling + unittests 2019-12-11 00:33:25 +01:00
b72f9d340e Correct index in script 2019-12-10 18:33:17 -05:00
51ae203290 Merge pull request #2129 from leopd/master
Progress indicator improvements when downloading pre-trained models.
2019-12-10 22:18:55 +01:00
ec6fb25c21 Patch documentation 2019-12-10 15:49:20 -05:00
418589244d Uniforming the ignored indices 2019-12-10 15:26:19 -05:00
58d75aa310 Progress indicator improvements when downloading pre-trained models. 2019-12-10 11:36:56 -08:00
6a73382706 Complete warning + cleanup 2019-12-10 14:33:24 -05:00
dc4e9e5cb3 DataParallel for SQuAD + fix XLM 2019-12-10 19:21:20 +00:00
67a8be8e90 fix backward in tests 2019-12-10 17:50:32 +01:00
07bc8efbc3 add greedy decoding and sampling 2019-12-10 17:27:50 +01:00
63e36007ee Make sure padding, cls and other non-context tokens cannot appear in the answer. 2019-12-10 16:47:35 +01:00
f2538c1274 all tests in torch no grad 2019-12-10 16:33:11 +01:00
a5df980c5b updating distilbert test 2019-12-10 16:01:15 +01:00
40a39ab650 Reuse recent SQuAD refactored data structure inside QA pipelines. 2019-12-10 15:59:38 +01:00
7c3a15ace9 Merge branch 'master' into t5 2019-12-10 15:36:54 +01:00
981a5c8c17 updating models urls 2019-12-10 15:36:19 +01:00
e6cff60b4c Merge pull request #2069 from huggingface/cleaner-pt-tf-conversion
clean up PT <=> TF conversion
2019-12-10 15:34:08 +01:00
4b82c485de remove misplaced summarization documentation 2019-12-10 09:13:33 -05:00
8ae1044f80 updating tests and TF 2.0 model 2019-12-10 15:11:07 +01:00
aae74065df Added QuestionAnsweringPipeline unit tests. 2019-12-10 13:37:20 +01:00
a7d3794a29 Remove token_type_ids for compatibility with DistilBert 2019-12-10 13:37:20 +01:00
fe0f552e00 Use attention_mask everywhere. 2019-12-10 13:37:20 +01:00
348e19aa21 Expose attention_masks and input_lengths arguments to batch_encode_plus 2019-12-10 13:37:18 +01:00
c2407fdd88 Enable the Tensorflow backend. 2019-12-10 13:37:14 +01:00
f116cf599c Allow hiding frameworks through environment variables (NO_TF, NO_TORCH). 2019-12-10 13:37:07 +01:00
6e61e06051 batch_encode_plus generates the encoder_attention_mask to avoid attending over padded values. 2019-12-10 13:37:07 +01:00
02110485b0 Added batching, topk, chars index and scores. 2019-12-10 13:36:55 +01:00
e1d89cb24d Added QuestionAnsweringPipeline with batch support. 2019-12-10 13:36:55 +01:00
0558c9cb9b Merge branch 'master' into t5 2019-12-10 12:58:48 +01:00
81babb227e Added download command through the cli.
It allows pre-downloading models and tokenizers.
2019-12-10 12:18:59 +01:00
31a3a73ee3 updating CLI 2019-12-10 12:18:59 +01:00
7c1697562a compatibility with sklearn and keras 2019-12-10 12:12:22 +01:00
b81ab431f2 updating AutoModels and AutoConfiguration - adding pipelines 2019-12-10 12:11:33 +01:00
2d8559731a add pipeline - train 2019-12-10 11:34:16 +01:00
72c36b9ea2 [WIP] - CLI 2019-12-10 11:33:14 +01:00
e57d00ee10 Merge pull request #1984 from huggingface/squad-refactor
[WIP] Squad refactor
2019-12-10 11:07:26 +01:00
ecabbf6d28 Merge pull request #2107 from huggingface/encoder-mask-shape
create encoder attention mask from shape of hidden states
2019-12-10 10:07:56 +01:00
608a8f5b56 updating tf 2.0 layer_norm to T5 layer norm 2019-12-10 10:01:01 +01:00
df3961121f Add MMBT Model to Transformers Repo 2019-12-09 18:36:48 -08:00
1d18930462 Harmonize no_cuda flag with other scripts 2019-12-09 20:37:55 -05:00
f7eba09007 clean for release 2019-12-09 20:37:55 -05:00
2a64107e44 improve device usage 2019-12-09 20:37:55 -05:00
c0707a85d2 add README 2019-12-09 20:37:55 -05:00
ade3cdf5ad integrate ROUGE 2019-12-09 20:37:55 -05:00
076602bdc4 prevent BERT weights from being downloaded twice 2019-12-09 20:37:55 -05:00
5909f71028 add py-rouge dependency 2019-12-09 20:37:55 -05:00
a1994a71ee simplified model and configuration 2019-12-09 20:37:55 -05:00
3a9a9f7861 default output dir to documents dir 2019-12-09 20:37:55 -05:00
693606a75c update the docs 2019-12-09 20:37:55 -05:00
c0443df593 remove beam search 2019-12-09 20:37:55 -05:00
2403a66598 give transformers API to BertAbs 2019-12-09 20:37:55 -05:00
4d18199902 cast bool tensor to long for pytorch < 1.3 2019-12-09 20:37:55 -05:00
9f75565ea8 setup training 2019-12-09 20:37:55 -05:00
4735c2af07 tweaks to the BeamSearch API 2019-12-09 20:37:55 -05:00
ba089c780b share pretrained embeddings 2019-12-09 20:37:55 -05:00
9660ba1cbd Add beam search 2019-12-09 20:37:55 -05:00
1c71ecc880 load the pretrained weights for encoder-decoder
We currently save the pretrained_weights of the encoder and decoder in
two separate directories `encoder` and `decoder`. However, for the
`from_pretrained` function to operate with automodels we need to
specify the type of model in the path to the weights.

The path to the encoder/decoder weights is handled by the
`PreTrainedEncoderDecoder` class in the `save_pretrained` function. Since
there is no easy way to infer the type of model that was initialized for
the encoder and decoder we add a parameter `model_type` to the function.
This is not an ideal solution as it is error prone, and the model type
should be carried by the Model classes somehow.

This is a temporary fix that should be changed before merging.
2019-12-09 20:37:55 -05:00
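A rough sketch of the temporary layout this message describes, with assumed names (the real `PreTrainedEncoderDecoder` internals are not shown): weights go to per-part subdirectories, and the `model_type` passed by the caller ends up in the path so automodels can later infer the class.

```python
import os

def save_encoder_decoder(model, save_dir, model_type):
    # Sketch of the temporary fix (assumed helper, not the library's code):
    # embed the model type in the weight paths, e.g. <save_dir>/bert/encoder.
    for name, part in (("encoder", model.encoder), ("decoder", model.decoder)):
        path = os.path.join(save_dir, model_type, name)
        os.makedirs(path, exist_ok=True)
        part.save_pretrained(path)
```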
07f4cd73f6 update function to add special tokens
Since I started my PR the `add_special_token_single_sequence` function
has been deprecated in favor of another; I replaced it with the new function.
2019-12-09 20:37:55 -05:00
5c877fe94a fix albert links 2019-12-09 18:53:00 -05:00
79526f82f5 Remove unnecessary epoch variable 2019-12-09 16:24:35 -05:00
9626e0458c Add functionality to continue training from last saved global_step 2019-12-09 16:24:35 -05:00
2d73591a18 Stop saving current epoch 2019-12-09 16:24:35 -05:00
0eb973b0d9 Use saved optimizer and scheduler states if available 2019-12-09 16:24:35 -05:00
a03fcf570d Save tokenizer after each epoch to be able to resume training from a checkpoint 2019-12-09 16:24:35 -05:00
f71b1bb05a Save optimizer state, scheduler state and current epoch 2019-12-09 16:24:35 -05:00
8e651f56b7 fix tf tests 2019-12-09 22:13:57 +01:00
808bb8da7e fix transfo xl tests 2019-12-09 21:48:34 +01:00
b016dd16c9 fix tests on python 3.5 2019-12-09 21:38:07 +01:00
2a4ef098d6 Add ALBERT and XLM to SQuAD script 2019-12-09 10:46:47 -05:00
00c4e39581 Merge branch 'master' into squad-refactor 2019-12-09 10:41:15 -05:00
169fea6855 updating T5 2019-12-09 16:25:33 +01:00
3520be7824 create encoder attention mask from shape of hidden states
We currently create encoder attention masks (when they're not provided)
based on the shape of the inputs to the encoder. This is obviously
wrong; sequences can be of different lengths. We now create the encoder
attention mask based on the batch_size and sequence_length of the
encoder hidden states.
2019-12-09 11:19:45 +01:00
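A sketch of the corrected behavior with assumed tensor names: the default all-ones mask must take its shape from the encoder hidden states themselves, since the decoder inputs may have a different sequence length.

```python
import torch

def default_encoder_attention_mask(encoder_hidden_states):
    # encoder_hidden_states: (batch_size, src_seq_len, hidden_size)
    # Derive the mask from the encoder output's own shape, never from the
    # (possibly shorter or longer) inputs on the decoder side.
    batch_size, src_seq_len = encoder_hidden_states.shape[:2]
    return torch.ones(batch_size, src_seq_len, device=encoder_hidden_states.device)
```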
0cb163865a Remove pytest dependency. (#2093) 2019-12-07 07:46:14 -05:00
2670b0d682 Fix bug which lowercases special tokens 2019-12-06 16:15:53 -05:00
35401fe50f Remove dependency on pytest for running tests (#2055)
* Switch to plain unittest for skipping slow tests.

Add a RUN_SLOW environment variable for running them.

* Switch to plain unittest for PyTorch dependency.

* Switch to plain unittest for TensorFlow dependency.

* Avoid leaking open files in the test suite.

This prevents spurious warnings when running tests.

* Fix unicode warning on Python 2 when running tests.

The warning was:

    UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

* Support running PyTorch tests on a GPU.

Reverts 27e015bd.

* Tests no longer require pytest.

* Make tests pass on cuda
2019-12-06 13:57:38 -05:00
e4679cddce [cli] Uploads: add progress bar (#2078)
* [cli] Uploads: add progress bar

see https://github.com/huggingface/transformers/pull/2044#discussion_r354057827 for context

* rename + documentation

* Add auto-referential comment
2019-12-06 11:56:23 -05:00
1d87b37d10 updating 2019-12-06 15:30:09 +01:00
4cb9b60558 Merge pull request #2077 from patrickvonplaten/change_documentation_for_past_output_shape
corrected documentation for past tensor shape for ctrl and gpt2 model
2019-12-06 12:14:48 +01:00
5482822a2b Merge pull request #2046 from jplu/tf2-ner-example
Add NER TF2 example.
2019-12-06 12:12:22 +01:00
fc1bb1f867 Merge pull request #2068 from huggingface/fix-2042
Nicer error message when Bert's input is missing batch size
2019-12-06 12:06:42 +01:00
21451ec6ba handle string with only whitespaces as empty 2019-12-06 10:32:43 +01:00
f230d91b43 check the validity of links
We add a script and a CI workflow to check that all download links
present in the source code are valid.
2019-12-06 09:41:28 +01:00
d0383e4daf corrected documentation for past tensor shape for ctrl and gpt2 model 2019-12-06 01:24:22 +01:00
e9217da5ff Cleanup
Improve global visibility in the run_squad script, remove unused files, and apply fixes related to XLNet.
2019-12-05 16:01:51 -05:00
9ecd83dace Patch evaluation for impossible values + cleanup 2019-12-05 14:44:57 -05:00
35ff345fc9 update requirements 2019-12-05 12:07:04 -05:00
552c44a9b1 release distilm-bert 2019-12-05 10:14:58 -05:00
ee53de7aac Pr for pplm (#2060)
* license

* changes

* ok

* Update paper link and commands to run

* pointer to uber repo
2019-12-05 09:20:07 -05:00
f8fb4335c9 clean up a little bit PT <=> TF conversion 2019-12-05 15:19:32 +01:00
bebaa14039 Merge pull request #2045 from aaugustin/remove-dead-code
Remove dead code in tests.
2019-12-05 14:41:56 +01:00
18fb93530b fixing #2042 - Nicer error message 2019-12-05 14:36:34 +01:00
2d5d86e037 fix #2031 2019-12-05 14:06:29 +01:00
af077b15e2 Merge pull request #2065 from huggingface/fixing-camembert
Fixing camembert tokenization
2019-12-05 13:45:44 +01:00
3268ebd229 fix xlnet test 2019-12-05 13:35:29 +01:00
6c5297a423 Fixing camembert tokenization 2019-12-05 13:27:58 +01:00
9200a759d7 Add few tests on the TF optimization file with some info in the documentation. Complete the README. 2019-12-05 12:56:43 +01:00
1f179f095f Merge pull request #2011 from AdityaSoni19031997/patch-1
typo fix on the docs as per Pytorch v1.1+
2019-12-05 12:39:04 +01:00
1eaf44e713 Merge pull request #2007 from roskoN/xlnet_attention_fix
fixed XLNet attention output for both attention streams whenever target_mapping is provided
2019-12-05 12:32:39 +01:00
71e4693f08 fix #1968 2019-12-05 12:14:24 +01:00
f9f395b21c Merge pull request #1735 from ondewo/tf-do-not-use-gpu-on-import
Do not use GPU when importing transformers
2019-12-05 11:56:48 +01:00
75a97af6bc fix #1450 - add doc 2019-12-05 11:26:55 +01:00
8b388827b5 fix #1920 2019-12-05 11:18:43 +01:00
d425a4d60b Merge pull request #1870 from alexzubiaga/xlnet-for-token-classification
XLNet for Token classification
2019-12-05 09:54:09 +01:00
1eb89ddf73 Merge pull request #2044 from huggingface/cli_upload
CLI for authenticated file sharing
2019-12-05 09:44:07 +01:00
7f998b1b83 special_tokens_mask value was unused and calculated twice 2019-12-05 09:01:39 +01:00
fb0d2f1da1 preparing release distil-mBERT 2019-12-05 03:00:16 -05:00
3ba417e1a8 [cli] ls: Tabular formatting 2019-12-04 18:40:52 -05:00
ce158a076f Return dataset (pytorch) 2019-12-04 17:55:52 -05:00
7a03519975 Documentation 2019-12-04 17:24:35 -05:00
96fa9a8a70 Python 2 + Post mime-type to S3 2019-12-04 17:22:50 -05:00
33508ae310 Remove only_first 2019-12-04 16:26:45 -05:00
f7e4a7cdfa Cleanup 2019-12-04 16:24:15 -05:00
a7ca6d738b Padding side is tokenizer-dependent 2019-12-04 15:43:34 -05:00
cca75e7884 Kill the demon spawn 2019-12-04 15:42:29 -05:00
bf119c0568 TFDS dataset can now be evaluated 2019-12-04 11:34:59 -05:00
ff98b041da Fix whitespace issue 2019-12-04 16:53:06 +01:00
9ddc3f1a12 Naming update + XLNet/XLM evaluation 2019-12-04 10:37:00 -05:00
5bfcd0485e fix #1991 2019-12-04 14:53:11 +01:00
cae641ff26 Merge pull request #1846 from tamuhey/patch/iss1845
fix summary_type value of SequenceSummary
2019-12-04 13:28:39 +01:00
254ebb979c Bugfix on init file. Missing comma. 2019-12-04 10:00:25 +01:00
ecb923da9c Create a NER example similar to the Pytorch one. It takes the same options, and can be run the same way. 2019-12-04 09:43:15 +01:00
40255ab002 Remove dead code in tests. 2019-12-04 08:21:02 +01:00
e4fbf3e2cc CLI for authenticated file sharing 2019-12-04 00:52:23 -05:00
de276de1c1 Working evaluation 2019-12-03 17:15:51 -05:00
7edb51f3a5 [pplm] split classif head into its own file 2019-12-03 22:07:25 +00:00
c835bc85c2 Compute predictions 2019-12-03 15:28:16 -05:00
285b1241e3 Added SquadResult 2019-12-03 15:00:49 -05:00
8101924a68 Patch: v2.2.1 2019-12-03 11:20:26 -05:00
48cbf267c9 Use full dataset for eval (SequentialSampler in Distributed setting) 2019-12-03 11:01:37 -05:00
f434bfc623 [pplm] Update S3 links
Co-Authored-By: Piero Molino <w4nderlust@gmail.com>
2019-12-03 10:53:02 -05:00
96e83506d1 Always use SequentialSampler during evaluation
When evaluating, shouldn't we always use the SequentialSampler instead of DistributedSampler? Evaluation only runs on 1 GPU no matter what, so if you use the DistributedSampler with N GPUs, I think you'll only evaluate on 1/N of the evaluation set. That's at least what I'm finding when I run an older/modified version of this repo.
2019-12-03 10:15:39 -05:00
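A self-contained sketch of the fix this message argues for: evaluation runs on a single process, so sample the full dataset in order instead of taking a 1/N DistributedSampler shard. The dataset below is a stand-in.

```python
import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

eval_dataset = TensorDataset(torch.arange(10))  # stand-in for the real eval set
eval_dataloader = DataLoader(
    eval_dataset,
    sampler=SequentialSampler(eval_dataset),  # full set, in order, one process
    batch_size=4,
)
for (batch,) in eval_dataloader:
    pass  # every example is seen exactly once
```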
3b48806f75 [pplm] README: add setup + tweaks 2019-12-03 10:14:02 -05:00
0cb2c90890 readme
Co-Authored-By: Rosanne Liu <mimosavvy@gmail.com>
2019-12-03 10:14:02 -05:00
1efb2ae7fc [pplm] move scripts under examples/pplm/ 2019-12-03 10:14:02 -05:00
a59fdd1627 generate_text_pplm now works with batch_size > 1 2019-12-03 10:14:02 -05:00
893d0d64fe Changed order of some parameters to be more consistent. Identical results. 2019-12-03 10:14:02 -05:00
f42816e7fc Added additional check for url and path in discriminator model params 2019-12-03 10:14:02 -05:00
f10b925015 Improvements: model_path renamed to pretrained_model, tokenizer loaded from pretrained_model, pretrained_model set to the discriminator's when discrim is specified, sample = False by default but a cli parameter introduced. To obtain identical samples call the cli with --sample 2019-12-03 10:14:02 -05:00
75904dae66 Removed global variable device 2019-12-03 10:14:02 -05:00
7fd54b55a3 Added support for generic discriminators 2019-12-03 10:14:02 -05:00
b0eaff36e6 Added a +1 to epoch when saving weights 2019-12-03 10:14:02 -05:00
611961ade7 Added tqdm to preprocessing 2019-12-03 10:14:02 -05:00
afc7dcd94d Now run_pplm works on cpu. Identical output as before (when using gpu). 2019-12-03 10:14:02 -05:00
61399e5afe Cleaned perturb_past. Identical output as before. 2019-12-03 10:14:02 -05:00
ffc2935405 Fix for making unconditioned generation work. Identical output as before. 2019-12-03 10:14:02 -05:00
9f693a0c48 Cleaned generate_text_pplm. Identical output as before. 2019-12-03 10:14:02 -05:00
61a12f790d Renamed SmallConst to SMALL_CONST and introduced BIG_CONST. Identical output as before. 2019-12-03 10:14:02 -05:00
ef47b2c03a Removed commented code. Identical output as before. 2019-12-03 10:14:02 -05:00
7ea12db3f5 Removed commented code. Identical output as before. 2019-12-03 10:14:02 -05:00
08c6e456a3 Cleaned full_text_generation. Identical output as before. 2019-12-03 10:14:02 -05:00
6c9c131780 More cleanup for run_model. Identical output as before. 2019-12-03 10:14:02 -05:00
7ffe47c888 Improved device specification 2019-12-03 10:14:02 -05:00
4f2164e40e First cleanup step, changing function names and passing parameters all the way through without using args. Identical output as before. 2019-12-03 10:14:02 -05:00
821de121e8 Minor changes 2019-12-03 10:14:02 -05:00
7469d03b1c Fixed minor bug when running training on cuda 2019-12-03 10:14:02 -05:00
0b51fba20b Added script for training a discriminator for pplm to use 2019-12-03 10:14:02 -05:00
34a83faabe Let's make PPLM great again 2019-12-03 10:14:02 -05:00
d5faa74cd6 tokenizer white space: revert to previous behavior 2019-12-03 10:14:02 -05:00
0b77d66a6d rm extraneous import 2019-12-03 10:14:02 -05:00
83b1e6ac9e fix the loss backward issue
(cherry picked from commit 566468cc984c6ec7e10dfc62b5b4191781a99cd2)
2019-12-03 10:14:02 -05:00
572c24cfa2 PPLM (squashed)
Co-authored-by: piero <piero@uber.com>
Co-authored-by: Rosanne Liu <mimosavvy@gmail.com>
2019-12-03 10:14:02 -05:00
f19a78a634 Merge pull request #1903 from valohai/master
Valohai integration
2019-12-03 16:13:01 +01:00
d100ad99c0 Merge pull request #2014 from aaugustin/mark-tf-auto-model-test-as-slow
Mark tests in TFAutoModelTest as slow.
2019-12-03 16:03:48 +01:00
66fc8d25a5 Change ref to original GLUE downloader script 2019-12-03 10:49:50 +02:00
fbaf05bd92 Remove annoying tokenization message 2019-12-02 18:23:00 -05:00
e85855f2c4 Fix ALBERT exports with pretraining + sp classifier; Fix naming for ALBERT TF models 2019-12-02 18:00:19 -05:00
b3d834ae11 Reorganize ALBERT conversion script 2019-12-02 15:01:52 -05:00
f3776df0f3 WIP debugging 2019-12-02 15:47:00 +01:00
5ab93083e4 Mark tests in TFAutoModelTest as slow.
Each test forces downloading the same 536MB file, which is slow
even with a decent internet connection.
2019-12-01 18:25:15 +01:00
c356290c8d typo fix as per Pytorch v1.1+ 2019-12-01 14:08:14 +05:30
76c0bc06d5 [XLNet] Changed post-processing of attention w.r.t. target_mapping
Whenever target_mapping is provided to the input, XLNet outputs two different attention streams.
Based on that, the attention output will be one of the two:
- a list of tensors (usual case for most transformers)
- a list of 2-tuples of tensors, one tensor for each of the attention streams
Docs and unit-tests have been updated
2019-11-30 21:01:04 +01:00
b90791e950 fixed XLNet attention output for both attention streams 2019-11-30 15:57:51 +01:00
b0ee7c7df3 Added Camembert to available models 2019-11-29 14:17:02 -05:00
ecf15ebf3b Add ALBERT to AutoClasses 2019-11-29 11:25:37 -05:00
4a666885b5 reducing my level of enthusiasm 2019-11-29 09:40:50 -05:00
adb5c79ff2 update all tf.shape and tensor.shape to shape_list 2019-11-29 09:40:50 -05:00
2421e54f8c Add link to original source and license to download_glue_data.py 2019-11-29 15:39:28 +02:00
41aa0e8003 Refactor logs and fix loss bug 2019-11-29 15:33:25 +02:00
1ab8dc44b3 Merge pull request #1876 from huggingface/mean-fix
Mean does not exist in TF2
2019-11-29 09:26:33 +01:00
f0d22b6363 Merge pull request #1873 from stefan-it/distilbert-german
German DistilBERT
2019-11-29 09:25:47 +01:00
1e9ac5a7cf New -> normal 2019-11-28 17:43:47 -05:00
0b84b9fd8a Add processors to __init__ 2019-11-28 17:38:52 -05:00
f671997ef7 Interface with TFDS 2019-11-28 17:17:20 -05:00
bd41e8292a Cleanup & Evaluation now works 2019-11-28 16:03:56 -05:00
d49c43ff78 Merge pull request #1778 from eukaryote31/patch-2
from_pretrained: convert DialoGPT format
2019-11-28 16:08:37 +01:00
91caf2462c Merge pull request #1770 from huggingface/initi-encoder-mask
Only init encoder_attention_mask if stack is decoder
2019-11-28 16:06:55 +01:00
49a69d5b78 Merge pull request #1753 from digantamisra98/patch-1
Added Mish Activation Function
2019-11-28 15:24:08 +01:00
96e7ee7238 Merge pull request #1740 from huggingface/fix-ctrl-past
Fix CTRL past
2019-11-27 23:28:30 +01:00
8da47b078d fix merge tests 2019-11-27 23:11:37 +01:00
8c276b9c92 Merge branch 'master' into distilbert-german 2019-11-27 18:11:49 +01:00
3c28a2daac add add_special_tokens=True for input examples 2019-11-27 12:05:23 -05:00
a36f981d1b Merge branch 'master' into fix-ctrl-past 2019-11-27 17:25:46 +01:00
5afca00b47 Merge pull request #1724 from huggingface/fix_encode_plus
Fix encode_plus
2019-11-27 17:14:49 +01:00
49108288ba Merge pull request #1624 from Huawei-MRC-OSI/resumable_http
Add support for resumable downloads for HTTP protocol.
2019-11-27 17:11:07 +01:00
5340d1f21f Merge branch 'master' into resumable_http 2019-11-27 17:10:36 +01:00
10bd1ddb39 soft launch distilbert multilingual 2019-11-27 11:07:22 -05:00
d5478b939d add distilbert + update run_xnli wrt run_glue 2019-11-27 11:07:22 -05:00
07ab8d7af6 fix bug 2019-11-27 11:07:22 -05:00
d474022639 cleaning simple_accuracy since not used anymore 2019-11-27 11:07:22 -05:00
bcd8dc6b48 move xnli_compute_metrics to data/metrics 2019-11-27 11:07:22 -05:00
73fe2e7385 remove fstrings 2019-11-27 11:07:22 -05:00
3e7656f7ac update readme 2019-11-27 11:07:22 -05:00
abd397e954 uniformize w/ the cache_dir update 2019-11-27 11:07:22 -05:00
d75d49a51d add XnliProcessor to doc 2019-11-27 11:07:22 -05:00
d5910b312f move xnli processor (and utils) to transformers/data/processors 2019-11-27 11:07:22 -05:00
289cf4d2b7 change default for XNLI: dev --> test 2019-11-27 11:07:22 -05:00
cb7b77a8a2 fix some typos 2019-11-27 11:07:22 -05:00
84a0b522cf mbert reproducibility results 2019-11-27 11:07:22 -05:00
c4336ecbbd xnli - output_mode consistency 2019-11-27 11:07:22 -05:00
d52e98ff9a add xnli examples/README.md 2019-11-27 11:07:22 -05:00
71f71ddb3e run_xnli + utils_xnli 2019-11-27 11:07:22 -05:00
b5d884d25c Uniformize #1952 2019-11-27 11:05:55 -05:00
7fd1d42a01 Merge pull request #1592 from watkinsm/do_lower_case
Consider do_lower_case in PreTrainedTokenizer
2019-11-27 17:05:18 +01:00
21637d4924 Merge branch 'master' into do_lower_case 2019-11-27 17:04:39 +01:00
de2696f68e suggest to track repo w/ https rather than ssh 2019-11-27 11:02:28 -05:00
88b317739f Fix issue #1962: input's shape seems to cause an error in the 2.2.0 version of tf_albert_model 2019-11-27 10:38:10 -05:00
45d767297a Updated v2.2.0 doc 2019-11-27 10:12:20 -05:00
361620954a Remove TFBertForPreTraining from ALBERT doc 2019-11-27 10:11:37 -05:00
cc7968227e Updated v2.2.0 doc 2019-11-26 15:52:25 -05:00
ce02550d50 Fix pretrained models table 2019-11-26 15:47:02 -05:00
cf26a0c85e Fix pretrained models table 2019-11-26 15:40:03 -05:00
44b82c777f Updated v2.2.0 doc 2019-11-26 15:15:11 -05:00
ee4647bd5c CamemBERT & ALBERT doc 2019-11-26 15:10:51 -05:00
7c6000e412 Updated v2.2.0 doc 2019-11-26 14:55:29 -05:00
668aac45d2 Pretrained models 2019-11-26 14:52:42 -05:00
8742baa531 Improve test protocol for inputs_embeds in TF 2019-11-26 14:39:47 -05:00
cf62bdc962 Improve test protocol for inputs_embeds in TF
cc @lysandrejik
2019-11-26 14:37:32 -05:00
b632145273 Update master documentation link in README 2019-11-26 14:27:15 -05:00
ae98d45991 Release: v2.2.0 2019-11-26 14:12:44 -05:00
f2f329408d Fix input embeddings 2019-11-26 13:08:12 -05:00
bdfe21ab24 Change param order for consistency 2019-11-26 13:08:12 -05:00
c536c2a480 ALBERT Input Embeds 2019-11-26 13:08:12 -05:00
f873b55e43 Warning for ALBERT-v2 models 2019-11-26 13:08:12 -05:00
c9cb7f8a0f Torch 1.1.0 compatibility + FP16 O1 + TF checkpoints
Co-authored-by: wassname
2019-11-26 13:08:12 -05:00
b18509c208 Tests for ALBERT in TF2 + fixes 2019-11-26 13:08:12 -05:00
7bddbf5961 TFAlbertForSequenceClassification 2019-11-26 13:08:12 -05:00
f6f382532b ALBERT in TF2 2019-11-26 13:08:12 -05:00
d9daad98c7 Re-ordering of group_idx/layer_idx + Python 2 tests 2019-11-26 13:08:12 -05:00
9d5c49546f Tests for AlbertForQuestionAnswering AlbertForSequenceClassification 2019-11-26 13:08:12 -05:00
16263f9685 Headmasking 2019-11-26 13:08:12 -05:00
abb23a78ba Head pruning for ALBERT 2019-11-26 13:08:12 -05:00
4374eaea78 ALBERT for SQuAD 2019-11-26 13:08:12 -05:00
70d99980de ALBERT-V2 2019-11-26 13:08:12 -05:00
c110c41fdb Run GLUE and remove LAMB 2019-11-26 13:08:12 -05:00
6637a77f80 AlbertForSequenceClassification 2019-11-26 13:08:12 -05:00
0d07a23c04 LAMB implementation 2019-11-26 13:08:12 -05:00
c987545592 Converting script 2019-11-26 13:08:12 -05:00
4f3a54bfc8 ALBERT can load pre-trained models. Doesn't inherit from BERT anymore. 2019-11-26 13:08:12 -05:00
c4403006b8 External MLM head 2019-11-26 13:08:12 -05:00
b21402fc86 Python 2 tests + licence 2019-11-26 13:08:12 -05:00
c14a22272f ALBERT passes all tests 2019-11-26 13:08:12 -05:00
870320a24e Early tests 2019-11-26 13:08:12 -05:00
25a31953e8 Output Attentions + output hidden states 2019-11-26 13:08:12 -05:00
ce9eade29c Initializer range using BertPreTrainedModel 2019-11-26 13:08:12 -05:00
5680a11063 Activation function managed from the config file 2019-11-26 13:08:12 -05:00
1e5b31c388 Several fixes and improvements 2019-11-26 13:08:12 -05:00
ee20201d33 Tokenization tests + fixes + init 2019-11-26 13:08:12 -05:00
e3ea5d1d8d Docstrings 2019-11-26 13:08:12 -05:00
fedac786d4 Tokenization + small fixes 2019-11-26 13:08:12 -05:00
67b422662c Documentation + improved AlbertForMaskedLM 2019-11-26 13:08:12 -05:00
1b92564330 Reorganize and cleanup 2019-11-26 13:08:12 -05:00
12290c0d5c Handles multi layer and multi groups 2019-11-26 13:08:12 -05:00
139affaa8d Albert layer/layer groups 2019-11-26 13:08:12 -05:00
91ccbae788 Accepts multiple sizes 2019-11-26 13:08:12 -05:00
c0c2088333 ALBERT model 2019-11-26 13:08:12 -05:00
8e5d84fcc1 Fixed typo 2019-11-26 09:01:32 -05:00
0669c1fcd1 SQuAD v2 BERT + XLNet 2019-11-25 19:22:21 -05:00
5d3b8daad2 Minor bug fixes on run_ner.py 2019-11-25 16:48:03 -05:00
aa92a184d2 resize model when special tokens are present 2019-11-25 15:06:32 -05:00
07bf43074f Fix GPT2 docstring 2019-11-25 11:32:00 -05:00
fa963ecc59 if→elif 2019-11-25 10:21:03 -05:00
c8eb8157b8 fix docstrings 2019-11-25 10:21:03 -05:00
99f750d64e add Camembert models to modeling_auto 2019-11-25 10:21:03 -05:00
7485caefb0 fix #1894 2019-11-25 09:33:39 -05:00
afaa335851 [doc] Fix assets urls 2019-11-23 11:34:45 -05:00
176cd1ce1b [doc] homogenize instructions slightly 2019-11-23 11:18:54 -05:00
041a901f32 Fix typo in documentation. toto -> to 2019-11-23 10:55:16 -05:00
e0e55bc550 Manage training example & refactor the refactor 2019-11-22 16:27:45 -05:00
c3ba645237 Works for XLNet 2019-11-22 16:27:37 -05:00
a5a8a6175f Works for BERT 2019-11-22 16:27:31 -05:00
a7dafe2f41 Padding strategy (left and right) rather than boolean flag 2019-11-22 16:27:25 -05:00
9f374c8252 encode and encode_plus handle attention masks and padding 2019-11-22 16:27:15 -05:00
72e506b22e wip 2019-11-22 16:26:00 -05:00
ea52f82455 Moved some SQuAD logic to /data 2019-11-22 16:25:52 -05:00
26db31e0c0 update the documentation 2019-11-21 14:41:19 -05:00
6f70bb8c69 add instructions to run the examples 2019-11-21 14:41:19 -05:00
05d4232f63 Add valohai.yaml 2019-11-21 12:38:17 +02:00
aac3551407 Add download_glue_data.py from kamalkraj/ALBERT-TF2.0
Original source: fa90194e5f/download_glue_data.py
Original license: fa90194e5f/LICENSE (Apache-2.0)
2019-11-21 12:37:41 +02:00
2cf3447e0a Glue: log in Valohai-compatible JSON format too 2019-11-21 12:35:25 +02:00
0cdfcca24b Merge pull request #1860 from stefan-it/camembert-for-token-classification
[WIP] Add support for CamembertForTokenClassification
2019-11-21 10:56:07 +01:00
e70cdf083d Cleanup TPU bits from run_glue.py
TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.

We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.
2019-11-20 17:54:34 -05:00
454455c695 fix #1879 2019-11-20 09:42:48 -05:00
3de31f8d28 mean does not exist in TF2 2019-11-19 18:14:14 -05:00
da06afafc8 tree-wide: add trailing comma in configuration maps 2019-11-19 21:57:00 +01:00
2e2c0375c3 distilbert: add German distilbert model to positional embedding sizes map 2019-11-19 20:41:18 +01:00
e7cf2ccd15 distillation: add German distilbert model 2019-11-19 19:55:19 +01:00
e631383d4f docs: add new German distilbert model to pretrained models 2019-11-19 19:52:40 +01:00
f21dfe36ba distilbert: add vocab for new German distilbert model 2019-11-19 19:51:31 +01:00
22333945fb distilbert: add pytorch model for new German distilbert model 2019-11-19 19:51:01 +01:00
337802783f distilbert: add configuration for new German distilbert model 2019-11-19 19:50:32 +01:00
4193aa9f81 add TFXLNetForTokenClassification implementation and unit test
add XLNetForTokenClassification implementation and unit tests
2019-11-19 12:47:54 +01:00
f3386d9383 typo "deay" -> "decay" 2019-11-18 11:50:06 -05:00
56c84863a1 camembert: add support for CamemBERT in run_ner example 2019-11-18 17:06:57 +01:00
0b3d45eb64 camembert: add implementation for save_vocabulary method 2019-11-18 15:49:44 +01:00
3916b334a8 [camembert] Acknowledge the full author list 2019-11-18 09:29:11 -05:00
44455eb5b6 Adds CamemBERT to Model architectures list 2019-11-18 09:23:14 -05:00
33753d9139 module: import CamembertForTokenClassification 2019-11-18 14:14:54 +01:00
d32ce2c8df camembert: add wrapper for CamembertForTokenClassification 2019-11-18 14:14:19 +01:00
d08a338c3b modified: transformers/modeling_utils.py 2019-11-16 18:47:37 +09:00
0477b307c7 [camembert] tokenizer: use additional_special_tokens 2019-11-16 00:11:07 -05:00
f9abf73e31 [camembert] realign w/ recent changes 2019-11-16 00:11:07 -05:00
26858f27cb [camembert] Upload to s3 + rename script 2019-11-16 00:11:07 -05:00
035fea5315 Add CamemBERT to auto files and docs 2019-11-16 00:11:07 -05:00
694d4fcbb6 Add CamemBERT classes to __init__.py 2019-11-16 00:11:07 -05:00
3e20c2e871 Update demo_camembert.py with new classes 2019-11-16 00:11:07 -05:00
f12e4d8da7 Move demo_camembert.py to examples/contrib 2019-11-16 00:11:07 -05:00
fb6c70a91d Update tokenization_camembert.py with urls 2019-11-16 00:11:07 -05:00
e44b939e71 Add configuration_camembert.py and modeling_camembert.py 2019-11-16 00:11:07 -05:00
6e72fd094c Add demo_camembert.py 2019-11-16 00:11:07 -05:00
14b3aa3b3c Add tokenization_camembert.py 2019-11-16 00:11:07 -05:00
ca99a2d500 Update example readme 2019-11-15 14:55:26 +08:00
7da3ef24cd add is_impossible tensor to model inputs when fine-tuning XLNet on SQuAD 2.0 2019-11-15 14:18:53 +08:00
74ce8de7d8 Merge pull request #1792 from stefan-it/distilbert-for-token-classification
DistilBERT for token classification
2019-11-14 22:47:53 +01:00
05db5bc1af added small comparison between BERT, RoBERTa and DistilBERT 2019-11-14 22:40:22 +01:00
9629e2c676 Merge pull request #1804 from ronakice/master
fix multi-gpu eval in torch examples
2019-11-14 22:24:05 +01:00
5b322a36db Merge pull request #1811 from huggingface/special-tokens
Fix special tokens addition in decoder #1807
2019-11-14 22:17:24 +01:00
1a237d7f42 Merge pull request #1831 from iedmrc/gpt2-tokenization-sum-func-replacement
sum() is replaced by itertools.chain.from_iterable()
2019-11-14 22:11:54 +01:00
df99f8c5a1 Merge pull request #1832 from huggingface/memory-leak-schedulers
replace LambdaLR scheduler wrappers by function
2019-11-14 22:10:31 +01:00
0be9ae7b3e Merge pull request #1833 from huggingface/max-length-warning
Token indices sequence length is longer than the specified maximum sequence length for this model
2019-11-14 22:04:49 +01:00
be7f2aacce [CI][DOC] Don't rebuild if folder exists - Correct directory. 2019-11-14 14:54:44 -05:00
8f8d69716a [CI][DOC] Don't rebuild if folder exists. 2019-11-14 14:48:21 -05:00
2276bf69b7 update the examples, docs and template 2019-11-14 20:38:02 +01:00
d7929899da Specify checkpoint in saved file for run_lm_finetuning.py 2019-11-14 10:49:00 -05:00
a67e747889 Reorganized max_len warning 2019-11-14 10:30:22 -05:00
e18f786cd5 Quickstart example showcasing past 2019-11-14 10:06:00 -05:00
022525b003 replace LambdaLR scheduler wrappers by function
Custom schedulers are currently initiated by wrapping Pytorch's LambdaLR
class and passing a method of the wrapping class to the __init__
function of LambdaLR. This approach is not appropriate for several
reasons:

1. one does not need to define a class when it only defines a
__init__() method;
2. instantiating the parent class by passing a method of the child class
creates a cyclical reference which leads to memory leaks. See issues #1742 and #1134.

In this commit we replace the wrapper classes with functions that
instantiate `LambdaLR` with a custom learning rate function. We use a
closure to specify the parameter of the latter. We also do a bit of
renaming within the function to explicit the behaviour and removed
docstrings that were subsequently not necessary.
2019-11-14 15:39:08 +01:00
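A sketch of the closure pattern this commit describes, close in shape to a linear warmup-then-decay schedule; treat the exact function as illustrative rather than the one added in the commit.

```python
from torch.optim.lr_scheduler import LambdaLR

def linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
    # A plain function returning LambdaLR: no wrapper class, hence no cyclic
    # reference between the scheduler and a bound method (cf. #1742, #1134).
    def lr_lambda(step):
        if step < num_warmup_steps:
            return step / max(1, num_warmup_steps)
        remaining = num_training_steps - step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))
    return LambdaLR(optimizer, lr_lambda)
```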
7627dde1f8 sum() is not the leanest method to flatten a string list, so it's been replaced by itertools.chain.from_iterable() 2019-11-14 17:06:15 +03:00
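For context, a small comparison of the two flattening approaches: `sum(lists, [])` rebuilds the accumulator list on every addition and is quadratic, while `itertools.chain.from_iterable` is linear.

```python
import itertools

lists = [["a", "b"], ["c"], ["d", "e"]]
flat_slow = sum(lists, [])                              # O(n^2): copies on each step
flat_fast = list(itertools.chain.from_iterable(lists))  # O(n)
assert flat_slow == flat_fast == ["a", "b", "c", "d", "e"]
```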
74d0bcb6ff Fix special tokens addition in decoder 2019-11-12 15:27:57 -05:00
155c782a2c [inputs_embeds] All TF models + tests 2019-11-12 11:29:21 -05:00
2aef2f0bbc [common attributes] Fix previous commit for transfo-xl 2019-11-12 11:29:21 -05:00
2f17464266 [common attributes] Slightly sharper test coverage 2019-11-12 11:29:21 -05:00
9d2398fd99 Ooopsie 2019-11-12 11:29:21 -05:00
70d97ddd60 [TF models] Common attributes as per #1721 2019-11-12 11:29:21 -05:00
872403be1c This is not a @property after all 2019-11-12 11:29:21 -05:00
dd6b2e05e1 whitespace 2019-11-12 11:29:21 -05:00
d409aca326 Clarify the use of past in GPT2 and CTRL 2019-11-12 10:59:37 -05:00
7246d3c2f9 Consider do_lower_case in PreTrainedTokenizer
As pointed out in #1545, when using an uncased model, and adding
a new uncased token, the tokenizer does not correctly identify this
in the case that the input text contains the token in a cased format.

For instance, if we load bert-base-uncased into BertTokenizer, and
then use .add_tokens() to add "cool-token", we get the expected
result for .tokenize('this is a cool-token'). However, we get a
possibly unexpected result for .tokenize('this is a cOOl-Token'),
which in fact mirrors the result for the former from before the new
token was added.

This commit adds
- functionality to PreTrainedTokenizer to handle this
situation in case a tokenizer (currently Bert, DistilBert,
and XLNet) has the do_lower_case=True kwarg by:
    1) lowercasing tokens added with .add_tokens()
    2) lowercasing text at the beginning of .tokenize()
- new common test case for tokenizers

https://github.com/huggingface/transformers/issues/1545
2019-11-12 13:08:30 +02:00
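A toy illustration of the two behaviors added here (this is not the library's code; whitespace splitting stands in for the real subword tokenizer):

```python
class ToyTokenizer:
    def __init__(self, do_lower_case=True):
        self.do_lower_case = do_lower_case
        self.added_tokens = set()

    def add_tokens(self, tokens):
        # 1) lowercase tokens added with .add_tokens()
        if self.do_lower_case:
            tokens = [t.lower() for t in tokens]
        self.added_tokens.update(tokens)

    def tokenize(self, text):
        # 2) lowercase text at the beginning of .tokenize()
        if self.do_lower_case:
            text = text.lower()
        return text.split()

tok = ToyTokenizer()
tok.add_tokens(["cool-token"])
print(tok.tokenize("this is a cOOl-Token"))  # ['this', 'is', 'a', 'cool-token']
```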
2e31176557 fix multi-gpu eval 2019-11-12 05:55:11 -05:00
8aba81a0b6 fix #1789 2019-11-12 08:52:43 +01:00
94e55253ae tests: add test case for DistilBertForTokenClassification implementation 2019-11-11 16:20:15 +01:00
2b07b9e5ee examples: add DistilBert support for NER fine-tuning 2019-11-11 16:19:34 +01:00
1806eabf59 module: add DistilBertForTokenClassification import 2019-11-11 16:18:48 +01:00
1c7253cc5f modeling: add DistilBertForTokenClassification implementation 2019-11-11 16:18:16 +01:00
b5d330d118 Fix #1784 2019-11-11 10:15:14 -05:00
90f6e73a35 Add DialoGPT support for Pytorch->TF 2019-11-09 16:46:19 +00:00
ef99852961 from_pretrained: convert DialoGPT format
DialoGPT checkpoints have "lm_head.decoder.weight" instead of "lm_head.weight". 

(see: https://www.reddit.com/r/MachineLearning/comments/dt5woy/p_dialogpt_state_of_the_art_conversational_model/f6vmwuy?utm_source=share&utm_medium=web2x)
2019-11-09 16:32:40 +00:00
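A hedged sketch of the key rename (the checkpoint file name is illustrative): once the single mismatched key is moved, the state dict loads into the GPT-2 classes.

```python
import torch

# DialoGPT stores the LM head as "lm_head.decoder.weight"; GPT-2 in
# transformers expects "lm_head.weight". The file name below is a placeholder.
state_dict = torch.load("dialogpt-medium.bin", map_location="cpu")
if "lm_head.decoder.weight" in state_dict:
    state_dict["lm_head.weight"] = state_dict.pop("lm_head.decoder.weight")
```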
7a9aae1044 Fix run_bertology.py
Make imports and args.overwrite_cache match run_glue.py
2019-11-08 16:28:40 -05:00
268d4f2099 fix position biases + better tests 2019-11-08 16:41:55 +01:00
b4fcd59a5a add sentinels in tokenizer 2019-11-08 14:38:53 +01:00
15e53c4e87 maybe fix tests 2019-11-08 12:43:21 +01:00
f03c0c1423 adding models in readme and auto classes 2019-11-08 11:49:46 +01:00
4321c54125 fix tests 2019-11-08 11:49:32 +01:00
727a79b305 added TF2 model and tests - updated templates 2019-11-08 11:35:03 +01:00
cd286c2145 add condition around mask transformation 2019-11-08 11:31:16 +01:00
28d0ba35d7 only init encoder_attention_mask if stack is decoder
We currently initialize `encoder_attention_mask` when it is `None`,
whether the stack is that of an encoder or a decoder. Since this
may lead to bugs that are difficult to track down, I added a condition
that assesses whether the current stack is a decoder.
2019-11-08 11:22:19 +01:00
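A sketch of the added guard with assumed argument names: the all-ones default is created only when the stack is a decoder that actually receives encoder states.

```python
import torch

def maybe_default_encoder_mask(is_decoder, encoder_hidden_states, encoder_attention_mask):
    # Only fabricate a default cross-attention mask for decoder stacks;
    # an encoder stack should leave encoder_attention_mask untouched.
    if is_decoder and encoder_hidden_states is not None and encoder_attention_mask is None:
        batch_size, src_len = encoder_hidden_states.shape[:2]
        encoder_attention_mask = torch.ones(
            batch_size, src_len, device=encoder_hidden_states.device
        )
    return encoder_attention_mask
```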
8fda532c3c fix python 2 sentencepiece tokenization 2019-11-07 17:09:50 +01:00
ba10065c4b update model, conversion script, tests and template 2019-11-07 15:55:36 +01:00
070dcf1c02 Added Mish Activation Function
Mish is a new activation function proposed here - https://arxiv.org/abs/1908.08681
It has seen some recent success and has been adopted in SpaCy, Thinc, TensorFlow Addons and FastAI-dev.
All benchmarks recorded till now (including against ReLU, Swish and GELU) are present in the repository - https://github.com/digantamisra98/Mish
Might be a good addition to experiment with, especially in the BERT model.
2019-11-07 03:45:43 +05:30
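Mish is defined as x · tanh(softplus(x)); a self-contained PyTorch version:

```python
import torch
import torch.nn.functional as F

def mish(x):
    # Mish activation: x * tanh(softplus(x)), per https://arxiv.org/abs/1908.08681
    return x * torch.tanh(F.softplus(x))

print(mish(torch.tensor([-1.0, 0.0, 1.0])))  # smooth and non-monotonic near zero
```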
1c542df7e5 Add RoBERTa-based GPT-2 Output Detector from OpenAI
converted from https://github.com/openai/gpt-2-output-dataset/tree/master/detector

Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-Authored-By: Jong Wook Kim <jongwook@nyu.edu>
Co-Authored-By: Jeff Wu <wuthefwasthat@gmail.com>
2019-11-06 16:26:31 -05:00
2f3a421018 Fix other PyTorch models 2019-11-06 14:03:47 -05:00
d5319793c4 Fix BERT 2019-11-06 14:03:47 -05:00
27e015bd54 [tests] Flag to test on cuda 2019-11-06 14:03:47 -05:00
13d9135fa5 [tests] get rid of warning
cf. https://docs.pytest.org/en/latest/example/simple.html
2019-11-06 14:03:47 -05:00
076a207935 adding tests and updating model 2019-11-06 11:52:50 +01:00
73f2c342f5 fixing template 2019-11-06 11:52:39 +01:00
3835e1e651 adding tokenizer 2019-11-06 11:52:29 +01:00
f88c104d8f [run_tf_glue] Add comment for context 2019-11-05 19:56:43 -05:00
30968d70af misc doc 2019-11-05 19:06:12 -05:00
de890ae67d Updating docblocks in optimizers.py 2019-11-05 17:31:29 -05:00
d7d36181fd GPT-2 XL 2019-11-05 13:31:58 -05:00
151e4ab4e7 Fix CTRL past 2019-11-05 16:26:51 +00:00
88e5bef58f share position biases 2019-11-05 17:02:52 +01:00
568c0ffb7e adding T5 model 2019-11-05 16:40:29 +01:00
7daacf00df Merge pull request #1695 from huggingface/models_inputs_embeds
model forwards can take an inputs_embeds param
2019-11-05 09:55:28 -05:00
a44f112fb9 add authors for models 2019-11-05 08:48:26 -05:00
60a5babd57 adding files 2019-11-05 12:01:23 +01:00
124409d075 Make dummy inputs a property of TFPreTrainedModel. 2019-11-05 11:48:45 +01:00
e99071f105 Merge pull request #1734 from orena1/patch-1
add progress bar to convert_examples_to_features
2019-11-05 11:34:20 +01:00
dfb61caf77 fix #1692 2019-11-05 11:25:13 +01:00
ba973342e3 Merge pull request #1553 from WilliamTambellini/timeSquadInference
Add speed log to examples/run_squad.py
2019-11-05 11:13:12 +01:00
8df7dfd2a7 Make dummy inputs a local variable in TFPreTrainedModel. 2019-11-05 11:09:16 +01:00
237fad339c Merge pull request #1709 from oneraghavan/master
Fixing mode in evaluate during training
2019-11-05 10:55:33 +01:00
f1e4db2aa8 Fix #1686 2019-11-05 09:38:00 +01:00
d7906165a3 add progress bar for convert_examples_to_features
It takes a considerable amount of time (~10 min) to convert the examples to features, so it is good to have a progress bar to track this
2019-11-05 10:34:27 +02:00
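The idea in its minimal form (the iterable below is a stand-in for the real SQuAD examples):

```python
from tqdm import tqdm

examples = range(100_000)  # stand-in for the real examples list
features = [ex for ex in tqdm(examples, desc="convert examples to features")]
```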
d2e2577dd3 Merge pull request #1723 from huggingface/fix-1623
Fix #1623
2019-11-05 08:36:30 +01:00
00337e9687 [inputs_embeds] All PyTorch models 2019-11-05 00:39:18 +00:00
9eddf44b7a docstring + check 2019-11-04 17:19:15 +00:00
8e11de0e86 model forwards can take an inputs_embeds param 2019-11-04 16:56:26 +00:00
68f7064a3e Add model.train() line to ReadMe training example
Co-Authored-By: Santosh-Gupta <San.Gupta.ML@gmail.com>
2019-11-04 11:52:35 -05:00
8d6b9d717c fix #1532 and encode_plus 2019-11-04 17:07:51 +01:00
c8f2712199 Merge pull request #1721 from huggingface/common_attributes
Add common getter and setter for input_embeddings & output_embeddings
2019-11-04 16:21:52 +01:00
89d6272898 Fix #1623 2019-11-04 16:21:12 +01:00
b340a910ed fix tests - flagged as slow all the tests downloading from AWS 2019-11-04 16:03:36 +01:00
f02805da6f fix tests 2019-11-04 15:42:23 +01:00
1d4d070256 Merge pull request #1549 from hlums/master
Fix token order in xlnet preprocessing for SQuAD
2019-11-04 15:37:15 +01:00
1724cee8c4 switch from properties to methods 2019-11-04 15:34:10 +01:00
9b45d0f878 Add common properties input_embeddings and output_embeddings 2019-11-04 12:28:56 +01:00
9a3b173cd3 Merge branch 'master' into master 2019-11-04 11:41:26 +01:00
ad90868627 Update example readme 2019-11-04 11:27:22 +01:00
e5b1048bae Fixing mode in evaluate during training 2019-11-03 16:14:46 +05:30
8a62835577 Merge pull request #1679 from cregouby/master
Fix https://github.com/huggingface/transformers/issues/1673
2019-11-01 22:02:24 +01:00
93d2fff071 Close #1654 2019-11-01 09:47:38 -04:00
1a2b40cb53 run_tf_glue MRPC evaluation only for MRPC 2019-10-31 18:00:51 -04:00
be36cf92fb Added mixed precision support to benchmarks.py 2019-10-31 17:24:37 -04:00
2a5663c280 Merge branch 'mataney-fix_top_k_top_p_filtering' 2019-10-31 18:28:34 +00:00
f96ce1c241 [run_generation] Fix generation with batch_size>1 2019-10-31 18:27:11 +00:00
3c1b6f594e Merge branch 'master' into fix_top_k_top_p_filtering 2019-10-31 13:53:51 -04:00
0e4cc050d6 Add support for resumable downloads for HTTP protocol. 2019-10-31 18:25:34 +03:00
ac29353abe Fix https://github.com/huggingface/transformers/issues/1673 2019-10-31 10:04:40 +01:00
fa735208c9 update readme - fix example command distil* 2019-10-30 14:27:28 -04:00
c7058d8224 Merge pull request #1608 from focox/master
Error raised by "tmp_eval_loss += tmp_eval_loss.item()" when using multi-gpu
2019-10-30 17:14:07 +01:00
22838f19fd Merge pull request #1668 from tlkh/fix-tf-xlm
Fixed training for TF XLM
2019-10-30 17:08:00 +01:00
7f84fc571a Merge pull request #1670 from huggingface/templates
Templates and explanation for adding a new model and example script
2019-10-30 17:05:58 +01:00
04c69db399 Merge pull request #1628 from huggingface/tfglue
run_tf_glue works with all tasks
2019-10-30 17:04:03 +01:00
5c6a19a94a Merge pull request #1604 from huggingface/deploy_doc
Versioning in documentation
2019-10-30 17:03:14 +01:00
3df4367244 Merge pull request #1601 from huggingface/clean-roberta
Clean roberta model & all tokenizers now add special tokens by default (breaking change)
2019-10-30 17:00:40 +01:00
6d73c92cae Merge pull request #1455 from huggingface/conditional-generation
[WIP] Sequence generation using pretrained BERT
2019-10-30 16:54:18 +01:00
36174696cc Merge branch 'master' into clean-roberta 2019-10-30 16:51:06 +01:00
228cdd6a6e Merge branch 'master' into conditional-generation 2019-10-30 16:40:35 +01:00
3cf2020c6b change kwargs processing 2019-10-30 16:27:51 +01:00
a88a0e4413 add tests to encoder-decoder model 2019-10-30 16:06:29 +01:00
3f07cd419c update test on Bert to include decoder mode 2019-10-30 15:09:53 +01:00
55fbfea369 Update CONTRIBUTING.md
Co-Authored-By: Stefan Schweter <stefan.schweter@bsb-muenchen.de>
2019-10-30 12:25:40 +01:00
cef2a8f900 Update CONTRIBUTING.md
Co-Authored-By: Stefan Schweter <stefan.schweter@bsb-muenchen.de>
2019-10-30 12:25:31 +01:00
328a86d2af adding links to the templates in readme and contributing 2019-10-30 11:37:55 +01:00
7f4226f9e6 adding templates 2019-10-30 11:31:56 +01:00
070507df1f format utils for summarization 2019-10-30 11:24:12 +01:00
da10de8466 fix bug with padding mask + add corresponding test 2019-10-30 11:19:58 +01:00
3b0d2fa30e rename seq2seq to encoder_decoder 2019-10-30 10:54:46 +01:00
9c1bdb5b61 revert renaming of lm_labels to ltr_lm_labels 2019-10-30 10:43:13 +01:00
842f3bf049 Fixed training for TF XLM 2019-10-30 01:32:15 +00:00
098a89f312 update docstrings; rename lm_labels to more explicit ltr_lm_labels 2019-10-29 20:08:03 +01:00
dfce409691 resolve PR comments 2019-10-29 17:10:20 +01:00
079bfb32fb Evaluation fixed. 2019-10-28 10:18:58 -04:00
438f2730a0 Evaluation code fixed. 2019-10-28 10:18:58 -04:00
4c3ac4a7d8 here's one big commit 2019-10-28 10:49:50 +01:00
932543f77e fix test of truncation function 2019-10-28 10:49:49 +01:00
a67413ccc8 extend works in-place 2019-10-28 10:49:49 +01:00
cb26b035c6 remove potential UndefinedError 2019-10-28 10:49:49 +01:00
b915ba9dfe pad sequence with 0, mask with -1 2019-10-28 10:49:49 +01:00
dc580dd4c7 add lm_labels for the LM cross-entropy 2019-10-28 10:49:49 +01:00
f873a3edb2 the decoder attends to the output of the encoder stack (last layer) 2019-10-28 10:49:00 +01:00
d36680df54 Revert changes to TF distilbert due to failed test: TFDistilBertModelTest.test_pt_tf_model_equivalence 2019-10-27 14:51:36 +08:00
ec276d6aba Add special tokens to documentation for the tensorflow model examples #1561 2019-10-27 14:00:40 +08:00
6e011690a9 Add special tokens to documentation for the rest of pytorch model examples #1561 2019-10-27 13:59:14 +08:00
beaf66b1f3 Remove break 2019-10-24 21:43:28 +00:00
bab6ad01aa run_tf_glue works with all tasks 2019-10-24 21:41:45 +00:00
ae1d03fc51 Add roberta to doc 2019-10-24 14:32:48 -04:00
4e5f88b74f Add Roberta to run_ner.py 2019-10-24 14:32:48 -04:00
b92d68421d Use roberta model and update doc strings 2019-10-24 14:32:48 -04:00
66085a1321 RoBERTa token classification
[WIP] copy paste bert token classification for roberta
2019-10-24 14:32:48 -04:00
b82bfbd0c3 Updated README to show all available documentation 2019-10-24 15:55:31 +00:00
5b6cafb11b [release] fix table weirdness 2019-10-23 10:35:16 -04:00
8ad5c591cd [RELEASE] DistilRoBERTa 2019-10-23 10:29:47 -04:00
bd847ce7d7 fixed the bug raised by "tmp_eval_loss += tmp_eval_loss.item()" when using multiple GPUs in parallel. 2019-10-23 20:27:13 +08:00
6e85bccafc Fixed typo 2019-10-22 18:07:01 -04:00
fbcc5ff9fb Change branch to master 2019-10-22 18:01:10 -04:00
69eba0ab19 Edit script path 2019-10-22 17:53:52 -04:00
bc3e57d551 Multi version doc deployment 2019-10-22 17:51:30 -04:00
ef1b8b2ae5 [CTRL] warn if generation prompt does not start with a control code
see also https://github.com/salesforce/ctrl/pull/50
2019-10-22 21:30:32 +00:00
e16d46843a Fix architectures count 2019-10-22 15:13:47 -04:00
7d709e55ed Remove 2019-10-22 14:12:33 -04:00
44286b94d3 RoBERTa doesn't print a warning when no special tokens are passed. 2019-10-22 13:46:48 -04:00
1cfd974868 Option to benchmark only one of the two libraries 2019-10-22 13:32:23 -04:00
777faa8ae7 Fix #1597 2019-10-22 11:26:42 -04:00
b8c9ea0010 Merge pull request #1580 from pminervini/master
Gradient norm clipping should be done right before calling the optimiser
2019-10-22 13:59:20 +02:00
abd7110e21 gradient norm clipping should be done right before calling the optimiser - fixing run_glue and run_ner as well 2019-10-21 19:56:52 +01:00
4d456542e9 Fix citation 2019-10-21 16:34:14 +02:00
0e64fec1ab Merge pull request #1568 from daemon/patch-1
Fix hanging when loading pretrained models
2019-10-21 14:31:57 +02:00
3a52b65795 Add special tokens to documentation for bert examples to resolve issue: #1561 2019-10-21 12:55:51 +08:00
86a630702d Merge branch 'huggingface/master' 2019-10-21 12:06:09 +08:00
3775550c4b gradient norm clipping should be done right before calling the optimiser 2019-10-20 22:33:56 +01:00
bf2c36a920 Merge pull request #1 from huggingface/master
update
2019-10-20 23:30:45 +02:00
a2c8c8ef00 Fix hanging when loading pretrained models
- Fix hanging when loading pretrained models from the cache without having internet access. This is a widespread issue on supercomputers whose internal compute nodes are firewalled.
2019-10-19 16:19:20 -04:00
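Editor's note: the hang above came from network lookups even on cache hits. A minimal sketch of forcing cache-only resolution, assuming the `local_files_only` flag that later versions of transformers expose (not part of this 2019 fix):

from transformers import BertModel

# Resolve weights strictly from the local cache; never touch the network.
# `local_files_only` is a later-era flag, used here only for illustration.
model = BertModel.from_pretrained("bert-base-uncased", local_files_only=True)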
82f6abd98a Benchmark section added to the documentation 2019-10-18 17:27:10 -04:00
7dd29ed2f1 Benchmarks example script 2019-10-18 10:53:04 -04:00
8efc0ec91a Add Benchmarks to issue templates 2019-10-18 10:45:44 -04:00
0919389d9a Add speed log to examples/run_squad.py
Add a speed estimate log (time per example)
for evaluation to examples/run_squad.py
2019-10-17 14:41:04 -07:00
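The speed estimate amounts to timing the evaluation loop and dividing by the number of examples; a minimal sketch (illustrative, not the script's exact code):

import time

examples = list(range(1000))  # stand-in for the evaluation dataset
start = time.time()
for example in examples:
    pass  # the model forward pass would run here
elapsed = time.time() - start
print(f"Evaluation done in {elapsed:.2f} s ({elapsed / len(examples):.6f} s per example)")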
fd97761c5a soft launch distilroberta 2019-10-17 15:28:58 -04:00
ecd15667f3 fix repetition penalty 2019-10-17 14:47:14 -04:00
56e2ee4ead fix model2model 2019-10-17 16:33:31 +02:00
8cd56e3036 fix data processing in script 2019-10-17 16:33:26 +02:00
578d23e061 add training pipeline (formatting temporary) 2019-10-17 14:02:27 +02:00
47a06d88a0 use two different tokenizers for story and summary 2019-10-17 13:04:26 +02:00
bfb9b540d4 add Model2Model to __init__ 2019-10-17 12:59:51 +02:00
c1bc709c35 correct the truncation and padding of dataset 2019-10-17 10:41:53 +02:00
87d60b6e19 reword explanation of encoder_attention_mask 2019-10-17 10:18:19 +02:00
638fe7f5a4 correct composition of padding and causal masks 2019-10-17 10:13:07 +02:00
4e0f24348f document the MLM modification + raise exception on MLM training with encoder-decoder 2019-10-17 09:41:53 +02:00
624a5644cc revert black formatting to conform with lib style 2019-10-17 09:27:56 +02:00
9b71fc9a18 tying weights is going to be a clusterfuck 2019-10-16 21:31:38 +02:00
95ec1d08be separate inputs into encoder & decoder inputs 2019-10-16 20:55:42 +02:00
e4e0ee14bd add separator between data import and train 2019-10-16 20:05:32 +02:00
a424892fab correct syntax error: dim() and not dims() 2019-10-16 18:24:32 +02:00
33c01368b1 remove Bert2Rnd test 2019-10-16 18:13:05 +02:00
c544194611 Remove special_tokens_mask from inputs in README
Co-authored-by: Thomas Wolf @thomwolf
2019-10-16 11:05:13 -04:00
0752069617 adapt attention masks for the decoder case
The introduction of a decoder brings 2 changes:
- We need to be able to specify a separate mask in the cross
attention to mask the positions corresponding to padding tokens in the
encoder state.
- The self-attention in the decoder needs to be causal on top of not
attending to padding tokens.
2019-10-16 16:12:22 +02:00
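A minimal sketch of the two masks described above, assuming PyTorch tensors with 1 for real tokens and 0 for padding (names are illustrative, not the repository's code):

import torch

def decoder_self_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    # attention_mask: (batch, tgt_len). Each target position may attend
    # only to earlier, non-padded target positions: causal AND not-padding.
    tgt_len = attention_mask.size(1)
    causal = torch.tril(torch.ones(tgt_len, tgt_len, dtype=attention_mask.dtype))
    return causal[None, :, :] * attention_mask[:, None, :]

def cross_attention_mask(encoder_attention_mask: torch.Tensor, tgt_len: int) -> torch.Tensor:
    # Every decoder position may attend to every non-padded encoder position.
    return encoder_attention_mask[:, None, :].expand(-1, tgt_len, -1)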
c5a94a6100 fix function that defines masks in XLM
the definition of `get_masks` would blow up with certain combinations of
arguments. It was just a matter of moving a definition outside of a
control structure.
2019-10-16 13:00:32 +02:00
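The bug class is generic; a toy illustration of the pattern (not XLM's actual code):

def broken(flag):
    if flag:
        value = 1
    return value  # UnboundLocalError when flag is False

def fixed(flag):
    value = 0  # definition moved outside the control structure
    if flag:
        value = 1
    return value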
488a664151 add is_decoder attribute to PretrainedConfig
We currently instantiate encoders and decoders for the seq2seq by
passing the `is_decoder` keyword argument to the `from_pretrained`
classmethod. On the other hand, the model class looks for the value
of the `is_decoder` attribute in its config.

In order for the value to propagate from the kwarg to the configuration
we simply need to define `is_decoder` as an attribute of the base
`PretrainedConfig`, with a default of `False`.
2019-10-15 21:03:32 +02:00
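A minimal sketch of the propagation this enables, assuming `BertConfig` inherits the new attribute:

from transformers import BertConfig

encoder_config = BertConfig()                 # is_decoder defaults to False
decoder_config = BertConfig(is_decoder=True)  # the kwarg lands on the config
assert not encoder_config.is_decoder
assert decoder_config.is_decoder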
4c81960b9b comment the seq2seq functions 2019-10-15 20:52:28 +02:00
6d6c326737 take path to pretrained for encoder and decoder for init 2019-10-15 16:08:27 +02:00
0d81fc853e specify in readme that both datasets are required 2019-10-15 15:26:33 +02:00
19e9964780 remove Bert2Bert from module declaration 2019-10-15 15:20:28 +02:00
1aec940587 test the full story processing 2019-10-15 15:18:07 +02:00
22e1af6859 truncation function is fully tested 2019-10-15 14:43:50 +02:00
260ac7d9a8 wip commit, switching computers 2019-10-15 12:24:35 +02:00
be916cb3fb Merge branch 'master' of https://github.com/huggingface/transformers 2019-10-15 10:37:13 +02:00
5875aaf762 install tensorboard 2019-10-15 10:36:46 +02:00
40f14ff545 Merge pull request #1513 from slayton58/amp_fp16_einsum
Force einsum to run in fp16
2019-10-15 10:25:00 +02:00
e703e4dfe1 Merge pull request #1509 from julian-pani/patch-3
remove leftover usage of DUMMY_INPUTS
2019-10-15 10:24:13 +02:00
898ce064f8 add tests on TF 2.0 & PT checkpoint => model conversion functions 2019-10-15 10:04:19 +02:00
d147671c6c Merge pull request #1508 from tlkh/master
Added performance enhancements (XLA, AMP) to examples
2019-10-15 09:57:18 +02:00
2c1d5564ad add readme information 2019-10-15 09:56:52 +02:00
08bd8f9f39 Merge pull request #1505 from e-budur/master
Fixed the sample code under the 'Quick tour' heading.
2019-10-15 09:50:36 +02:00
8aa3b753bd Merge pull request #1434 from bryant1410/patch-1
Remove unnecessary use of FusedLayerNorm in XLNet
2019-10-15 09:44:19 +02:00
621e7a2529 Merge pull request #1275 from stecklin/ner-fine-tuning
Implement fine-tuning BERT on CoNLL-2003 named entity recognition task
2019-10-15 09:35:24 +02:00
c55badcee0 Add NER finetuning details by @stefan-it in example readme 2019-10-15 09:33:52 +02:00
788e632622 [ner] Honor args.overwrite_cache 2019-10-15 09:17:31 +02:00
0f9ebb0b43 add seqeval as requirement for examples 2019-10-15 09:17:31 +02:00
66adb71734 update to transformers 2019-10-15 09:17:31 +02:00
5ff9cd158a Add option to predict on test set 2019-10-15 09:17:31 +02:00
7f5367e0b1 Add cli argument for configuring labels 2019-10-15 09:17:31 +02:00
e1d4179b64 Make file reading more robust 2019-10-15 09:17:31 +02:00
383ef96747 Implement fine-tuning BERT on CoNLL-2003 named entity recognition task 2019-10-15 09:17:31 +02:00
5adb39e757 Add option to predict on test set 2019-10-15 09:14:53 +02:00
99b189df6d Add cli argument for configuring labels 2019-10-15 09:14:53 +02:00
3e9420add1 Make file reading more robust 2019-10-15 09:14:53 +02:00
cde42c4354 Implement fine-tuning BERT on CoNLL-2003 named entity recognition task 2019-10-15 09:14:53 +02:00
74c5035808 Fix token order in xlnet preprocessing. 2019-10-14 21:27:11 +00:00
fe25eefc15 add instructions to fetch the dataset 2019-10-14 20:45:39 +02:00
412793275d delegate the padding with special tokens to the tokenizer 2019-10-14 20:45:16 +02:00
447fffb21f process the raw CNN/Daily Mail dataset
the data provided by Li Dong et al. were already tokenized, which means
that they are not compatible with all the models in the library. We
thus process the raw data directly and tokenize them using the models'
tokenizers.
2019-10-14 18:12:20 +02:00
80889a0226 Merge pull request #1512 from louismartin/fix-roberta-convert
Fix import error in script to convert fairseq roberta checkpoints
2019-10-14 17:40:32 +02:00
4e6a55751a Force einsum to fp16 2019-10-14 11:12:41 -04:00
f62f992cf7 Merge pull request #1502 from jeffxtang/master
the working example code to use BertForQuestionAnswering
2019-10-14 16:14:52 +02:00
67d10960ae load and prepare CNN/Daily Mail data
We write a function to load and preprocess the CNN/Daily Mail dataset as
provided by Li Dong et al. The issue is that this dataset has already
been tokenized by the authors, so we actually need to find the original,
plain-text dataset if we want to apply it to all models.
2019-10-14 14:11:20 +02:00
d9d387afce clean up 2019-10-14 12:14:40 +02:00
b7141a1bc6 maxi simplication 2019-10-14 12:14:08 +02:00
bfbe68f035 update forward pass 2019-10-14 12:04:23 +02:00
0ef9bc923a Cleaning up seq2seq [WIP] 2019-10-14 11:58:13 +02:00
49cba6e543 Fix import error in script to convert fairseq roberta checkpoints 2019-10-14 01:38:57 -07:00
0993586758 remove usage of DUMMY_INPUTS
Hey @thomwolf  
This change da26bae61b (diff-8ddce309e88e8eb5b4d02228fd8881daL28-L29) removed the constant, but one usage of that constant remains in the code.
2019-10-14 02:09:53 +03:00
376e65a674 Added automatic mixed precision and XLA options to run_tf_glue.py 2019-10-13 13:19:06 +00:00
86f23a1944 Minor enhancements to run_tf_glue.py 2019-10-13 10:21:35 +00:00
5a8c6e771a Fixed the sample code under the 'Quick tour' heading. 2019-10-12 14:17:17 +03:00
e76d71521c the working example code to use BertForQuestionAnswering and get an answer from a text and a question 2019-10-11 17:04:02 -07:00
d844db4005 Add citation bibtex 2019-10-11 16:55:42 -04:00
a701c9b321 CTRL to tf automodels 2019-10-11 16:05:30 -04:00
b3261e7ace read parameters from CLI, load model & tokenizer 2019-10-11 18:40:38 +02:00
d889e0b71b add base for seq2seq finetuning 2019-10-11 17:36:12 +02:00
f8e98d6779 load pretrained embeddings in Bert decoder
In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence
Generation Tasks", Bert2Bert is initialized with pre-trained weights for
the encoder, and only pre-trained embeddings for the decoder. The
current version of the code completely randomizes the weights of the
decoder.

We write a custom function to initialize the weights of the decoder; we
first initialize the decoder with the weights and then randomize
everything but the embeddings.
2019-10-11 16:48:11 +02:00
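A hedged sketch of that strategy: load everything pretrained, then re-randomize all decoder parameters except the embeddings (function name and details are illustrative, not the repository's code):

import torch
from transformers import BertModel

def init_decoder_embeddings_only(name: str = "bert-base-uncased") -> BertModel:
    decoder = BertModel.from_pretrained(name)  # start fully pretrained
    for param_name, param in decoder.named_parameters():
        if "embeddings" not in param_name:
            # re-randomize everything that is not an embedding
            torch.nn.init.normal_(param, mean=0.0, std=decoder.config.initializer_range)
    return decoder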
3ddce1d74c Release: 2.1.1 2019-10-11 06:37:49 -04:00
4428aefc63 Merge pull request #1488 from huggingface/pytorch-tpu
GLUE on TPU
2019-10-11 16:33:00 +02:00
3b43b01872 Merge pull request #1482 from huggingface/tf2_integration_tests
Integration of TF 2.0 models with other Keras modules
2019-10-11 16:25:43 +02:00
4b8f3e8f32 adding citation 2019-10-11 16:18:16 +02:00
18a3cef7d5 no nans 2019-10-11 16:09:42 +02:00
1f5d9513d8 fix test 2019-10-11 15:55:01 +02:00
0f9fc4fbde adding option to deactivate past/memory outputs 2019-10-11 15:47:08 +02:00
700331b5ec Merge pull request #1492 from stefan-it/bert-german-dbmdz-models
Add new BERT models for German (cased and uncased)
2019-10-11 13:01:52 +02:00
573dde9b44 Merge pull request #1405 from slayton58/xlnet_layer_reorder
Re-order XLNet attention head outputs for better perf
2019-10-11 12:10:58 +02:00
5f25a5f367 model: add support for new German BERT models (cased and uncased) from @dbmdz 2019-10-11 10:20:33 +02:00
f382a8decd convert int to str before adding to a str 2019-10-10 19:20:39 -04:00
639f4b7190 Don't save/load when on TPU 2019-10-10 19:17:25 +00:00
d4e7934ac3 GLUE on TPU 2019-10-10 19:03:06 +00:00
1e68c28670 add test for initialization of Bert2Rnd 2019-10-10 18:07:11 +02:00
2a4fef837a move Circle-CI from TF2-rc0 to official TF2 2019-10-10 15:57:35 +02:00
751e246087 using tf.print in roberta 2019-10-10 15:47:20 +02:00
fa218e648a fix syntax errors 2019-10-10 15:16:07 +02:00
c9e8c51946 fixing SequenceSummary head in TF 2.0 2019-10-10 15:16:05 +02:00
da26bae61b adding more tests on TF and pytorch serialization - updating configuration for better serialization 2019-10-10 14:30:48 +02:00
3e1cd8241e fix stupid (re)naming issue 2019-10-10 14:18:20 +02:00
81ee29ee8d remove the staticmethod used to load the config 2019-10-10 14:13:37 +02:00
bb04edb45b Add tests that TF 2.0 model can be integrated with other Keras modules 2019-10-10 13:08:24 +02:00
d7092d592c rename the attributes in the Bert Layer
Since the preloading of weights relies on the names of the class's
attributes, changing the namespace breaks loading pretrained weights on
Bert and all related models. I reverted `self_attention` to `attention`
and use `crossattention` for the decoder instead.
2019-10-10 12:51:14 +02:00
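Why the rename breaks loading, in miniature: state_dict keys are derived from attribute names, so checkpoint keys stop matching (a self-contained demo, not the repository's code):

import torch

class Old(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.attention = torch.nn.Linear(4, 4)

class Renamed(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.self_attention = torch.nn.Linear(4, 4)

result = Renamed().load_state_dict(Old().state_dict(), strict=False)
print(result.missing_keys)     # ['self_attention.weight', 'self_attention.bias']
print(result.unexpected_keys)  # ['attention.weight', 'attention.bias']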
51261167b4 prune both attention and self-attention heads 2019-10-10 12:17:22 +02:00
17177e7379 add is_decoder as an attribute to Config class 2019-10-10 12:03:58 +02:00
6596e3d566 Merge pull request #1454 from bkkaggle/pytorch-built-in-tensorboard
Change tensorboard imports to use built-in tensorboard if available
2019-10-10 11:56:55 +02:00
4bc4601192 Merge pull request #1480 from huggingface/fix_ctrl_tokenizer
Fixing CTRL tokenizer - Update error messages - XLM-MLM in run_generation
2019-10-10 11:56:20 +02:00
177a721205 move back to simple space splitting 2019-10-10 11:45:47 +02:00
df85a0ff0b replace double quotes with simple quotes 2019-10-10 11:38:26 +02:00
9ca788b2e8 merge the two Bert layers classes 2019-10-10 11:33:28 +02:00
a5997dd81a better error messages 2019-10-10 11:31:01 +02:00
edfc8f8225 Remove and do the branching in 2019-10-10 10:17:27 +02:00
09cfd12235 remove and do the branching in 2019-10-10 10:15:27 +02:00
43a237f15e switching to moses tokenizer 2019-10-10 10:11:16 +02:00
877ef2c6ca override from_pretrained in Bert2Rnd
In the seq2seq model we need to both load pretrained weights in the
encoder and initialize the decoder randomly. Because the
`from_pretrained` method defined in the base class relies on module
names to assign weights, it would also initialize the decoder with
pretrained weights. To avoid this we override the method to only
initialize the encoder with pretrained weights.
2019-10-10 10:02:18 +02:00
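The shape of that override, as a sketch (the class name comes from the commit; the body is a guess at the strategy, not the actual code):

import torch
from transformers import BertConfig, BertModel

class Bert2Rnd(torch.nn.Module):
    def __init__(self, config: BertConfig = None):
        super().__init__()
        config = config or BertConfig()
        self.encoder = BertModel(config)  # random init
        self.decoder = BertModel(config)  # random init, stays random

    @classmethod
    def from_pretrained(cls, name: str) -> "Bert2Rnd":
        model = cls()
        # Only the encoder receives pretrained weights.
        model.encoder = BertModel.from_pretrained(name)
        return model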
851ef592c5 add comment on recursive weights loading 2019-10-10 10:02:03 +02:00
036483fae5 Temporary CTRL tokenizer fix 2019-10-09 16:33:15 -04:00
9c2e0a4acf Release: 2.1.0 2019-10-09 12:14:03 -04:00
7fe98d8c18 Update CTRL documentation 2019-10-09 12:12:36 -04:00
89f86f9661 CTRL added to the documentation 2019-10-09 12:04:06 -04:00
e17ea08e24 Pycharm folder added to gitignore 2019-10-09 11:32:21 -04:00
2431fea98a Merge pull request #1383 from keskarnitish/master
Adding CTRL
2019-10-09 11:31:05 -04:00
d9e60f4f0d Merge branch 'master' into pr/1383 2019-10-09 17:25:08 +02:00
e84470ef81 Merge pull request #1384 from huggingface/encoding-qol
Quality of life enhancements in encoding + patch MLM masking
2019-10-09 11:18:24 -04:00
07d055f849 higher tolerance 2019-10-09 17:10:04 +02:00
48b438ff2a doc and conversion 2019-10-09 17:06:30 +02:00
69629c4f0f Improve naming and only do regex when necessary 2019-10-09 08:48:40 -04:00
bf34a252b8 Golden path 2019-10-09 08:48:40 -04:00
528d3f327b Improve readability and improve make less assumptions about checkpoint format 2019-10-09 08:48:40 -04:00
56301bd9e8 Extract method 2019-10-09 08:48:40 -04:00
d6c5469712 Delete older checkpoint after saving new checkpoint 2019-10-09 08:48:40 -04:00
54a31f50fb Add save_total_limit 2019-10-09 08:48:40 -04:00
c19b8e4ae0 fixing CTRL tests and OpenAI GPT tests 2019-10-09 13:51:05 +02:00
6dce6dda1b fixing TF 2.0 model - adding more severe test on pt/tf equivalence 2019-10-09 11:57:55 +02:00
c56d921dda adding TF 2.0 model 2019-10-09 11:07:43 +02:00
1c5079952f simpler distilbert mask - fix tf tests 2019-10-09 04:26:20 +02:00
58b302caf3 Merge pull request #1398 from dveselov/patch-1
Fixed typo in docs README
2019-10-09 03:52:42 +02:00
439fac723a Merge pull request #1409 from brian41005/master
Evaluation result.txt path changing #1286
2019-10-09 03:14:34 +02:00
23b7138ab4 fix #1378 and #1453 2019-10-09 01:54:44 +02:00
5ce8d29abe Change tensorboard imports to use built-in tensorboard if available 2019-10-08 16:29:43 -05:00
d688af19e5 Update link to swift-coreml-transformers
cc @lysandrejik
2019-10-08 16:37:52 -04:00
45dc04f33d tf model [WIP] 2019-10-08 17:37:17 +02:00
770b15b58c rename class in __init__ 2019-10-08 17:32:28 +02:00
248314772f fix tokenization 2019-10-08 17:19:28 +02:00
03c2c762a6 update tokenizer 2019-10-08 17:12:03 +02:00
3edfa1d6aa update model to use past 2019-10-08 17:11:58 +02:00
f4d41fe33e Merge pull request #1448 from huggingface/contributing
add contribution guidelines
2019-10-08 16:55:34 +02:00
61ed889005 remove old seq2seq file 2019-10-08 16:30:58 +02:00
8abfee9ec3 rename Bert2Bert -> Bert2Rnd 2019-10-08 16:30:58 +02:00
82628b0fc9 add a placeholder test 2019-10-08 16:30:58 +02:00
0700983090 Add BertDecoderModel and Bert2Bert classes
I am not sure what happens when the class is initialized with the
pretrained weights.
2019-10-08 16:30:58 +02:00
75feacf172 add general structure for Bert2Bert class 2019-10-08 16:30:58 +02:00
15a2fc88a6 add General attention classes
The modifications that I introduced in a previous commit did break
Bert's internal API. I reverted these changes and added more general
classes to handle the encoder-decoder attention case.

There may be a more elegant way to deal with backward compatibility (I am
not comfortable with the current state of the code), but I cannot see it
right now.
2019-10-08 16:30:58 +02:00
cd6a59d5c1 add a decoder layer for Bert 2019-10-08 16:30:58 +02:00
45de313a9e add bullet point on modifying an existing PR 2019-10-08 11:54:10 +02:00
ade05b6cef add code contribution 2019-10-07 23:20:25 +02:00
e9c09052a4 add issues and requests guidelines 2019-10-07 22:30:55 +02:00
8fcc6507ce Multilingual 2019-10-07 15:02:42 -04:00
6e3e1c959e Merge pull request #1447 from huggingface/dev-requirements
Provide requirements.txt for development dependencies
2019-10-07 18:49:26 +02:00
7ce83b4931 update weights for distilgpt2 2019-10-07 12:30:27 -04:00
9f81f1cba8 fix convert pt_to_tf2 for custom weights 2019-10-07 12:30:19 -04:00
7afd00a661 freeze dev requirements 2019-10-07 17:58:13 +02:00
a0dcefa382 generalize BertSelfAttention to take separate query, key, value
There is currently no way to specify the query, key and value separately
in the Attention module. However, the decoder's "encoder-decoder
attention" layers take the decoder's last output as a query, the
encoder's states as key and value. We thus modify the existing code so
query, key and value can be added separately.

This obviously raises some naming issues; `BertSelfAttention` is not
a self-attention module anymore. The way the residual is forwarded is
now awkward, etc. We will need to do some refactoring once the decoder is
fully implemented.
2019-10-07 17:53:58 +02:00
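The generalized signature, reduced to its core (illustrative, not the repository's exact code): self-attention passes the same states three times, cross-attention passes the decoder state as query and the encoder output as key and value.

import math
import torch

def scaled_dot_product(query, key, value, mask=None):
    scores = query @ key.transpose(-1, -2) / math.sqrt(query.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.softmax(scores, dim=-1) @ value

# self-attention:  scaled_dot_product(hidden, hidden, hidden)
# cross-attention: scaled_dot_product(decoder_hidden, encoder_hidden, encoder_hidden)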
31adbb247c add class wireframes for Bert decoder 2019-10-07 16:43:21 +02:00
dda1adad6d rename BertLayer to BertEncoderLayer 2019-10-07 16:31:46 +02:00
0053c0e052 do some (light) housekeeping
Several packages were imported but never used, and indentation and line
spacing did not follow PEP8.
2019-10-07 16:29:15 +02:00
bd5363cc83 update CTRL configuration 2019-10-07 15:37:30 +02:00
dc89441167 update CTRL pytorch model 2019-10-07 15:37:25 +02:00
320b7a7e01 fix #1416 2019-10-07 14:26:59 +02:00
386e86e222 raise exception when class initialized with __init__ 2019-10-07 13:00:06 +02:00
4446c02b8a add wireframe for seq2seq model 2019-10-07 12:04:05 +02:00
1615360c71 Merge pull request #1438 from SeanBE/master
fix pytorch-transformers migration description in README
2019-10-07 05:02:23 -04:00
6dc6c716c5 fix pytorch-transformers migration description in README 2019-10-07 09:59:54 +01:00
904158ac4d Rephrase forward method to reduce ambiguity 2019-10-06 23:40:52 -04:00
0f65d8cbbe Fix some typos in README 2019-10-06 23:40:52 -04:00
1dea291a02 Remove unnecessary use of FusedLayerNorm in XLNet 2019-10-06 13:35:01 -04:00
f3e0218fbb Correct device assignment in run_generation 2019-10-05 21:05:16 -04:00
78ef1a9930 fixes 2019-10-04 17:59:44 -04:00
6c1d0bc066 update encode_plus - add truncation strategies 2019-10-04 17:38:38 -04:00
0820bb0555 unnecessary carriage return 2019-10-04 17:23:15 -04:00
f5891c3821 run_squad --> run_squad_w_distillation 2019-10-04 17:23:15 -04:00
764a7923ec add distillation+finetuning option in run_squad 2019-10-04 17:23:15 -04:00
bb464289ce New model addition issue template 2019-10-04 16:41:26 -04:00
92c0f2fb90 Merge remote-tracking branch 'origin/julien_multiple-choice' into encoding-qol 2019-10-04 15:48:06 -04:00
9e136ff57c Honor args.overwrite_cache (h/t @erenup) 2019-10-04 15:00:56 -04:00
7bddb45a6f Decode documentaton 2019-10-04 14:27:38 -04:00
dbed1c5d94 Adding CTRL (squashed commit)
adding conversion script

adding first draft of modeling & tokenization

adding placeholder for test files

bunch of changes

registering the tokenizer/model/etc

tests

change link; something is very VERY wrong here

weird end-of-word thingy going on

i think the tokenization works now ; wrote the unit tests

overall structure works; load weights next

the monster is alive!

works after some cleanup as well

adding emacs autosave to gitignore

currently only supporting the 48 layer one; seems to infer fine on my macbook

cleanup

fixing some documentation

fixing some documentation

tests passing?

now works on CUDA also

adding greedy?

adding greedy sampling

works well
2019-10-03 22:29:03 -07:00
b3cfd97946 Merge pull request #1373 from TimYagan/fix-css
Fixed critical css font-family issues
2019-10-03 19:04:02 -04:00
81a1e12469 Merge pull request #1313 from enzoampil/master
Add option to use a 'stop token'
2019-10-03 22:43:57 +00:00
d3f24dfad7 Merge branch 'master' into master 2019-10-03 22:43:09 +00:00
ecc4f1bdfa XLM use_lang_embedding flag in run_generation 2019-10-03 17:42:16 -04:00
c2c2ca0fdb Added XLM to run_generation, with prompt language selection. 2019-10-03 17:18:48 -04:00
1569610f2d Merge pull request #1296 from danai-antoniou/add-duplicate-tokens-error
Added ValueError for duplicates in list of added tokens
2019-10-03 17:06:17 -04:00
e1b2949ae6 DistillBert Documentation Code Example fixes 2019-10-03 15:51:33 -04:00
899883644f Fix test fails and warnings
Attention output was in bnij ordering instead of ijbn, which everything
else expects. This was an oversight on my part; the fix keeps the
attention inputs/outputs identical to the original code.

Also moved back from tensor slicing to index_select in rel_shift_bnij to
make the tracer happy.
2019-10-03 12:05:15 -04:00
e2ae9c0b73 fix links in doc index 2019-10-03 11:42:21 -04:00
aebd83230f Update naming + remove f string in run_lm_finetuning example 2019-10-03 11:31:36 -04:00
651bfb7ad5 always_truncate by default 2019-10-03 11:31:36 -04:00
5ed50a93fb LM finetuning won't mask special tokens anymore 2019-10-03 11:31:36 -04:00
cc412edd42 Supports already existing special tokens 2019-10-03 11:31:36 -04:00
2f259b228e Sequence IDS 2019-10-03 11:31:36 -04:00
7c789c337d Always truncate argument in the encode method 2019-10-03 11:31:36 -04:00
7af0777910 Update run_glue.py
add DistilBert model shortcut into ALL_MODELS
2019-10-03 15:31:11 +00:00
c1689ac301 fix name 2019-10-03 10:56:39 -04:00
4a790c40b1 update doc for distil* 2019-10-03 10:54:02 -04:00
6be46a6e64 update links to new weights 2019-10-03 10:27:11 -04:00
5f07d8f11a prepare release 2019-10-03 10:27:11 -04:00
35071007cb incoming release 🔥 update links to arxiv preprint 2019-10-03 10:27:11 -04:00
f1f23ad171 fix buf in convert_pt_chkpt_to_tf2 2019-10-03 10:27:11 -04:00
2a91f6071f update README - TODO update link to paper 2019-10-03 10:27:11 -04:00
c51e533a5f update train.py 2019-10-03 10:27:11 -04:00
a76c3f9cb0 update requirements 2019-10-03 10:27:11 -04:00
bb9c5ead54 update distiller 2019-10-03 10:27:11 -04:00
a12ab0a8db update binarized_data 2019-10-03 10:27:11 -04:00
4d6dfbd376 update extract 2019-10-03 10:27:11 -04:00
23edebc079 update extract_distilbert 2019-10-03 10:27:11 -04:00
cbfcfce205 update token_counts 2019-10-03 10:27:11 -04:00
19e4ebbe3f grouped_batch_sampler 2019-10-03 10:27:11 -04:00
594202a934 lm_seqs_dataset 2019-10-03 10:27:11 -04:00
38084507c4 add distillation_configs 2019-10-03 10:27:11 -04:00
9ffda216ec Fix missed head transpose 2019-10-03 09:23:16 -04:00
b5d73976ad Revert "fixing for roberta tokenizer decoding"
This reverts commit 22e7c4edaf007d92912df54c336d10078ac7d565.
2019-10-03 20:48:17 +08:00
22e7c4edaf fixing for roberta tokenizer decoding 2019-10-03 18:33:53 +08:00
2195c0d5f9 Evaluation result.txt path changing #1286 2019-10-03 12:49:12 +08:00
ebb32261b1 fix #1401 2019-10-02 17:52:56 -04:00
d51b589404 Re-order attention head outputs for better perf
Significant performance boost over the original ordering: on an already
somewhat optimised branch this gave me > 2x end-to-end throughput on a
squad xlnet fine-tuning task (batch 8, seq-length 612, fp16).
2019-10-02 12:18:21 -04:00
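The reorder itself is a single permutation; a hedged illustration of the layouts involved (shapes mirror the task above):

import torch

attn_bnij = torch.randn(8, 12, 612, 612)   # (batch, heads, query, key)
attn_ijbn = attn_bnij.permute(2, 3, 0, 1)  # (query, key, batch, heads), what downstream code expects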
63ed224b7c initialy -> initially 2019-10-02 15:04:18 +00:00
a95158518d Moved duplicate token check 2019-10-02 07:44:15 +01:00
d73957899a Merge branch 'master' of https://github.com/danai-antoniou/pytorch-transformers into add-duplicate-tokens-error 2019-10-02 07:38:50 +01:00
cd69bc9c87 Fixed typo in docs README 2019-10-02 03:21:55 +03:00
391db836ab fix #1260 - remove special logic for decoding pairs of sequence 2019-10-01 19:09:13 -04:00
963529e29b Merge pull request #1288 from echan00/master
Typo with LM Fine tuning script
2019-10-01 18:46:07 -04:00
f7978f70ec use format instead of f-strings 2019-10-01 18:45:38 -04:00
1e4a191366 Merge pull request #1284 from slayton58/pooler_end_logits_fp16_fix
Fix fp16 masking in PoolerEndLogits
2019-10-01 18:40:22 -04:00
c50783e388 Merge branch 'pooler_end_logits_fp16_fix' of https://github.com/slayton58/pytorch-transformers into pr/1284 2019-10-01 18:17:48 -04:00
6971556ab8 Fix syntax typo in README.md 2019-10-01 14:59:31 -04:00
b350662955 overflowing_tokens do not really make sense here, let's just return a number
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2019-09-30 16:37:09 -04:00
f5bcde0b2f [multiple-choice] Simplify and use tokenizer.encode_plus 2019-09-30 16:04:55 -04:00
5c3b32d44d Update README.md
Lines 183 - 200, fixed indentation. Line 198, replaced `tokenizer_class` with `BertTokenizer`, since `tokenizer_class` is not defined in the loop it belongs to.
2019-09-30 18:48:01 +00:00
2dc8cb8734 fix unknown imports (*ForMultipleChoice) in run_multiple_choice 2019-09-29 19:51:01 -04:00
0a4ed7192e Fixed critical css font-family issues
Fixed critical css font-family issues to ensure compatibility with multiple web browsers
2019-09-29 13:51:01 +02:00
ae50ad91ea Merge pull request #1362 from FeiWang96/doc
fix link
2019-09-28 10:26:42 +02:00
60f791631b Fix link in readme 2019-09-28 16:20:17 +08:00
a6a6d9e638 fix padding_idx of RoBERTa model 2019-09-27 19:03:55 -04:00
d8b641c839 6 -> 8 models 2019-09-27 17:22:01 -04:00
c6acbdd50a Close #1304 2019-09-27 17:02:53 -04:00
df7cd9e4e4 Merge pull request #1353 from wendingp/patch-1
Fix some typos
2019-09-27 23:00:34 +02:00
6a17b3c51b Merge pull request #1355 from agrinh/master
Fix tensorflow_dataset glue support
2019-09-27 22:59:54 +02:00
04e9a6f512 Merge pull request #1359 from dennymarcels/patch-1
Update run_lm_finetuning.py
2019-09-27 22:58:19 +02:00
9478590630 Update run_lm_finetuning.py
The method, as previously phrased, did not exist in the class.
2019-09-27 15:18:42 -03:00
795b3e76ff Add docstring for processor method 2019-09-27 17:32:28 +02:00
e31a472801 Fix tensorflow_dataset glue support
`glue_convert_examples_to_features` assumed that tensorflow_dataset
examples contain the features `'sentence1'` and `'sentence2'`. This
commit encapsulates the choice of features in the glue processor and
uses that to parse examples.
2019-09-27 17:16:02 +02:00
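A sketch of the encapsulation (an illustrative processor, not the library's exact class): the processor owns the mapping from tfds feature keys to text fields, so the conversion function no longer hard-codes them.

class MrpcProcessor:
    # which tensorflow_datasets feature keys hold the two text fields
    tfds_text_keys = ("sentence1", "sentence2")

    def get_example_from_tensor_dict(self, tensor_dict):
        a_key, b_key = self.tfds_text_keys
        return (
            tensor_dict[a_key].numpy().decode("utf-8"),
            tensor_dict[b_key].numpy().decode("utf-8"),
        )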
4f2b6579bf Fix some typos 2019-09-27 22:55:43 +08:00
ca559826c4 Merge pull request #1349 from ogabrielluiz/master
Just some typos
2019-09-27 13:08:00 +02:00
d2de5b9d8c Just some typos 2019-09-27 07:08:36 -03:00
d83d295763 Merge pull request #1337 from mgrankin/fastdataset
faster dataset building
2019-09-27 10:35:12 +02:00
f6de000305 Merge pull request #1346 from BramVanroy/documentation
Add small  note about the output of hidden states (closes #1332)
2019-09-27 10:30:07 +02:00
15749bfc10 Add small note about the output of hidden states 2019-09-27 10:01:36 +02:00
da2e47ad15 clean up a little run_tf_glue 2019-09-27 09:41:15 +02:00
528c288fa9 clean up run_tf_glue 2019-09-27 09:40:29 +02:00
702f589848 fix input in run_glue for distilbert 2019-09-27 00:20:14 -04:00
22d2fded2c [docs] Fix doc auto-deploy
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2019-09-26 18:22:45 -04:00
fc9faa8a47 [docs] Doc tweaks
Co-Authored-By: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2019-09-26 18:19:51 -04:00
ecfddc6034 Update RoBERTa and GPT-2 Tokenizer documentation (fix #1343) 2019-09-26 16:49:03 -04:00
93f0c5fc72 Repository link in the documentation 2019-09-26 11:45:00 -04:00
6c3b131516 typo in readme/doc 2019-09-26 16:23:28 +02:00
f83b35b77d Merge branch 'master' of https://github.com/huggingface/pytorch-transformers 2019-09-26 16:14:23 +02:00
4e63c90720 update installation instructions in readme 2019-09-26 16:14:21 +02:00
7e957237e4 [Doc] XLM + Torch in documentation 2019-09-26 10:08:56 -04:00
302a4813a5 Doc building requirements [TF2] 2019-09-26 09:57:30 -04:00
f71a4577b8 faster dataset building 2019-09-26 16:53:13 +03:00
a3e0dbba95 Doc building requirements [TF] 2019-09-26 09:51:14 -04:00
0f92f76ca3 CircleCI reference in README 2019-09-26 08:59:52 -04:00
4094958df2 Doc building requirements 2019-09-26 08:50:55 -04:00
7d8b395afa Doc building requirements 2019-09-26 08:49:31 -04:00
927904bc91 [doc] pytorch_transformers -> transformers 2019-09-26 08:47:15 -04:00
294edfd83d Release version in documentation 2019-09-26 08:16:12 -04:00
de5e4864cb Documentation 2019-09-26 08:04:54 -04:00
e4e35296fb update setup.py metadata 2019-09-26 13:52:24 +02:00
a9f24a16bc [FIX] fix run_generation.py to work with batch_size > 1 2019-09-25 15:53:29 +03:00
4b543c3007 Add option to use a 'stop token', which truncates the output text to everything before the first occurrence of the 'stop token' 2019-09-22 21:38:38 +08:00
2e6797cc7d Added ValueError for duplicate added tokens 2019-09-19 15:40:42 +01:00
f0340eccf9 Typo
Typo
2019-09-18 13:42:11 -07:00
ec94f4e0f8 Fix fp16 masking in PoolerEndLogits
Necessary to run xlnet (at least in squad) with `--fp16 --fp16_opt_level="O2"`, otherwise loss is immediately `NaN` and fine-tuning cannot proceed.
2019-09-18 09:30:58 -04:00
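The failure mode in two lines: a masking constant such as -1e30 overflows to -inf in float16, and the loss goes NaN; clamping to the dtype's minimum keeps the mask finite (a sketch of the principle, not the patch itself):

import torch

scores = torch.randn(2, 8, dtype=torch.float16)
mask = torch.tensor([[0, 0, 0, 1, 1, 1, 1, 1],
                     [0, 0, 0, 0, 0, 1, 1, 1]])
fill = torch.finfo(scores.dtype).min          # -65504.0 for float16: finite
masked = scores.masked_fill(mask == 1, fill)  # -1e30 would overflow to -inf here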
1111 changed files with 319249 additions and 47128 deletions


@@ -1,88 +1,404 @@
version: 2
version: 2.1
orbs:
gcp-gke: circleci/gcp-gke@1.0.4
go: circleci/go@1.3.0
# TPU REFERENCES
references:
checkout_ml_testing: &checkout_ml_testing
run:
name: Checkout ml-testing-accelerators
command: |
git clone https://github.com/GoogleCloudPlatform/ml-testing-accelerators.git
cd ml-testing-accelerators
git fetch origin 5e88ac24f631c27045e62f0e8d5dfcf34e425e25:stable
git checkout stable
build_push_docker: &build_push_docker
run:
name: Configure Docker
command: |
gcloud --quiet auth configure-docker
cd docker/transformers-pytorch-tpu
if [ -z "$CIRCLE_PR_NUMBER" ]; then docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f Dockerfile --build-arg "TEST_IMAGE=1" . ; else docker build --tag "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" -f Dockerfile --build-arg "TEST_IMAGE=1" --build-arg "GITHUB_REF=pull/$CIRCLE_PR_NUMBER/head" . ; fi
docker push "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID"
deploy_cluster: &deploy_cluster
run:
name: Deploy the job on the kubernetes cluster
command: |
go get github.com/google/go-jsonnet/cmd/jsonnet && \
export PATH=$PATH:$HOME/go/bin && \
kubectl create -f docker/transformers-pytorch-tpu/dataset.yaml || true && \
job_name=$(jsonnet -J ml-testing-accelerators/ docker/transformers-pytorch-tpu/bert-base-cased.jsonnet --ext-str image=$GCR_IMAGE_PATH --ext-str image-tag=$CIRCLE_WORKFLOW_JOB_ID | kubectl create -f -) && \
job_name=${job_name#job.batch/} && \
job_name=${job_name% created} && \
echo "Waiting on kubernetes job: $job_name" && \
i=0 && \
# 30 checks spaced 30s apart = 900s total.
max_checks=30 && \
status_code=2 && \
# Check on the job periodically. Set the status code depending on what
# happened to the job in Kubernetes. If we try max_checks times and
# still the job hasn't finished, give up and return the starting
# non-zero status code.
while [ $i -lt $max_checks ]; do ((i++)); if kubectl get jobs $job_name -o jsonpath='Failed:{.status.failed}' | grep "Failed:1"; then status_code=1 && break; elif kubectl get jobs $job_name -o jsonpath='Succeeded:{.status.succeeded}' | grep "Succeeded:1" ; then status_code=0 && break; else echo "Job not finished yet"; fi; sleep 30; done && \
echo "Done waiting. Job status code: $status_code" && \
pod_name=$(kubectl get po -l controller-uid=`kubectl get job $job_name -o "jsonpath={.metadata.labels.controller-uid}"` | awk 'match($0,!/NAME/) {print $1}') && \
echo "GKE pod name: $pod_name" && \
kubectl logs -f $pod_name --container=train
echo "Done with log retrieval attempt." && \
gcloud container images delete "$GCR_IMAGE_PATH:$CIRCLE_WORKFLOW_JOB_ID" --force-delete-tags && \
exit $status_code
delete_gke_jobs: &delete_gke_jobs
run:
name: Delete GKE Jobs
command: |
# Match jobs whose age matches patterns like '1h' or '1d', i.e. any job
# that has been around longer than 1hr. First print all columns for
# matches, then execute the delete.
kubectl get job | awk 'match($4,/[0-9]+[dh]/) {print $0}'
kubectl delete job $(kubectl get job | awk 'match($4,/[0-9]+[dh]/) {print $1}')
jobs:
build_py3_torch_and_tf:
run_tests_torch_and_tf:
working_directory: ~/transformers
docker:
- image: circleci/python:3.5
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- run: sudo pip install torch
- run: sudo pip install tensorflow==2.0.0-rc0
- run: sudo pip install --progress-bar off .
- run: sudo pip install pytest codecov pytest-cov
- run: sudo pip install tensorboardX scikit-learn
- run: python -m pytest -sv ./transformers/tests/ --cov
- run: codecov
build_py3_torch:
- restore_cache:
keys:
- v0.4-torch_and_tf-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: RUN_PT_TF_CROSS_TESTS=1 python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_torch_and_tf ./tests/ -m is_pt_tf_cross_test --durations=0 | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_torch:
working_directory: ~/transformers
docker:
- image: circleci/python:3.5
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- run: sudo pip install torch
- run: sudo pip install --progress-bar off .
- run: sudo pip install pytest codecov pytest-cov
- run: sudo pip install tensorboardX scikit-learn
- run: python -m pytest -sv ./transformers/tests/ --cov
- run: python -m pytest -sv ./examples/
- run: codecov
build_py3_tf:
- restore_cache:
keys:
- v0.4-torch-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -s --make-reports=tests_torch ./tests/ | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_tf:
working_directory: ~/transformers
docker:
- image: circleci/python:3.5
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- run: sudo pip install tensorflow==2.0.0-rc0
- run: sudo pip install --progress-bar off .
- run: sudo pip install pytest codecov pytest-cov
- run: sudo pip install tensorboardX scikit-learn
- run: python -m pytest -sv ./transformers/tests/ --cov
- run: codecov
build_py2_torch:
- restore_cache:
keys:
- v0.4-tf-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,tf-cpu,testing,sentencepiece]
- save_cache:
key: v0.4-tf-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_tf ./tests/ | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_flax:
working_directory: ~/transformers
resource_class: large
parallelism: 1
docker:
- image: circleci/python:2.7
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- run: sudo pip install torch
- run: sudo pip install --progress-bar off .
- run: sudo pip install pytest codecov pytest-cov
- run: python -m pytest -sv ./transformers/tests/ --cov
- run: codecov
build_py2_tf:
- restore_cache:
keys:
- v0.4-flax-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: sudo pip install .[flax,sklearn,torch,testing,sentencepiece]
- save_cache:
key: v0.4-flax-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_flax ./tests/ | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_pipelines_torch:
working_directory: ~/transformers
resource_class: large
parallelism: 1
docker:
- image: circleci/python:2.7
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- run: sudo pip install tensorflow==2.0.0-rc0
- run: sudo pip install --progress-bar off .
- run: sudo pip install pytest codecov pytest-cov
- run: python -m pytest -sv ./transformers/tests/ --cov
- run: codecov
- restore_cache:
keys:
- v0.4-torch-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,testing,sentencepiece]
- run: pip install tapas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cpu.html
- save_cache:
key: v0.4-torch-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: RUN_PIPELINE_TESTS=1 python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_torch -m is_pipeline_test ./tests/ | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_pipelines_tf:
working_directory: ~/transformers
docker:
- image: circleci/python:3.7
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- restore_cache:
keys:
- v0.4-tf-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,tf-cpu,testing,sentencepiece]
- save_cache:
key: v0.4-tf-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: RUN_PIPELINE_TESTS=1 python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_tf ./tests/ -m is_pipeline_test | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_custom_tokenizers:
working_directory: ~/transformers
docker:
- image: circleci/python:3.7
environment:
RUN_CUSTOM_TOKENIZERS: yes
steps:
- checkout
- restore_cache:
keys:
- v0.4-custom_tokenizers-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[ja,testing,sentencepiece]
- run: python -m unidic download
- save_cache:
key: v0.4-custom_tokenizers-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -s --make-reports=tests_custom_tokenizers ./tests/test_tokenization_bert_japanese.py | tee tests_output.txt
- store_artifacts:
path: ~/transformers/tests_output.txt
- store_artifacts:
path: ~/transformers/reports
run_examples_torch:
working_directory: ~/transformers
docker:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- restore_cache:
keys:
- v0.4-torch_examples-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install .[sklearn,torch,sentencepiece,testing]
- run: pip install -r examples/_tests_requirements.txt
- save_cache:
key: v0.4-torch_examples-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/ | tee examples_output.txt
- store_artifacts:
path: ~/transformers/examples_output.txt
- store_artifacts:
path: ~/transformers/reports
run_tests_git_lfs:
working_directory: ~/transformers
docker:
- image: circleci/python:3.7
resource_class: xlarge
parallelism: 1
steps:
- checkout
- run: sudo apt-get install git-lfs
- run: |
git config --global user.email "ci@dummy.com"
git config --global user.name "ci"
- run: pip install --upgrade pip
- run: pip install .[testing]
- run: RUN_GIT_LFS_TESTS=1 python -m pytest -sv ./tests/test_hf_api.py -k "HfLargefilesTest"
build_doc:
working_directory: ~/transformers
docker:
- image: circleci/python:3.6
steps:
- checkout
- restore_cache:
keys:
- v0.4-build_doc-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install ."[all, docs]"
- save_cache:
key: v0.4-build_doc-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: cd docs && make html SPHINXOPTS="-W"
- store_artifacts:
path: ./docs/_build
deploy_doc:
working_directory: ~/transformers
docker:
- image: circleci/python:3.5
- image: circleci/python:3.6
steps:
- add_ssh_keys:
fingerprints:
- "5b:7a:95:18:07:8c:aa:76:4c:60:35:88:ad:60:56:71"
fingerprints:
- "5b:7a:95:18:07:8c:aa:76:4c:60:35:88:ad:60:56:71"
- checkout
- run: sudo pip install --progress-bar off -r docs/requirements.txt
- run: sudo pip install --progress-bar off -r requirements.txt
- run: cd docs/source && ln -s ../../examples/README.md examples.md && cd -
- run: cd docs && make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
- restore_cache:
keys:
- v0.4-deploy_doc-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install ."[all,docs]"
- save_cache:
key: v0.4-deploy_doc-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: ./.circleci/deploy.sh
check_code_quality:
working_directory: ~/transformers
docker:
- image: circleci/python:3.6
resource_class: medium
parallelism: 1
steps:
- checkout
- restore_cache:
keys:
- v0.4-code_quality-{{ checksum "setup.py" }}
- v0.4-{{ checksum "setup.py" }}
- run: pip install --upgrade pip
- run: pip install isort
- run: pip install .[all,quality]
- save_cache:
key: v0.4-code_quality-{{ checksum "setup.py" }}
paths:
- '~/.cache/pip'
- run: black --check examples tests src utils
- run: isort --check-only examples tests src utils
- run: flake8 examples tests src utils
- run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
- run: python utils/check_copies.py
- run: python utils/check_table.py
- run: python utils/check_dummies.py
- run: python utils/check_repo.py
check_repository_consistency:
working_directory: ~/transformers
docker:
- image: circleci/python:3.6
resource_class: small
parallelism: 1
steps:
- checkout
- run: pip install requests
- run: python ./utils/link_tester.py
# TPU JOBS
run_examples_tpu:
docker:
- image: circleci/python:3.6
environment:
OMP_NUM_THREADS: 1
resource_class: xlarge
parallelism: 1
steps:
- checkout
- go/install
- *checkout_ml_testing
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- setup_remote_docker
- *build_push_docker
- *deploy_cluster
cleanup-gke-jobs:
docker:
- image: circleci/python:3.6
steps:
- gcp-gke/install
- gcp-gke/update-kubeconfig-with-credentials:
cluster: $GKE_CLUSTER
perform-login: true
- *delete_gke_jobs
workflow_filters: &workflow_filters
filters:
branches:
@@ -92,9 +408,28 @@ workflows:
version: 2
build_and_test:
jobs:
- build_py3_torch_and_tf
- build_py3_torch
- build_py3_tf
- build_py2_torch
- build_py2_tf
- deploy_doc: *workflow_filters
- check_code_quality
- check_repository_consistency
- run_examples_torch
- run_tests_custom_tokenizers
- run_tests_torch_and_tf
- run_tests_torch
- run_tests_tf
- run_tests_flax
- run_tests_pipelines_torch
- run_tests_pipelines_tf
- run_tests_git_lfs
- build_doc
- deploy_doc: *workflow_filters
tpu_testing_jobs:
triggers:
- schedule:
# Set to run at the first minute of every hour.
cron: "0 8 * * *"
filters:
branches:
only:
- master
jobs:
- cleanup-gke-jobs
- run_examples_tpu

.circleci/deploy.sh Executable file

@@ -0,0 +1,57 @@
cd docs
function deploy_doc(){
echo "Creating doc at commit $1 and pushing to folder $2"
git checkout $1
if [ ! -z "$2" ]
then
if [ "$2" == "master" ]; then
echo "Pushing master"
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir/$2/
cp -r _build/html/_static .
elif ssh -oStrictHostKeyChecking=no $doc "[ -d $dir/$2 ]"; then
echo "Directory" $2 "already exists"
scp -r -oStrictHostKeyChecking=no _static/* $doc:$dir/$2/_static/
else
echo "Pushing version" $2
make clean && make html
rm -rf _build/html/_static
cp -r _static _build/html
scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
fi
else
echo "Pushing stable"
make clean && make html
rm -rf _build/html/_static
cp -r _static _build/html
scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
fi
}
# You can find the commit for each tag on https://github.com/huggingface/transformers/tags
deploy_doc "master" master
deploy_doc "b33a385" v1.0.0
deploy_doc "fe02e45" v1.1.0
deploy_doc "89fd345" v1.2.0
deploy_doc "fc9faa8" v2.0.0
deploy_doc "3ddce1d" v2.1.1
deploy_doc "3616209" v2.2.0
deploy_doc "d0f8b9a" v2.3.0
deploy_doc "6664ea9" v2.4.0
deploy_doc "fb560dc" v2.5.0
deploy_doc "b90745c" v2.5.1
deploy_doc "fbc5bf1" v2.6.0
deploy_doc "6f5a12a" v2.7.0
deploy_doc "11c3257" v2.8.0
deploy_doc "e7cfc1a" v2.9.0
deploy_doc "7cb203f" v2.9.1
deploy_doc "10d7239" v2.10.0
deploy_doc "b42586e" v2.11.0
deploy_doc "7fb8bdf" v3.0.2
deploy_doc "4b3ee9c" v3.1.0
deploy_doc "3ebb1b3" v3.2.0
deploy_doc "0613f05" v3.3.1
deploy_doc "eb0e0ce" v3.4.0
deploy_doc "818878d" v3.5.1
deploy_doc "c781171" v4.0.0
deploy_doc "bfa4ccf" # v4.1.1 Latest stable release


@@ -0,0 +1,22 @@
---
name: "\U0001F5A5 New benchmark"
about: Benchmark a part of this library and share your results
title: "[Benchmark]"
labels: ''
assignees: ''
---
# 🖥 Benchmarking `transformers`
## Benchmark
Which part of `transformers` did you benchmark?
## Set-up
What did you run your benchmarks on? Please include details, such as: CPU, GPU? If using multiple GPUs, which parallelization did you use?
## Results
Put your results here!


@@ -0,0 +1,20 @@
---
name: "\U0001F31F New model addition"
about: Submit a proposal/request to implement a new Transformer-based model
title: ''
labels: New model
assignees: ''
---
# 🌟 New model addition
## Model description
<!-- Important information -->
## Open source status
* [ ] the model implementation is available: (give details)
* [ ] the model weights are available: (give details)
* [ ] who are the authors: (mention them, if possible by @gh-username)


@@ -1,25 +1,71 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch Transformers
about: Submit a bug report to help us improve transformers
title: ''
labels: ''
assignees: ''
---
## 🐛 Bug
<!-- Important information -->
## Environment info
<!-- You can run the command `transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
Model I am using (Bert, XLNet....):
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Language I am using the model on (English, Chinese....):
### Who can help
<!-- Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
The problem arise when using:
* [ ] the official example scripts: (give details)
* [ ] my own modified scripts: (give details)
albert, bert, GPT2, XLM: @LysandreJik
tokenizers: @mfuntowicz
Trainer: @sgugger
Speed and Memory Benchmarks: @patrickvonplaten
Model Cards: @julien-c
TextGeneration: @TevenLeScao
examples/distillation: @VictorSanh
nlp datasets: [different repo](https://github.com/huggingface/nlp)
rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Text Generation: @patrickvonplaten @TevenLeScao
Blenderbot: @patrickvonplaten
Bart: @patrickvonplaten
Marian: @patrickvonplaten
Pegasus: @patrickvonplaten
mBART: @patrickvonplaten
T5: @patrickvonplaten
Longformer/Reformer: @patrickvonplaten
TransfoXL/XLNet: @TevenLeScao
RAG: @patrickvonplaten, @lhoestq
FSMT: @stas00
examples/seq2seq: @patil-suraj
examples/bert-loses-patience: @JetRunner
ray/raytune: @richardliaw @amogkam
tensorflow: @jplu
examples/token-classification: @stefan-it
documentation: @sgugger
-->
## Information
Model I am using (Bert, XLNet ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on are:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details)
* [ ] my own task or dataset: (give details below)
## To Reproduce
## To reproduce
Steps to reproduce the behavior:
@@ -27,22 +73,10 @@ Steps to reproduce the behavior:
2.
3.
<!-- If you have a code sample, error messages, stack traces, please provide it here as well. -->
<!-- If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.-->
## Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
## Environment
* OS:
* Python version:
* PyTorch version:
* PyTorch Transformers version (or branch):
* Using GPU ?
* Distributed of parallel setup ?
* Any other relevant information:
## Additional context
<!-- Add any other context about the problem here. -->
<!-- A clear and concise description of what you would expect to happen. -->


@@ -1,16 +1,25 @@
---
name: "\U0001F680 Feature Request"
about: Submit a proposal/request for a new PyTorch Transformers feature
name: "\U0001F680 Feature request"
about: Submit a proposal/request for a new transformers feature
title: ''
labels: ''
assignees: ''
---
## 🚀 Feature
# 🚀 Feature request
<!-- A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist. -->
<!-- A clear and concise description of the feature proposal.
Please provide a link to the paper and code in case they exist. -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too. -->
<!-- Please outline the motivation for the proposal. Is your feature request
related to a problem? e.g., I'm always frustrated when [...]. If this is related
to another GitHub issue, please link here too. -->
## Additional context
## Your contribution
<!-- Add any other context or screenshots about the feature request here. -->
<!-- Is there any way that you could help, e.g. by submitting a PR?
Make sure to read the CONTRIBUTING.MD readme:
https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md -->


@@ -1,43 +1,58 @@
---
name: "\U0001F4DA Migration from PyTorch-pretrained-Bert"
about: Report a problem when migrating from PyTorch-pretrained-Bert to Transformers
name: "\U0001F4DA Migration from pytorch-pretrained-bert or pytorch-transformers"
about: Report a problem when migrating from pytorch-pretrained-bert or pytorch-transformers
to transformers
title: ''
labels: Migration
assignees: ''
---
## 📚 Migration
# 📚 Migration
## Information
<!-- Important information -->
Model I am using (Bert, XLNet....):
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese....):
Language I am using the model on (English, Chinese ...):
The problem arise when using:
* [ ] the official example scripts: (give details)
* [ ] my own modified scripts: (give details)
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on are:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details)
* [ ] my own task or dataset: (give details below)
Details of the issue:
## Details
<!-- A clear and concise description of the migration issue. If you have code snippets, please provide it here as well. -->
<!-- A clear and concise description of the migration issue.
If you have code snippets, please provide it here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
-->
## Environment
## Environment info
<!-- You can run the command `python transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
<!-- IMPORTANT: which version of the former library do you use? -->
* `pytorch-transformers` or `pytorch-pretrained-bert` version (or branch):
* OS:
* Python version:
* PyTorch version:
* PyTorch Transformers version (or branch):
* Using GPU ?
* Distributed of parallel setup ?
* Any other relevant information:
## Checklist
- [ ] I have read the migration guide in the readme.
([pytorch-transformers](https://github.com/huggingface/transformers#migrating-from-pytorch-transformers-to-transformers);
[pytorch-pretrained-bert](https://github.com/huggingface/transformers#migrating-from-pytorch-pretrained-bert-to-transformers))
- [ ] I checked if a related official extension example runs on my machine.
## Additional context
<!-- Add any other context about the problem here. -->


@@ -1,8 +1,26 @@
---
name: "❓Questions & Help"
about: Start a general discussion related to PyTorch Transformers
name: "❓ Questions & Help"
about: Post your general questions on the Hugging Face forum: https://discuss.huggingface.co/
title: ''
labels: ''
assignees: ''
---
## ❓ Questions & Help
# ❓ Questions & Help
<!-- A clear and concise description of the question. -->
<!-- The GitHub issue tracker is primarily intended for bugs, feature requests,
new models, benchmarks, and migration questions. For all other questions,
we direct you to the Hugging Face forum: https://discuss.huggingface.co/ .
-->
## Details
<!-- Description of your issue -->
<!-- You should first ask your question on the forum, and only if
you didn't get an answer after a few days ask it here on GitHub. -->
**A link to original question on the forum**:
<!-- Your issue will be closed if you don't fill this part. -->

.github/PULL_REQUEST_TEMPLATE.md vendored Normal file

@@ -0,0 +1,62 @@
# What does this PR do?
<!--
Congratulations! You've made it this far! You're not quite done yet though.
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same persons---sometimes notifications get lost.
-->
<!-- Remove if not applicable -->
Fixes # (issue)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue or the [forum](https://discuss.huggingface.co/)? Please add a link
to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/transformers/tree/master/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
albert, bert, XLM: @LysandreJik
GPT2: @LysandreJik, @patrickvonplaten
tokenizers: @mfuntowicz
Trainer: @sgugger
Benchmarks: @patrickvonplaten
Model Cards: @julien-c
examples/distillation: @VictorSanh
nlp datasets: [different repo](https://github.com/huggingface/nlp)
rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Text Generation: @patrickvonplaten, @TevenLeScao
Blenderbot, Bart, Marian, Pegasus: @patrickvonplaten
T5: @patrickvonplaten
Rag: @patrickvonplaten, @lhoestq
EncoderDecoder: @patrickvonplaten
Longformer, Reformer: @patrickvonplaten
TransfoXL, XLNet: @TevenLeScao, @patrickvonplaten
examples/seq2seq: @patil-suraj
examples/bert-loses-patience: @JetRunner
tensorflow: @jplu
examples/token-classification: @stefan-it
documentation: @sgugger
FSMT: @stas00
-->

.github/conda/build.sh vendored Normal file

@@ -0,0 +1 @@
$PYTHON setup.py install # Python command to install the script.

.github/conda/meta.yaml vendored Normal file

@@ -0,0 +1,48 @@
{% set name = "transformers" %}
package:
name: "{{ name|lower }}"
version: "{{ TRANSFORMERS_VERSION }}"
source:
path: ../../
build:
noarch: python
requirements:
host:
- python
- pip
- numpy
- dataclasses
- packaging
- filelock
- requests
- tqdm >=4.27
- sacremoses
- regex !=2019.12.17
- protobuf
- tokenizers ==0.9.4
run:
- python
- numpy
- dataclasses
- packaging
- filelock
- requests
- tqdm >=4.27
- sacremoses
- regex !=2019.12.17
- protobuf
- tokenizers ==0.9.4
test:
imports:
- transformers
about:
home: https://huggingface.co
license: Apache License 2.0
license_file: LICENSE
summary: "🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0."

.github/stale.yml vendored

@@ -6,6 +6,7 @@ daysUntilClose: 7
exemptLabels:
- pinned
- security
- Feature request
# Label to use when marking an issue as stale
staleLabel: wontfix
# Comment to post when marking an issue as stale. Set to `false` to disable

.github/workflows/github-torch-hub.yml vendored Normal file

@@ -0,0 +1,46 @@
name: Torch hub integration
on:
push:
branches:
- "*"
jobs:
torch_hub_integration:
runs-on: ubuntu-latest
env:
# TODO quickfix but may need more investigation
ACTIONS_ALLOW_UNSECURE_COMMANDS: True
steps:
# no checkout necessary here.
- name: Extract branch name
run: echo "::set-env name=BRANCH::${GITHUB_REF#refs/heads/}"
- name: Check branch name
run: echo $BRANCH
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.7
- name: Loading cache
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v0-torch_hub-${{ hashFiles('setup.py') }}
- name: Install dependencies
run: |
pip install --upgrade pip
# install torch-hub specific dependencies
pip install -e git+https://github.com/huggingface/transformers.git#egg=transformers[torchhub]
# no longer needed
pip uninstall -y transformers
- name: Torch hub list
run: |
python -c "import torch; print(torch.hub.list('huggingface/transformers:$BRANCH'))"
- name: Torch hub help
run: |
python -c "import torch; print(torch.hub.help('huggingface/transformers:$BRANCH', 'modelForSequenceClassification'))"

.github/workflows/model-templates.yml vendored Normal file

@@ -0,0 +1,67 @@
name: Model templates runner
on:
push:
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
jobs:
run_tests_templates:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v1
- name: Install Python
uses: actions/setup-python@v1
with:
python-version: 3.6
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v1.2-tests_templates
restore-keys: |
v1.2-tests_templates-${{ hashFiles('setup.py') }}
v1.2-tests_templates
- name: Install dependencies
run: |
pip install --upgrade pip
pip install .[dev]
- name: Create model files
run: |
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/encoder-bert-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/standalone.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
make style
python utils/check_table.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
- name: Run all non-slow tests
run: |
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_templates tests/*template*
- name: Run style changes
run: |
git fetch origin master:master
make fixup
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_templates_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_templates_test_reports
path: reports

.github/workflows/release-conda.yml vendored Normal file

@@ -0,0 +1,44 @@
name: Release - Conda
on:
push:
tags:
- v*
env:
ANACONDA_API_TOKEN: ${{ secrets.ANACONDA_API_TOKEN }}
jobs:
build_and_package:
runs-on: ubuntu-latest
defaults:
run:
shell: bash -l {0}
steps:
- name: Checkout repository
uses: actions/checkout@v1
- name: Install miniconda
uses: conda-incubator/setup-miniconda@v2
with:
auto-update-conda: true
auto-activate-base: false
activate-environment: "build-transformers"
channels: huggingface
- name: Setup conda env
run: |
conda install -c defaults anaconda-client conda-build
- name: Extract version
run: echo "TRANSFORMERS_VERSION=`python setup.py --version`" >> $GITHUB_ENV
- name: Build conda packages
run: |
conda info
conda list
conda-build .github/conda
- name: Upload to Anaconda
run: anaconda upload `conda-build .github/conda --output` --force

.github/workflows/self-push.yml vendored Normal file

@@ -0,0 +1,275 @@
name: Self-hosted runner (push)
on:
push:
branches:
- master
- ci_*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
# pull_request:
repository_dispatch:
jobs:
run_tests_torch_gpu:
runs-on: [self-hosted, gpu, single-gpu]
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_torch_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
# - name: Create model files
# run: |
# source .env/bin/activate
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/standalone.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
CUDA_VISIBLE_DEVICES: 0
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_torch_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_gpu_test_reports
path: reports
run_tests_tf_gpu:
runs-on: [self-hosted, gpu, single-gpu]
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_tf_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Create model files
run: |
source .env/bin/activate
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/standalone.json --path=templates/adding_a_new_model
# transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
CUDA_VISIBLE_DEVICES: 0
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_tf_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_tf_gpu_test_reports
path: reports
run_tests_torch_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_torch_multi_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip install pandas torch-scatter -f https://pytorch-geometric.com/whl/torch-1.7.0+cu102.html
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_torch_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_multi_gpu_test_reports
path: reports
run_tests_tf_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
steps:
- uses: actions/checkout@v2
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-tests_tf_multi_gpu-${{ hashFiles('setup.py') }}
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Run all non-slow tests on GPU
env:
OMP_NUM_THREADS: 1
run: |
source .env/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_tf_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_tf_multi_gpu_test_reports
path: reports

.github/workflows/self-scheduled.yml vendored Normal file

@@ -0,0 +1,356 @@
# configuration notes:
#
# - `source .env/bin/activate` is currently needed to be run first thing in each step. Otherwise
# the step uses the system-wide python interpreter.
name: Self-hosted runner (scheduled)
on:
repository_dispatch:
schedule:
- cron: "0 0 * * *"
jobs:
run_all_tests_torch_gpu:
runs-on: [self-hosted, gpu, single-gpu]
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-slow_tests_torch_gpu-${{ hashFiles('setup.py') }}
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_torch_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_gpu_failures_short.txt
- name: Run examples tests on GPU
if: ${{ always() }}
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
pip install -r examples/_tests_requirements.txt
python -m pytest -n 1 --dist=loadfile -s --make-reports=examples_torch_gpu examples
- name: Failure short reports
if: ${{ always() }}
run: cat reports/examples_torch_gpu_failures_short.txt
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_torch_pipeline_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_pipeline_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_gpu_test_reports
path: reports
run_all_tests_tf_gpu:
runs-on: [self-hosted, gpu, single-gpu]
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-slow_tests_tf_gpu-${{ hashFiles('setup.py') }}
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Run all tests on GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_tf_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_gpu_failures_short.txt
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_tf_pipelines_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_pipelines_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_tf_gpu_test_reports
path: reports
run_all_tests_torch_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-slow_tests_torch_multi_gpu-${{ hashFiles('setup.py') }}
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[torch,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
- name: Run all tests on multi-GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_torch_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_multi_gpu_failures_short.txt
- name: Run examples tests on multi-GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_torch_examples_multi_gpu examples
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_examples_multi_gpu_failures_short.txt
- name: Run all pipeline tests on multi-GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_torch_pipeline_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_pipeline_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_multi_gpu_test_reports
path: reports
run_all_tests_tf_multi_gpu:
runs-on: [self-hosted, gpu, multi-gpu]
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: .env
key: v1.1-slow_tests_tf_multi_gpu-${{ hashFiles('setup.py') }}
- name: Python version
run: |
which python
python --version
pip --version
- name: Current dir
run: pwd
- run: nvidia-smi
- name: Create new python env (on self-hosted runners we have to handle isolation ourselves)
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv .env
source .env/bin/activate
which python
python --version
pip --version
- name: Install dependencies
run: |
source .env/bin/activate
pip install --upgrade pip
pip install .[tf,sklearn,testing,onnxruntime,sentencepiece]
pip install git+https://github.com/huggingface/datasets
pip list
- name: Are GPUs recognized by our DL frameworks
run: |
source .env/bin/activate
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
- name: Run all tests on multi-GPU
env:
OMP_NUM_THREADS: 1
RUN_SLOW: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_tf_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_multi_gpu_failures_short.txt
- name: Run all pipeline tests on multi-GPU
if: ${{ always() }}
env:
TF_FORCE_GPU_ALLOW_GROWTH: "true"
OMP_NUM_THREADS: 1
RUN_SLOW: yes
RUN_PIPELINE_TESTS: yes
run: |
source .env/bin/activate
python -m pytest -n 1 --dist=loadfile -s -m is_pipeline_test --make-reports=tests_tf_pipeline_multi_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_tf_pipeline_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_tf_multi_gpu_test_reports
path: reports

.gitignore vendored

@@ -8,6 +8,13 @@ __pycache__/
# C extensions
*.so
# tests and logs
tests/fixtures/*
!tests/fixtures/sample_text_no_unicode.txt
logs/
lightning_logs/
lang_code_data/
# Distribution / packaging
.Python
build/
@@ -116,19 +123,42 @@ dmypy.json
.pyre/
# vscode
.vs
.vscode
# Pycharm
.idea
# TF code
tensorflow_code
# Models
models
proc_data
# examples
runs
/runs_old
/wandb
/examples/runs
/examples/**/*.args
/examples/rag/sweep
# data
/data
serialization_dir
# emacs
*.*~
debug.env
# vim
.*.swp
#ctags
tags
# pre-commit
.pre-commit*
# .lock
*.lock

CODE_OF_CONDUCT.md Normal file

@@ -0,0 +1,129 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
feedback@huggingface.co.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
Community Impact Guidelines were inspired by [Mozilla's code of conduct
enforcement ladder](https://github.com/mozilla/diversity).
[homepage]: https://www.contributor-covenant.org
For answers to common questions about this code of conduct, see the FAQ at
https://www.contributor-covenant.org/faq. Translations are available at
https://www.contributor-covenant.org/translations.

CONTRIBUTING.md Normal file

@@ -0,0 +1,355 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# How to contribute to transformers?
Everyone is welcome to contribute, and we value everybody's contribution. Code
is thus not the only way to help the community. Answering questions, helping
others, reaching out and improving the documentation are immensely valuable to
the community.
It also helps us if you spread the word: reference the library from blog posts
on the awesome projects it made possible, shout out on Twitter every time it has
helped you, or simply star the repo to say "thank you".
Whichever way you choose to contribute, please be mindful to respect our
[code of conduct](https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md).
## You can contribute in so many ways!
There are 4 ways you can contribute to transformers:
* Fixing outstanding issues with the existing code;
* Implementing new models;
* Contributing to the examples or to the documentation;
* Submitting issues related to bugs or desired new features.
*All are equally valuable to the community.*
## Submitting a new issue or feature request
Do your best to follow these guidelines when submitting an issue or a feature
request. It will make it easier for us to come back to you quickly and with good
feedback.
### Did you find a bug?
Transformers is robust and reliable thanks to the users who notify us of
the problems they encounter. So thank you for reporting an issue.
First, we would really appreciate it if you could **make sure the bug was not
already reported** (use the search bar on GitHub under Issues).
Did not find it? :( So we can act quickly on it, please follow these steps:
* Include your **OS type and version**, and the versions of **Python**, **PyTorch** and
**TensorFlow** when applicable;
* Include a short, self-contained code snippet that allows us to reproduce the bug in
less than 30s;
* Provide the *full* traceback if an exception is raised.
To get the OS and software versions automatically, you can run the following command:
```bash
transformers-cli env
```
or from the root of the repository the following command:
```bash
python src/transformers/commands/transformers_cli.py env
```
### Do you want to implement a new model?
Awesome! Please provide the following information:
* Short description of the model and link to the paper;
* Link to the implementation if it is open-source;
* Link to the model weights if they are available.
If you are willing to contribute the model yourself, let us know so we can best
guide you.
We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them
in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates) folder.
### Do you want a new feature (that is not a model)?
A world-class feature request addresses the following points:
1. Motivation first:
* Is it related to a problem/frustration with the library? If so, please explain
why. Providing a code snippet that demonstrates the problem is best.
* Is it related to something you would need for a project? We'd love to hear
about it!
* Is it something you worked on and think could benefit the community?
Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.
If your issue is well written we're already 80% of the way there by the time you
post it.
We have added **templates** to guide you in the process of adding a new example script for training or testing the
models in the library. You can find them in the [`templates`](https://github.com/huggingface/transformers/tree/master/templates)
folder.
## Start contributing! (Pull Requests)
Before writing code, we strongly advise you to search through the existing PRs or
issues to make sure that nobody is already working on the same thing. If you are
unsure, it is always a good idea to open an issue to get some feedback.
You will need basic `git` proficiency to be able to contribute to
`transformers`. `git` is not the easiest tool to use but it has the greatest
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing:
1. Fork the [repository](https://github.com/huggingface/transformers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code
under your GitHub user account.
2. Clone your fork to your local disk, and add the base repository as a remote:
```bash
$ git clone git@github.com:<your Github handle>/transformers.git
$ cd transformers
$ git remote add upstream https://github.com/huggingface/transformers.git
```
3. Create a new branch to hold your development changes:
```bash
$ git checkout -b a-descriptive-name-for-my-changes
```
**Do not** work on the `master` branch.
4. Set up a development environment by running the following command in a virtual environment:
```bash
$ pip install -e ".[dev]"
```
(If transformers was already installed in the virtual environment, remove
it with `pip uninstall transformers` before reinstalling it in editable
mode with the `-e` flag.)
To run the full test suite, you might need the additional dependency on `datasets` which requires a separate source
install:
```bash
$ git clone https://github.com/huggingface/datasets
$ cd datasets
$ pip install -e .
```
If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
library.
5. Develop the features on your branch.
As you work on the features, you should make sure that the test suite
passes:
```bash
$ make test
```
Note that this command uses the `-n auto` pytest flag, so it will start as many parallel `pytest` processes as your computer has CPU cores; if you have lots of those, a few GPUs, and not a great amount of RAM, it's likely to overload your computer. To run the test suite, you may therefore want to use this command instead:
```bash
$ python -m pytest -n 3 --dist=loadfile -s -v ./tests/
```
Adjust the value of `-n` to fit the load your hardware can support.
`transformers` relies on `black` and `isort` to format its source code
consistently. After you make changes, format them with:
```bash
$ make style
```
`transformers` also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
control runs in CI; however, you can also run the same checks locally with:
```bash
$ make quality
```
You can run the automatic style corrections and the code verifications that can't be automated, all in one go, with:
```bash
$ make fixup
```
This target is also optimized to only work with files modified by the PR you're working on.
If you're modifying documents under `docs/source`, make sure to validate that
they can still be built. This check also runs in CI. To run a local check,
make sure you have installed the documentation builder requirements by
running `pip install .[tf,torch,docs]` once from the root of this repository,
and then run:
```bash
$ make docs
```
Once you're happy with your changes, add changed files using `git add` and
make a commit with `git commit` to record your changes locally:
```bash
$ git add modified_file.py
$ git commit
```
Please write [good commit
messages](https://chris.beams.io/posts/git-commit/).
It is a good idea to sync your copy of the code with the original
repository regularly. This way you can quickly account for changes:
```bash
$ git fetch upstream
$ git rebase upstream/master
```
Push the changes to your account using:
```bash
$ git push -u origin a-descriptive-name-for-my-changes
```
6. Once you are satisfied (**and the checklist below is happy too**), go to the
webpage of your fork on GitHub. Click on 'Pull request' to send your changes
to the project maintainers for review.
7. It's OK if maintainers ask you for changes. It happens to core contributors
too! So that everyone can see the changes in the pull request, work in your local
branch and push the changes to your fork. They will automatically appear in
the pull request.
### Checklist
1. The title of your pull request should be a summary of its contribution;
2. If your pull request addresses an issue, please mention the issue number in
the pull request description to make sure they are linked (and people
consulting the issue know you are working on it);
3. To indicate a work in progress please prefix the title with `[WIP]`. These
are useful to avoid duplicated work, and to differentiate it from PRs ready
to be merged;
4. Make sure existing tests pass;
5. Add high-coverage tests. No quality testing = no merge.
- If you are adding a new model, make sure that you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)`, which triggers the common tests.
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests, and make sure
`RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
CircleCI does not run the slow tests, but GitHub Actions does every night!
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
example.
### Tests
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
[examples folder](https://github.com/huggingface/transformers/tree/master/examples).
We like `pytest` and `pytest-xdist` because they're faster. From the root of the
repository, here's how to run tests with `pytest` for the library:
```bash
$ python -m pytest -n auto --dist=loadfile -s -v ./tests/
```
and for the examples:
```bash
$ pip install -r examples/requirements.txt # only needed the first time
$ python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
In fact, that's how `make test` and `make test-examples` are implemented (sans the `pip install` line)!
You can specify a smaller set of tests in order to test only the feature
you're working on.
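For instance, you can point `pytest` at a single test file, or filter by test name with the `-k` flag (the file and expression below are illustrative):
```bash
$ python -m pytest tests/test_modeling_bert.py -v
$ python -m pytest tests/test_modeling_bert.py -k "attention" -v
```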
By default, slow tests are skipped. Set the `RUN_SLOW` environment variable to
`yes` to run them. This will download many gigabytes of models — make sure you
have enough disk space and a good Internet connection, or a lot of patience!
```bash
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/
```
Likewise, set the `RUN_CUSTOM_TOKENIZERS` environment variable to `yes` to run
tests for custom tokenizers, which don't run by default either.
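For example (the tokenizer test file named here is illustrative):
```bash
$ RUN_CUSTOM_TOKENIZERS=yes python -m pytest -s -v ./tests/test_tokenization_bert_japanese.py
```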
🤗 Transformers uses `pytest` as a test runner only. It doesn't use any
`pytest`-specific features in the test suite itself.
This means `unittest` is fully supported. Here's how to run tests with
`unittest`:
```bash
$ python -m unittest discover -s tests -t . -v
$ python -m unittest discover -s examples -t examples -v
```
### Style guide
For documentation strings, `transformers` follows the [Google style](https://google.github.io/styleguide/pyguide.html).
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification)
for more information.
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)
### Develop on Windows
On Windows, you need to configure git to transform Windows `CRLF` line endings to Linux `LF` line endings:
`git config core.autocrlf input`
One way to run the `make` command on Windows is via MSYS2:
1. [Download MSYS2](https://www.msys2.org/); we assume it is installed in C:\msys64
2. Open the command line C:\msys64\msys2.exe (it should be available from the start menu)
3. Run in the shell: `pacman -Syu` and install make with `pacman -S make`
4. Add `C:\msys64\usr\bin` to your PATH environment variable.
You can now use `make` from any terminal (PowerShell, cmd.exe, etc.) 🎉
### Syncing forked master with upstream (HuggingFace) master
To avoid pinging the upstream repository, which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs,
when syncing the master branch of a forked repository please follow these steps:
1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead merge directly into the forked master.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:
```
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream master
$ git commit -m '<your message without GitHub references>'
$ git push --set-upstream origin your-branch-for-syncing
```

ISSUES.md Normal file

@@ -0,0 +1,275 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# How To Request Support
This is an open-source project, so please be mindful that, as in any other project of this kind, there is no obligation to answer every request for help.
However, we want to encourage you to ask for help whenever you think it's needed! We are happy about every question we get, because it allows us to better understand your needs and possible misunderstandings, and most importantly it is a way for you to help us make this library better. That being said, this document's main purpose is to provide guidelines on how you can formulate your requests to increase your chances of being understood and of getting support.
There are two main venues to receive support: [the forums](https://discuss.huggingface.co/) and [the GitHub issues](https://github.com/huggingface/transformers/issues).
## The Forums
[The user forums](https://discuss.huggingface.co/) are supported by the wide community of the library users and backed up by developers when needed.
If you have difficulty deploying this library, have questions, or would like to discuss a new feature, please first consider discussing those things on the forums. Only when you feel your subject matter has been crystallized and you still need support from the library developers should you proceed to file an [issue](https://github.com/huggingface/transformers/issues).
In particular, all "Please explain" questions or objectively very user-specific feature requests belong on the forums. Here are some examples of such questions:
* "I would like to use a BertModel within a RL-Agent for a customer support service. How can I use a BertForMaskedLM in my ChatBotModel?"
* "Could you please explain why T5 has no positional embedding matrix under T5Model?"
* "How should I set my generation parameters for translation?"
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
You are not required to read the following guidelines before opening an issue. However, if you notice that your issue doesn't get any replies, chances are the developers are having difficulty with its quality. In that case, reading the following points and adjusting your issue accordingly could help.
1. Before posting an issue, first search for already posted issues, since chances are someone has already asked a similar question before you.
If you use Google your search query should be:
```
"huggingface" "transformers" your query
```
The first two quoted words tell Google to limit the search to the context of Hugging Face Transformers. The remainder is your query; most commonly this would be the error message the software fails with. We will go deeper into the details shortly.
The results of such a query will typically match GitHub issues, Hugging Face forums, StackExchange, and blogs.
If you find relevant hints, you may choose to continue the discussion there if you have follow up questions.
If what you found is similar but doesn't quite answer your problem, please, post a new issue and do include links to similar issues or forum discussions you may have found.
Let's look at some examples:
The error message tells us what went wrong. Here is an example:
```python
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/transformers/src/transformers/__init__.py", line 34, in <module>
from . import dependency_versions_check
File "/transformers/src/transformers/dependency_versions_check.py", line 34, in <module>
from .file_utils import is_tokenizers_available
File "/transformers/src/transformers/file_utils.py", line 40, in <module>
from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
and it typically includes a traceback, so that we can see the full stack of calls the program made before it failed. This gives us the context to know why the program failed.
Going back to the above example: if you received this error, look at the very last line of the error, which is:
```python
ModuleNotFoundError: No module named 'tqdm.auto'
```
Now we can use that last line as the search query in your favorite search engine:
1. first for `"huggingface" "transformers" "ModuleNotFoundError: No module named 'tqdm.auto'"`
2. if you don't find relevant results, then search for just `"ModuleNotFoundError: No module named 'tqdm.auto'"`
3. and finally, if still nothing comes up, remove the outside quotes: `ModuleNotFoundError: No module named 'tqdm.auto'`
If the error message includes any bits unique to your filesystem, always remove those from the search query, since other users will not have the same filesystem as yours. For example:
```bash
python -c 'open("/tmp/wrong_path.txt", "r")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/wrong_path.txt'
```
Here you'd search for just: `"FileNotFoundError: [Errno 2] No such file or directory"`
If the local information you removed was inside the error message itself, you may also need to remove the double quotes, since your query is no longer exact. So if the error message was something like:
```bash
ValueError: '/tmp/wrong_path.txt' cannot be found
```
then you'd search for `"ValueError" "cannot be found"`
As you search, you will notice that when you don't use quotes, search engines often return a variety of unrelated hits, which may or may not be what you want.
Experiment with different ways and find which approach gives the most satisfactory results.
2. Keep the issue short, providing the information that you think will help the developers understand your situation. Put yourself in the shoes of a person who has never seen your code and knows nothing about your custom setup. This mental exercise will help you develop an intuition for what to share and what not to.
3. If there is a software failure, always provide the full traceback, for example:
```python
$ python -c 'import transformers'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/transformers/src/transformers/__init__.py", line 34, in <module>
from . import dependency_versions_check
File "/transformers/src/transformers/dependency_versions_check.py", line 34, in <module>
from .file_utils import is_tokenizers_available
File "/transformers/src/transformers/file_utils.py", line 40, in <module>
from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
As compared to providing just the last line of the error message, e.g.:
```python
ModuleNotFoundError: No module named 'tqdm.auto'
```
which is not sufficient.
If your application is running on more than one GPU (e.g. under `DistributedDataParallel`) and is therefore printing every log and traceback multiple times, please make sure that you paste only one copy. At times the tracebacks from parallel processes may get interleaved, so either disentangle them or change the loggers to log only for `local_rank==0` so that only one process logs things.
4. When quoting a traceback, command-line instructions, or any type of code, always enclose it in triple backticks inside the editor window, that is:
````
```
git clone https://github.com/huggingface/transformers
cd transformers
pip install .
```
````
If it's a command line with a long argument list, please consider breaking it down using backslashes and new lines. Here is an example of a good command line quote:
```bash
cd examples/seq2seq
python -m torch.distributed.launch --nproc_per_node=2 ./finetune_trainer.py \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation \
--fp16 --sharded_ddp
```
If you don't break it up, one has to scroll horizontally, which often makes it quite difficult to quickly see what's happening.
The backslashes allow us to copy the command directly into the console to run it, without needing to edit it.
5. Include only the important information that you think will help the developer to quickly identify the problem.
For example, applications often create huge amounts of logs. Ask yourself whether providing all or only part of the log is useful.
Pasting 100-1000 lines of log into the issue is an immediate turn-off, since it will take a lot of time to figure out where the pertinent parts of the log are.
Attaching a full log can be helpful if it's enclosed in the following HTML code in the comment editor window:
```
<details>
<summary>Full log</summary>
<pre>
many
lines
go
here
</pre>
</details>
```
which would result in the following entry, which can be opened if desired, but otherwise takes little space.
<details>
<summary>Full log</summary>
<pre>
many
lines
go
here
</pre>
</details>
You could also provide a link to a pastebin service, but this is less beneficial since those links tend to expire quickly and future readers of your issue might not be able to access that log file anymore and may lack some context.
6. If this is an issue in your code, do try to reduce that code to a minimal example that still demonstrates the problem. Please ask on the forums if you have a hard time figuring out how to do that. Please realize that we don't have the luxury of time to try to understand all of your custom code.
If you really tried to produce a short reproducible example but couldn't figure it out, the traceback alone might give the developer enough information to know what's going on. But if it is not enough and we can't reproduce the problem, we can't really solve it.
Do not despair if you can't figure it out from the beginning; just share what you can, and perhaps someone else will be able to help you on the forums.
7. If you forked off some of this project's code or example applications, please do not ask us to go into your code repository and figure out what you may have done. The code is already very complex, and unless there is an easy way to do a diff and it's a small diff, it won't be possible to find someone with the time on their hands to make a lengthy investigation. That said, you might find someone on the forums who will be generous enough to do this for you.
8. Before reporting an issue, first, always try to update your environment to the latest official version of this library. We have no resources to go and debug older revisions, which could easily have bugs that have been fixed in the latest released version.
We understand that this is not always possible, especially when APIs change, in which case file an issue against the highest library version your environment can support.
Of course, if you upgrade the library, always retest that the problem is still there.
9. Please do not ask us to reproduce an issue with your custom data, since we don't have it. Either use some existing dataset supported by HF datasets, supply code that generates a small sample on the fly, or provide some other quick and simple way to get it.
Please do not send us any non-public domain data that may require a license or a permission to be used.
10. Do not tag multiple developers on the issue unless you know this is expected, either because you asked them and they gave you explicit permission to tag them, or because the issue template instructs you to do so.
The "who to tag for what domain" part of the issue template is there to help users direct their questions to the right developers who are designated maintainers of project's specific domains. They can then decide at their own discretion to tag other developers if they feel it'd help move the issue forward.
We currently don't have a triage service and we trust your capacity to identify the right domain and thus the persons to tag in your issue. If you are not sure, please use the forums to ask for guidance.
When in doubt, err on the side of not tagging a given person. If you tag multiple people without context or permission, don't be surprised if you get no response at all. Please remember that every time you tag someone, they get a notification, and you're taking their time without their permission. Please be sensitive to that.
If you got helped by one of the developers in the past please don't tag them in future issues, unless they are listed in the issue template for the domain you are asking about or that developer gave you an explicit permission to tag them in future issues.
If you see a certain developer doing multiple and/or recent commits into a specific area of the project that you feel is relevant to your issue, it is not a good reason to tag them. Various developers may be fixing things that prevent them from moving forward, but often their work is focused on a totally different domain. And while they may or may not know how to help you with the problem at hand, it would benefit the whole community much more if they focus on the domain of their unique expertise.
11. Use the Edit button. Take your time, and re-read and improve the wording and formatting to make your posts and comments as easy to understand as possible.
Avoid posting multiple comments in a row, as each comment generates a notification for the developers tagged in that issue. If you happen to post multiple comments in a row and nobody has followed up yet, consider merging those into one or a few comments, editing the combined content to be coherent.
If you choose to edit your older comments after others have posted follow-up comments, you need to be aware that your modifications might not be noticed, so unless it's just a typo fix, try to write a new comment flagging that something has been changed in the previous comments.
For example, the very first comment is the most important one. If, while the thread unfolds, you realize that things aren't as they seemed to you originally, you may want to edit the first post to reflect the up-to-date understanding of the issue at hand, so that it helps those who read your issue in the future quickly understand what's going on without needing to sift through dozens of comments. It also helps to indicate that the post was edited, so that those reading the thread later can understand why there might be a certain discontinuity in the information flow.
Use bullets and items if you have lists of items and the outcome improves overall readability.
Use backticks to refer to class and function names, e.g. `BartModel` and `generate`, as these stand out and improve the speed of a reader's comprehension.
Try not to use italics and bold text too much, as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter, it can be quite hard to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
For example the first link is a link to an issue, and the second to a specific comment in the same issue:
1. https://github.com/huggingface/transformers/issues/9257
2. https://github.com/huggingface/transformers/issues/9257#issuecomment-749945162
13. If you are replying to the last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that was made several comments back, it's always good practice to quote just the relevant lines you're replying to. The `>` is used for quoting, or you can always use the menu to do so. For example, your editor box will look like:
```
> How big is your gpu cluster?
Our cluster is made of 256 gpus.
```
If you are addressing multiple comments, quote the relevant parts of each before your answer. Some people use the same comment to do multiple replies, others separate them into separate comments. Either way works. The latter approach helps for linking to a specific comment.
In general, the best way to figure out what works best is to learn from issues posted by other people: see which issues get great responses and which get little to no response, and observe what the posters who received great responses did differently from those who did not.
Thank you for reading this somewhat lengthy document. We would like to conclude that these are not absolute rules, but friendly advice that will help maximize the chances for us to understand what you are trying to communicate, reproduce the problem, and then resolve it to your satisfaction and the benefit of the whole community.
If after reading this document there are remaining questions on how and why or there is a need for further elucidation, please, don't hesitate to ask your question in [this thread](https://discuss.huggingface.co/t/how-to-request-support/3128).


@ -1,3 +1,4 @@
Copyright 2018- The Hugging Face team. All rights reserved.
Apache License
Version 2.0, January 2004

70
Makefile Normal file

@ -0,0 +1,70 @@
.PHONY: deps_table_update modified_only_fixup extra_quality_checks quality style fixup fix-copies test test-examples docs

check_dirs := examples tests src utils

modified_only_fixup:
	$(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs)))
	@if test -n "$(modified_py_files)"; then \
		echo "Checking/fixing $(modified_py_files)"; \
		black $(modified_py_files); \
		isort $(modified_py_files); \
		flake8 $(modified_py_files); \
	else \
		echo "No library .py files were modified"; \
	fi

# Update src/transformers/dependency_versions_table.py
deps_table_update:
	@python setup.py deps_table_update

# Check that source code meets quality standards
extra_quality_checks: deps_table_update
	python utils/check_copies.py
	python utils/check_table.py
	python utils/check_dummies.py
	python utils/check_repo.py
	python utils/style_doc.py src/transformers docs/source --max_len 119

# This target runs checks on all files
quality:
	black --check $(check_dirs)
	isort --check-only $(check_dirs)
	flake8 $(check_dirs)
	python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
	${MAKE} extra_quality_checks

# Format source code automatically and check if there are any problems left that need manual fixing
style: deps_table_update
	black $(check_dirs)
	isort $(check_dirs)
	python utils/style_doc.py src/transformers docs/source --max_len 119

# Super fast fix and check target that only works on relevant modified files since the branch was made
fixup: modified_only_fixup extra_quality_checks

# Make marked copies of snippets of code conform to the original
fix-copies:
	python utils/check_copies.py --fix_and_overwrite
	python utils/check_table.py --fix_and_overwrite
	python utils/check_dummies.py --fix_and_overwrite

# Run tests for the library
test:
	python -m pytest -n auto --dist=loadfile -s -v ./tests/

# Run tests for examples
test-examples:
	python -m pytest -n auto --dist=loadfile -s -v ./examples/

# Check that docs can build
docs:
	cd docs && make html SPHINXOPTS="-W -j 4"

701
README.md

@ -1,10 +1,26 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<p align="center">
<br>
<img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/transformers_logo_name.png" width="400"/>
<br>
</p>
<p align="center">
<a href="https://github.com/huggingface/transformers/blob/master/LICENSE">
<a href="https://circleci.com/gh/huggingface/transformers">
<img alt="Build" src="https://img.shields.io/circleci/build/github/huggingface/transformers/master">
</a>
<a href="https://github.com/huggingface/transformers/blob/master/LICENSE">
@ -16,511 +32,236 @@
<a href="https://github.com/huggingface/transformers/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
</a>
<a href="https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg">
</a>
</p>
<h3 align="center">
<p>State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
</h3>
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.
### Features
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.
- As easy to use as pytorch-transformers
- As powerful and concise as Keras
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners
🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
State-of-the-art NLP for everyone
- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators
## Online demos
Lower compute costs, smaller carbon footprint
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing) to use those models.
Choose the right framework for every part of a model's lifetime
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
Here are a few examples:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Named Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural Language Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repo's text generation capabilities.
| Section | Description |
|-|-|
| [Installation](#installation) | How to install the package |
| [Model architectures](#model-architectures) | Architectures (with pretrained weights) |
| [Online demo](#online-demo) | Experimenting with this repo's text generation capabilities |
| [Quick tour: Usage](#quick-tour) | Tokenizers & models usage: Bert and GPT-2 |
| [Quick tour: TF 2.0 and PyTorch](#Quick-tour-TF-2.0-training-and-PyTorch-interoperability) | Train a TF 2.0 model in 10 lines of code, load it in PyTorch |
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-of-the-fine-tuningusage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
| [Migrating from pytorch-transformers to transformers](#Migrating-from-pytorch-transformers-to-transformers) | Migrating your code from pytorch-pretrained-bert to transformers |
| [Migrating from pytorch-pretrained-bert to pytorch-transformers](#Migrating-from-pytorch-pretrained-bert-to-transformers) | Migrating your code from pytorch-pretrained-bert to transformers |
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and more |
## Quick tour
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model training. Here is how to quickly use a pipeline to classify positive versus negative texts
```python
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]
```
The second line of code downloads and caches the pretrained model used by the pipeline, the third line evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.
Here is another example, using a pipeline for question answering, which extracts an answer from a given context:
```python
>>> from transformers import pipeline
# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline have been included in the huggingface/transformers repository'
... })
{'score': 0.5135612454720828, 'start': 35, 'end': 59, 'answer': 'huggingface/transformers'}
```
On top of the answer, the pretrained model used here returned its confidence score, along with the start and end positions of the answer in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
To download and use any of the pretrained models on your given task, you just need these three lines of code (PyTorch version):
```python
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```
or for TensorFlow:
```python
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
```
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single text or a list of texts (as we can see on the fourth line of both code examples); a batched call is sketched below. It will output a dictionary you can directly pass to your model (which is done on the fifth line).
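For instance, here is a minimal sketch of calling the tokenizer on a list of texts; the padding/truncation flags and the printed shape are illustrative, not requirements:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(
    ["Hello world!", "A slightly longer second sentence."],
    padding=True,         # pad to the longest sequence in the batch
    truncation=True,      # cut anything beyond the model's maximum length
    return_tensors="pt",  # or "tf" for TensorFlow models
)
print(batch["input_ids"].shape)  # e.g. torch.Size([2, 8]) after padding
```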
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. For instance, [this tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model in a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset; a minimal `Trainer` sketch follows.
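To make the `Trainer` mention concrete, here is a minimal fine-tuning sketch; the toy dataset, hyper-parameters and output directory are illustrative assumptions, not recommendations:
```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Two toy examples standing in for a real dataset
texts, labels = ["I love this!", "This is terrible."], [1, 0]
encodings = tokenizer(texts, padding=True, truncation=True)

class ToyDataset(torch.utils.data.Dataset):
    # Wraps the tokenizer output so the Trainer can index it
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

args = TrainingArguments(output_dir="toy_output", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=ToyDataset(encodings, labels)).train()
```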
## Why should I use transformers?
1. Easy-to-use state-of-the-art models:
- High performance on NLU and NLG tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abstractions with just three classes to learn.
- A unified API for using all our pretrained models.
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch frameworks at will.
- Seamlessly pick the right framework for training, evaluation, production.
1. Easily customize a model or an example to your needs:
- Examples for each architecture to reproduce the results by the official authors of said architecture.
- Expose the models' internals as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
## Installation
This repo is tested on Python 2.7 and 3.5+ (examples are tested only on python 3.5+) and PyTorch 1.0.0+
### With pip
Transformers can be installed by pip as follows:
This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install at least one of TensorFlow 2.0, PyTorch or Flax.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform and/or [Flax installation page](https://github.com/google/flax#quick-install).
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
### From source
If you'd like to play with the examples, you must [install the library from source](https://huggingface.co/transformers/installation.html#installing-from-source).
Clone the repository and run:
```bash
pip install [--editable] .
```
### With conda
Since Transformers version v4.0.0, we now have a conda channel: `huggingface`.
🤗 Transformers can be installed using conda as follows:
```bash
conda install -c huggingface transformers
```
### Tests
A series of tests is included for the library and the example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/transformers/tree/master/transformers/tests) and examples tests in the [examples folder](https://github.com/huggingface/transformers/tree/master/examples).
These tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
You can run the tests from the root of the cloned repository with the commands:
```bash
python -m pytest -sv ./transformers/tests/
python -m pytest -sv ./examples/
```
### Do you want to run a Transformer model on a mobile device?
You should check out our [`swift-coreml-transformers`](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains an example of a conversion script from a PyTorch trained Transformer model (here, `GPT-2`) to a CoreML model that runs on iOS devices.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch to productizing them in CoreML,
or prototype a model or an app in CoreML then research its hyperparameters or architecture from PyTorch. Super exciting!
## Model architectures
🤗 Transformers currently provides 8 NLU/NLG architectures:
1. **[BERT](https://github.com/google-research/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
2. **[GPT](https://github.com/openai/finetune-transformer-lm)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
3. **[GPT-2](https://blog.openai.com/better-language-models/)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
4. **[Transformer-XL](https://github.com/kimiyoung/transformer-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
5. **[XLNet](https://github.com/zihangdai/xlnet/)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
6. **[XLM](https://github.com/facebookresearch/XLM/)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
7. **[RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
8. **[DistilBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation)** (from HuggingFace), released together with the blogpost [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5) by Victor Sanh, Lysandre Debut and Thomas Wolf.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Online demo
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team at transformer.huggingface.co, is the official demo of this repo's text generation capabilities.
You can use it to experiment with completions generated by `GPT2Model`, `TransfoXLModel`, and `XLNetModel`.
> “🦄 Write with transformer is to writing what calculators are to calculus.”
![write_with_transformer](https://transformer.huggingface.co/front/assets/thumbnail-large.png)
## Quick tour
Let's do a very quick overview of the model architectures in 🤗 Transformers. Detailed examples for each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [full documentation](https://huggingface.co/transformers/).
```python
import torch
from transformers import *
# Transformers has a unified API
# for 8 transformer architectures and 30 pretrained weights.
#           Model          | Tokenizer          | Pretrained weights shortcut
MODELS = [(BertModel,       BertTokenizer,       'bert-base-uncased'),
          (OpenAIGPTModel,  OpenAIGPTTokenizer,  'openai-gpt'),
          (GPT2Model,       GPT2Tokenizer,       'gpt2'),
          (TransfoXLModel,  TransfoXLTokenizer,  'transfo-xl-wt103'),
          (XLNetModel,      XLNetTokenizer,      'xlnet-base-cased'),
          (XLMModel,        XLMTokenizer,        'xlm-mlm-enfr-1024'),
          (DistilBertModel, DistilBertTokenizer, 'distilbert-base-uncased'),
          (RobertaModel,    RobertaTokenizer,    'roberta-base')]

# To use TensorFlow 2.0 versions of the models, simply prefix the class names with 'TF', e.g. `TFRobertaModel` is the TF 2.0 counterpart of the PyTorch model `RobertaModel`

# Let's encode some text in a sequence of hidden-states using each model:
for model_class, tokenizer_class, pretrained_weights in MODELS:
    # Load pretrained model/tokenizer
    tokenizer = tokenizer_class.from_pretrained(pretrained_weights)
    model = model_class.from_pretrained(pretrained_weights)

    # Encode text
    input_ids = torch.tensor([tokenizer.encode("Here is some text to encode", add_special_tokens=True)])  # add_special_tokens takes care of adding [CLS], [SEP], <s>... tokens in the right way for each model.
    with torch.no_grad():
        last_hidden_states = model(input_ids)[0]  # Models outputs are now tuples

# Each architecture is provided with several classes for fine-tuning on down-stream tasks, e.g.
BERT_MODEL_CLASSES = [BertModel, BertForPreTraining, BertForMaskedLM, BertForNextSentencePrediction,
                      BertForSequenceClassification, BertForMultipleChoice, BertForTokenClassification,
                      BertForQuestionAnswering]

# All the classes for an architecture can be instantiated from pretrained weights for this architecture.
# Note that additional weights added for fine-tuning are only initialized
# and need to be trained on the down-stream task
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
for model_class in BERT_MODEL_CLASSES:
    # Load pretrained model/tokenizer
    model = model_class.from_pretrained('bert-base-uncased')

    # Models can return full list of hidden-states & attentions weights at each layer
    model = model_class.from_pretrained('bert-base-uncased',
                                        output_hidden_states=True,
                                        output_attentions=True)
    input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
    all_hidden_states, all_attentions = model(input_ids)[-2:]

    # Models are compatible with Torchscript
    model = model_class.from_pretrained('bert-base-uncased', torchscript=True)
    traced_model = torch.jit.trace(model, (input_ids,))

    # Simple serialization for models and tokenizers
    model.save_pretrained('./directory/to/save/')  # save
    model = model_class.from_pretrained('./directory/to/save/')  # re-load
    tokenizer.save_pretrained('./directory/to/save/')  # save
    tokenizer = BertTokenizer.from_pretrained('./directory/to/save/')  # re-load

# SOTA examples for GLUE, SQUAD, text generation...
```
## Quick tour TF 2.0 training and PyTorch interoperability
Let's do a quick example of how a TensorFlow 2.0 model can be trained in 12 lines of code with 🤗 Transformers and then loaded in PyTorch for fast inspection/tests.
```python
import tensorflow as tf
import tensorflow_datasets
from pytorch_transformers import *
# Load dataset, tokenizer, model from pretrained model/vocabulary
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-cased')
data = tensorflow_datasets.load('glue/mrpc')
# Prepare dataset for GLUE as a tf.data.Dataset instance
train_dataset = glue_convert_examples_to_features(data['train'], tokenizer, 128, 'mrpc')
valid_dataset = glue_convert_examples_to_features(data['validation'], tokenizer, 128, 'mrpc')
train_dataset = train_dataset.shuffle(100).batch(32).repeat(2)
valid_dataset = valid_dataset.batch(64)
# Prepare training: Compile tf.keras model with optimizer, loss and learning rate schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
# Train and evaluate using tf.keras.Model.fit()
history = model.fit(train_dataset, epochs=2, steps_per_epoch=115,
                    validation_data=valid_dataset, validation_steps=7)
# Load the TensorFlow model in PyTorch for inspection
model.save_pretrained('./save/')
pytorch_model = BertForSequenceClassification.from_pretrained('./save/', from_tf=True)
# Quickly test a few predictions - MRPC is a paraphrasing task, let's see if our model learned the task
sentence_0 = "This research was consistent with his findings."
sentence_1 = "His findings were compatible with this research."
sentence_2 = "His findings were not compatible with this research."
inputs_1 = tokenizer.encode_plus(sentence_0, sentence_1, add_special_tokens=True, return_tensors='pt')
inputs_2 = tokenizer.encode_plus(sentence_0, sentence_2, add_special_tokens=True, return_tensors='pt')
pred_1 = pytorch_model(**inputs_1)[0].argmax().item()
pred_2 = pytorch_model(**inputs_2)[0].argmax().item()
print("sentence_1 is", "a paraphrase" if pred_1 else "not a paraphrase", "of sentence_0")
print("sentence_2 is", "a paraphrase" if pred_2 else "not a paraphrase", "of sentence_0")
```
## Quick tour of the fine-tuning/usage scripts
The library comprises several example scripts with SOTA performances for NLU and NLG tasks:
- `run_glue.py`: an example fine-tuning Bert, XLNet and XLM on nine different GLUE tasks (*sequence-level classification*)
- `run_squad.py`: an example fine-tuning Bert, XLNet and XLM on the question answering dataset SQuAD 2.0 (*token-level classification*)
- `run_generation.py`: an example using GPT, GPT-2, Transformer-XL and XLNet for conditional language generation
- other model-specific examples (see the documentation).
Here are three quick usage examples for these scripts:
### `run_glue.py`: Fine-tuning on GLUE tasks for sequence classification
The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.
Before running any one of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
You should also install the additional packages required by the examples:
```shell
pip install -r ./examples/requirements.txt
```
```shell
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC
python ./examples/run_glue.py \
--model_type bert \
--model_name_or_path bert-base-uncased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--do_lower_case \
--data_dir $GLUE_DIR/$TASK_NAME \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/$TASK_NAME/
```
where the task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.
The dev set results will be present within the text file 'eval_results.txt' in the specified output_dir. In case of MNLI, since there are two separate dev sets, matched and mismatched, there will be a separate output folder called '/tmp/MNLI-MM/' in addition to '/tmp/MNLI/'.
#### Fine-tuning XLNet model on the STS-B regression task
This example code fine-tunes XLNet on the STS-B corpus using parallel training on a server with 4 V100 GPUs.
Parallel training is a simple way to use several GPUs (but is slower and less flexible than distributed training, see below).
```shell
export GLUE_DIR=/path/to/glue
python ./examples/run_glue.py \
--model_type xlnet \
--model_name_or_path xlnet-large-cased \
--do_train \
--do_eval \
--task_name=sts-b \
--data_dir=${GLUE_DIR}/STS-B \
--output_dir=./proc_data/sts-b-110 \
--max_seq_length=128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=8 \
--gradient_accumulation_steps=1 \
--max_steps=1200 \
--model_name=xlnet-large-cased \
--overwrite_output_dir \
--overwrite_cache \
--warmup_steps=120
```
On this machine we thus have an effective batch size of 32 (4 GPUs × a per-GPU batch size of 8); please increase `gradient_accumulation_steps` to reach the same batch size if you have a smaller machine (for example, with 2 GPUs and a per-GPU batch size of 8, set `gradient_accumulation_steps=2`). These hyper-parameters should result in a Pearson correlation coefficient of `+0.917` on the development set.
#### Fine-tuning Bert model on the MRPC classification task
This example code fine-tunes the Bert Whole Word Masking model on the Microsoft Research Paraphrase Corpus (MRPC) corpus using distributed training on 8 V100 GPUs to reach an F1 > 92.
```bash
python -m torch.distributed.launch --nproc_per_node 8 ./examples/run_glue.py \
--model_type bert \
--model_name_or_path bert-large-uncased-whole-word-masking \
--task_name MRPC \
--do_train \
--do_eval \
--do_lower_case \
--data_dir $GLUE_DIR/MRPC/ \
--max_seq_length 128 \
--per_gpu_eval_batch_size=8 \
--per_gpu_train_batch_size=8 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/ \
--overwrite_output_dir \
--overwrite_cache
```
Training with these hyper-parameters gave us the following results:
```bash
acc = 0.8823529411764706
acc_and_f1 = 0.901702786377709
eval_loss = 0.3418912578906332
f1 = 0.9210526315789473
global_step = 174
loss = 0.07231863956341798
```
### `run_squad.py`: Fine-tuning on SQuAD for question-answering
This example code fine-tunes BERT on the SQuAD dataset using distributed training on 8 V100 GPUs and the Bert Whole Word Masking uncased model to reach an F1 > 93 on SQuAD:
```bash
python -m torch.distributed.launch --nproc_per_node=8 ./examples/run_squad.py \
--model_type bert \
--model_name_or_path bert-large-uncased-whole-word-masking \
--do_train \
--do_eval \
--do_lower_case \
--train_file $SQUAD_DIR/train-v1.1.json \
--predict_file $SQUAD_DIR/dev-v1.1.json \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir ../models/wwm_uncased_finetuned_squad/ \
--per_gpu_eval_batch_size=3 \
--per_gpu_train_batch_size=3
```
Training with these hyper-parameters gave us the following results:
```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncased_finetuned_squad/predictions.json
{"exact_match": 86.91579943235573, "f1": 93.1532499015869}
```
This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
### `run_generation.py`: Text generation with GPT, GPT-2, Transformer-XL and XLNet
A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high quality generation with memory models like Transformer-XL and XLNet (include a predefined text to make short inputs longer).
Here is how to run the script with the small version of OpenAI GPT-2 model:
```shell
python ./examples/run_generation.py \
--model_type=gpt2 \
--length=20 \
--model_name_or_path=gpt2
```
## Migrating from pytorch-transformers to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to `transformers`.
### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195) the specific order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword names for keyword arguments, e.g. `model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for keyword arguments, e.g. `model(input_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments; a minimal sketch of the difference follows.
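Here is a minimal sketch of the two calling styles, using BERT as an arbitrary example (variable names are illustrative):
```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
enc = tokenizer("Hello world!", return_tensors="pt")

# Safe: keyword arguments are unaffected by any signature reordering
out_kw = model(enc["input_ids"], attention_mask=enc["attention_mask"],
               token_type_ids=enc["token_type_ids"])

# Fragile: positional arguments depend on the exact signature order,
# which changed for some models; double-check before relying on this
out_pos = model(enc["input_ids"], enc["attention_mask"], enc["token_type_ids"])
```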
## Migrating from pytorch-pretrained-bert to transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to `transformers`.
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to `transformers` is that the models' forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to `transformers` conversion example for a `BertForSequenceClassification` classification model:
```python
# Let's load our model
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)
# Now just use this line in transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]
# In transformers you can also have access to the logits:
loss, logits = outputs[:2]
# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```
### Serialization
Breaking changes in the `from_pretrained()` method:
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them, don't forget to set them back in training mode (`model.train()`) to activate the dropout modules, as illustrated in the sketch after this list.
2. The additional `*input` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be directly passed to the underlying model's class `__init__()` method. They are now used to update the model configuration attribute instead, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. We are working on a way to mitigate this breaking change in [#866](https://github.com/huggingface/transformers/pull/866) by forwarding to the model `__init__()` method (i) the provided positional arguments and (ii) the keyword arguments which do not match any configuration class attributes.
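Here is a minimal sketch of the first point (the checkpoint name is just an example):
```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
print(model.training)  # False: the model is loaded in evaluation mode by default

model.train()   # re-activate dropout modules before fine-tuning
model.eval()    # switch back to evaluation mode for inference
```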
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
Here is an example:
```python
### Let's load a model and tokenizer
from transformers import BertForSequenceClassification, BertTokenizer
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
### Do some stuff to our model and tokenizer
# Ex: add new tokens to the vocabulary and embeddings of our model
tokenizer.add_tokens(['[SPECIAL_TOKEN_1]', '[SPECIAL_TOKEN_2]'])
model.resize_token_embeddings(len(tokenizer))
# Train our model
train(model)
### Now let's save our model and tokenizer to a directory
model.save_pretrained('./my_saved_model_directory/')
tokenizer.save_pretrained('./my_saved_model_directory/')
### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained('./my_saved_model_directory/')
tokenizer = BertTokenizer.from_pretrained('./my_saved_model_directory/')
```
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:
- it only implements weight decay correction,
- schedules are now external (see below),
- gradient clipping is now also external (see below).
The new optimizer `AdamW` matches the PyTorch `Adam` optimizer API and lets you use standard PyTorch or apex methods for the schedule and clipping.
The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
Here is a conversion example from `BertAdam` with a linear warmup and decay schedule to `AdamW` and the same schedule:
```python
# Parameters:
lr = 1e-3
max_grad_norm = 1.0
num_total_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_total_steps) # 0.1
### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, t_total=num_total_steps)
### and used like this:
for batch in train_data:
    loss = model(batch)
    loss.backward()
    optimizer.step()

### In Transformers, optimizer and schedules are split and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=num_warmup_steps, t_total=num_total_steps)  # PyTorch scheduler

### and used like this:
for batch in train_data:
    loss = model(batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```
Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.
## Models architectures
**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co) where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations).
Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen)
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/transformers/model_summary.html) for a high-level summary of each of them):
1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/transformers/model_doc/barthez.html)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/transformers/model_doc/blenderbot_small.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LED](https://huggingface.co/transformers/model_doc/led.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MPNet](https://huggingface.co/transformers/model_doc/mpnet.html)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/transformers/model_doc/mt5.html)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/transformers/model_doc/tapas.html)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
To check if each model has an implementation in PyTorch/TensorFlow/Flax or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#bigtable).
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
## Learn more
| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/transformers/task_summary.html) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/transformers/preprocessing.html) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/transformers/training.html) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/transformers/migration.html) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
## Citation
We now have a [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library:
```bibtex
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
pages = "38--45"
}
```

View File

@ -1,7 +0,0 @@
FROM pytorch/pytorch:latest
RUN git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext
RUN pip install transformers
WORKDIR /workspace

View File

@ -0,0 +1,26 @@
FROM ubuntu:18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow-cpu \
torch
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,31 @@
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow \
torch
RUN git clone https://github.com/NVIDIA/apex
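# Build apex from source: the --cpp_ext and --cuda_ext options compile NVIDIA's fused C++/CUDA extensions used for mixed-precision training.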
RUN cd apex && \
python3 setup.py install && \
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,25 @@
FROM ubuntu:18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
torch
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,30 @@
FROM nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
mkl \
torch
RUN git clone https://github.com/NVIDIA/apex
RUN cd apex && \
python3 setup.py install && \
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,65 @@
FROM google/cloud-sdk:slim
# Build args.
ARG GITHUB_REF=refs/heads/master
# TODO: This Dockerfile installs pytorch/xla 3.6 wheels. There are also 3.7
# wheels available; see below.
ENV PYTHON_VERSION=3.6
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
curl \
ca-certificates
# Install conda and python.
# NOTE new Conda does not forward the exit status... https://github.com/conda/conda/issues/8385
RUN curl -o ~/miniconda.sh https://repo.anaconda.com/miniconda/Miniconda3-4.7.12-Linux-x86_64.sh && \
chmod +x ~/miniconda.sh && \
~/miniconda.sh -b && \
rm ~/miniconda.sh
ENV PATH=/root/miniconda3/bin:$PATH
RUN conda create -y --name container python=$PYTHON_VERSION
# Run the rest of commands within the new conda env.
# Use absolute path to appease Codefactor.
SHELL ["/root/miniconda3/bin/conda", "run", "-n", "container", "/bin/bash", "-c"]
RUN conda install -y python=$PYTHON_VERSION mkl
RUN pip uninstall -y torch && \
# Python 3.7 wheels are available. Replace cp36-cp36m with cp37-cp37m
gsutil cp "gs://tpu-pytorch/wheels/torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" . && \
gsutil cp "gs://tpu-pytorch/wheels/torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" . && \
gsutil cp "gs://tpu-pytorch/wheels/torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" . && \
pip install "torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" && \
pip install "torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" && \
pip install "torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" && \
rm "torch-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" && \
rm "torch_xla-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" && \
rm "torchvision-nightly-cp${PYTHON_VERSION/./}-cp${PYTHON_VERSION/./}m-linux_x86_64.whl" && \
apt-get install -y libomp5
ENV LD_LIBRARY_PATH=/root/miniconda3/envs/container/lib
# Install huggingface/transformers at the current PR, plus dependencies.
RUN git clone https://github.com/huggingface/transformers.git && \
cd transformers && \
git fetch origin $GITHUB_REF:CI && \
git checkout CI && \
cd .. && \
pip install ./transformers && \
pip install -r ./transformers/examples/requirements.txt && \
pip install pytest
RUN python -c "import torch_xla; print(torch_xla.__version__)"
RUN python -c "import transformers as trf; print(trf.__version__)"
RUN conda init bash
COPY docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
CMD ["bash"]

View File

@ -0,0 +1,38 @@
local base = import 'templates/base.libsonnet';
local tpus = import 'templates/tpus.libsonnet';
local utils = import "templates/utils.libsonnet";
local volumes = import "templates/volumes.libsonnet";
local bertBaseCased = base.BaseTest {
frameworkPrefix: "hf",
modelName: "bert-base-cased",
mode: "example",
configMaps: [],
timeout: 3600, # 1 hour, in seconds
image: std.extVar('image'),
imageTag: std.extVar('image-tag'),
tpuSettings+: {
softwareVersion: "pytorch-nightly",
},
accelerator: tpus.v3_8,
volumeMap+: {
datasets: volumes.PersistentVolumeSpec {
name: "huggingface-cluster-disk",
mountPath: "/datasets",
},
},
command: utils.scriptCommand(
|||
python -m pytest -s transformers/examples/test_xla_examples.py -v
test_exit_code=$?
echo "\nFinished running commands.\n"
test $test_exit_code -eq 0
|||
),
};
bertBaseCased.oneshotJob

View File

@ -0,0 +1,32 @@
apiVersion: v1
kind: PersistentVolume
metadata:
name: huggingface-cluster-disk
spec:
storageClassName: ""
capacity:
storage: 500Gi
accessModes:
- ReadOnlyMany
claimRef:
namespace: default
name: huggingface-cluster-disk-claim
gcePersistentDisk:
pdName: huggingface-cluster-disk
fsType: ext4
readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: huggingface-cluster-disk-claim
spec:
# Specify "" as the storageClassName so it matches the PersistentVolume's StorageClass.
# A nil storageClassName value uses the default StorageClass. For details, see
# https://kubernetes.io/docs/concepts/storage/persistent-volumes/#class-1
storageClassName: ""
accessModes:
- ReadOnlyMany
resources:
requests:
storage: 1Ki

View File

@ -0,0 +1,8 @@
#!/bin/bash
source ~/.bashrc
echo "running docker-entrypoint.sh"
conda activate container
echo $KUBE_GOOGLE_CLOUD_TPU_ENDPOINTS
echo "printed TPU info"
export XRT_TPU_CONFIG="tpu_worker;0;${KUBE_GOOGLE_CLOUD_TPU_ENDPOINTS:7}"
exec "$@"#!/bin/bash

View File

@ -0,0 +1,25 @@
FROM ubuntu:18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
mkl \
tensorflow-cpu
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -0,0 +1,25 @@
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
LABEL maintainer="Hugging Face"
LABEL repository="transformers"
RUN apt update && \
apt install -y bash \
build-essential \
git \
curl \
ca-certificates \
python3 \
python3-pip && \
rm -rf /var/lib/apt/lists
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
mkl \
tensorflow
WORKDIR /workspace
COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]

View File

@ -1,25 +1,49 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Generating the documentation
To generate the documentation, you first have to build it. Several packages are necessary to build the doc,
you can install them with the following command, at the root of the code repository:
```bash
pip install -e ".[docs]"
```
---
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing, for instance). You don't have to commit the built documentation.
---
## Packages installed
Here's an overview of all the packages installed. If you ran the previous command installing all packages, you do
not need to run the following commands.
Building it requires the package `sphinx` that you can
install using:
```bash
pip install -U sphinx
```
You would also need the custom installed [theme](https://github.com/readthedocs/sphinx_rtd_theme) by
[Read The Docs](https://readthedocs.org/). You can install it using the following command:
```bash
pip install sphinx_rtd_theme
```
@ -34,23 +58,19 @@ pip install recommonmark
## Building the documentation
Make sure that there is a symlink from the `example` file (in /examples) inside the source folder. Run the following
command to generate it:
```bash
ln -s ../../examples/README.md source/examples.md
```
Once you have set up `sphinx`, you can build the documentation by running the following command in the `/docs` folder:
```bash
make html
```
A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your
browser.
---
**NOTE**
If you are adding/removing elements from the toc-tree or from any structural item, it is recommended to clean the build
directory before rebuilding. Run the following command to clean and build:
```bash
make clean && make html
```
@ -65,3 +85,176 @@ It should build the static app that will be available under `/docs/_build/html`
Accepted files are reStructuredText (.rst) and Markdown (.md). Create a file with its extension and put it
in the source directory. You can then link it to the toc-tree by putting the filename without the extension.
## Preview the documentation in a pull request
Once you have made your pull request, you can check what the documentation will look like after it's merged by
following these steps:
- Look at the checks at the bottom of the conversation page of your PR (you may need to click on "show all checks" to
expand them).
- Click on "details" next to the `ci/circleci: build_doc` check.
- In the new window, click on the "Artifacts" tab.
- Locate the file "docs/_build/html/index.html" (or any specific page you want to check) and click on it to get a
preview.
## Writing Documentation - Specification
The `huggingface/transformers` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style. It is
mostly written in ReStructuredText
([Sphinx simple documentation](https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html),
[Sourceforge complete documentation](https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html)).
### Adding a new tutorial
Adding a new tutorial or section is done in two steps:
- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md).
- Link that file in `./source/index.rst` on the correct toc-tree.
Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users or researchers) it should go in section two, three or
four.
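For illustration, a minimal toc-tree entry in `./source/index.rst` could look like the following sketch (the file name `my_new_tutorial` and the caption are hypothetical):

```
.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    my_new_tutorial
```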
### Adding a new model
When adding a new model:
- Create a file `xxx.rst` under `./source/model_doc` (don't hesitate to copy an existing file as a template).
- Link that file in `./source/index.rst` on the `model_doc` toc-tree.
- Write a short overview of the model:
- Overview with paper & authors
- Paper abstract
- Tips and tricks and how to use it best
- Add the classes that should be linked in the model. This generally includes the configuration, the tokenizer, and
every model of that class (the base model, alongside models with additional heads), both in PyTorch and TensorFlow.
The order is generally:
- Configuration
- Tokenizer
- PyTorch base model
- PyTorch head models
- TensorFlow base model
- TensorFlow head models
These classes should be added using the RST syntax. Usually as follows:
```
XXXConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXConfig
:members:
```
This will include every public method of the configuration that is documented. If for some reason you don't want a
method to be displayed in the documentation, you can instead explicitly list the methods that should be in the docs:
```
XXXTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.XXXTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
```
### Writing source documentation
Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as
an object using the :obj: syntax: :obj:\`like so\`. Note that argument names and objects like True, None or any strings
should usually be put in `code`.
When mentioning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
linked by Sphinx: :class:\`~transformers.XXXClass\`
When mentioning a function, it is recommended to use the :func: syntax as the mentioned function will be automatically
linked by Sphinx: :func:\`~transformers.function\`.
When mentioning a method, it is recommended to use the :meth: syntax as the mentioned method will be automatically
linked by Sphinx: :meth:\`~transformers.XXXClass.method\`.
Links should be done as so (note the double underscore at the end): \`text for the link <./local-link-or-global-link#loc>\`__
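Putting these together, a docstring sentence might look like the following sketch (the class and method named here are real library objects, chosen purely for illustration):

```
This method relies on :class:`~transformers.BertTokenizer`; see
:meth:`~transformers.PreTrainedTokenizer.encode` for details, or the
`glossary <../glossary.html#input-ids>`__.
```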
#### Defining arguments in a method
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
The argument should be followed by its type, with its shape if it is a tensor, and a line return.
Another indentation is necessary before writing the description of the argument.
Here's an example showcasing everything so far:
```
Args:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`~transformers.AlbertTokenizer`.
See :meth:`~transformers.PreTrainedTokenizer.encode` and
:meth:`~transformers.PreTrainedTokenizer.__call__` for details.
`What are input IDs? <../glossary.html#input-ids>`__
```
For optional arguments or arguments with defaults, we use the following syntax. Imagine we have a function with the
following signature:
```
def my_function(x: str = None, a: float = 1):
```
then its documentation should look like this:
```
Args:
x (:obj:`str`, `optional`):
This argument controls ...
a (:obj:`float`, `optional`, defaults to 1):
This argument is used to ...
```
Note that we always omit the "defaults to :obj:\`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it on several lines. You can
however write as many lines as you want in the indented description (see the example above with `input_ids`).
#### Writing a multi-line code block
Multi-line code blocks can be useful for displaying examples. They are done like so:
```
Example::
# first line of code
# second line
# etc
```
The `Example` string at the beginning can be replaced by anything as long as there are two colons following it.
We follow the [doctest](https://docs.python.org/3/library/doctest.html) syntax for the examples to automatically test
that the results stay consistent with the library.
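For instance, a doctest-style example could look like the following sketch (the model downloaded by `pipeline` and the exact output are illustrative):

```
Example::

    >>> from transformers import pipeline
    >>> classifier = pipeline("sentiment-analysis")
    >>> classifier("I love this library!")[0]["label"]
    'POSITIVE'
```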
#### Writing a return block
The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation.
The first line should be the type of the return, followed by a line return. No need to indent further for the elements
building the return.
Here's an example for tuple return, comprising several objects:
```
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
loss (`optional`, returned when ``masked_lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```
Here's an example for a single value return:
```
Returns:
:obj:`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```

View File

@ -1,29 +0,0 @@
alabaster==0.7.12
Babel==2.7.0
certifi==2019.6.16
chardet==3.0.4
commonmark==0.9.0
docutils==0.14
future==0.17.1
idna==2.8
imagesize==1.1.0
Jinja2==2.10.1
MarkupSafe==1.1.1
packaging==19.0
Pygments==2.4.2
pyparsing==2.4.0
pytz==2019.1
recommonmark==0.5.0
requests==2.22.0
six==1.12.0
snowballstemmer==1.9.0
Sphinx==2.1.2
sphinx-rtd-theme==0.4.3
sphinxcontrib-applehelp==1.0.1
sphinxcontrib-devhelp==1.0.1
sphinxcontrib-htmlhelp==1.0.2
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.2
sphinxcontrib-serializinghtml==1.1.3
urllib3==1.25.3
sphinx-markdown-tables==0.0.9

View File

@ -9,4 +9,8 @@
.highlight .kn, .highlight .nv, .highlight .s2, .highlight .ow {
color: #6670FF;
}
.highlight .gp {
color: #FB8D68;
}

View File

@ -1,4 +1,111 @@
huggingface.css
/* Our DOM objects */
/* Colab dropdown */
table.center-aligned-table td {
text-align: center;
}
table.center-aligned-table th {
text-align: center;
vertical-align: middle;
}
.colab-dropdown {
position: relative;
display: inline-block;
}
.colab-dropdown-content {
display: none;
position: absolute;
background-color: #f9f9f9;
min-width: 117px;
box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
z-index: 1;
}
.colab-dropdown-content button {
color: #6670FF;
background-color: #f9f9f9;
font-size: 12px;
border: none;
min-width: 117px;
padding: 5px 5px;
text-decoration: none;
display: block;
}
.colab-dropdown-content button:hover {background-color: #eee;}
.colab-dropdown:hover .colab-dropdown-content {display: block;}
/* Version control */
.version-button {
background-color: #6670FF;
color: white;
border: none;
padding: 5px;
font-size: 15px;
cursor: pointer;
}
.version-button:hover, .version-button:focus {
background-color: #A6B0FF;
}
.version-dropdown {
display: none;
background-color: #6670FF;
min-width: 160px;
overflow: auto;
font-size: 15px;
}
.version-dropdown a {
color: white;
padding: 3px 4px;
text-decoration: none;
display: block;
}
.version-dropdown a:hover {
background-color: #A6B0FF;
}
.version-show {
display: block;
}
/* Framework selector */
.framework-selector {
display: flex;
flex-direction: row;
justify-content: flex-end;
margin-right: 30px;
}
.framework-selector > button {
background-color: white;
color: #6670FF;
border: 1px solid #6670FF;
padding: 5px;
}
.framework-selector > button.selected{
background-color: #6670FF;
color: white;
border: 1px solid #6670FF;
padding: 5px;
}
/* Copy button */
a.copybtn {
margin: 3px;
}
/* The literal code blocks */
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
@ -18,6 +125,7 @@ huggingface.css
/* The research field on top of the toc tree */
.wy-side-nav-search{
padding-top: 0;
background-color: #6670FF;
}
@ -26,6 +134,12 @@ huggingface.css
background-color: #6670FF;
}
/* The section headers in the toc tree */
.wy-menu-vertical p.caption{
background-color: #4d59ff;
line-height: 40px;
}
/* The selected items in the toc tree */
.wy-menu-vertical li.current{
background-color: #A6B0FF;
@ -44,11 +158,11 @@ huggingface.css
/* The text items on the toc tree */
.wy-menu-vertical a {
color: #FFFFDD;
font-family: Calibre-Light;
font-family: Calibre-Light, sans-serif;
}
.wy-menu-vertical header, .wy-menu-vertical p.caption{
color: white;
font-family: Calibre-Light;
font-family: Calibre-Light, sans-serif;
}
/* The color inside the selected toc tree block */
@ -85,7 +199,7 @@ a {
border-right: solid 2px #FB8D68;
border-left: solid 2px #FB8D68;
color: #FB8D68;
font-family: Calibre-Light;
font-family: Calibre-Light, sans-serif;
border-top: none;
font-style: normal !important;
}
@ -136,14 +250,14 @@ a {
/* class and method names in doc */
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) code.descclassname{
font-family: Calibre;
font-family: Calibre, sans-serif;
font-size: 20px !important;
}
/* class name in doc*/
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname{
margin-right: 10px;
font-family: Calibre-Medium;
font-family: Calibre-Medium, sans-serif;
}
/* Method and class parameters */
@ -160,17 +274,17 @@ a {
/* FONTS */
body{
font-family: Calibre;
font-family: Calibre, sans-serif;
font-size: 16px;
}
h1 {
font-family: Calibre-Thin;
font-family: Calibre-Thin, sans-serif;
font-size: 70px;
}
h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend{
font-family: Calibre-Medium;
font-family: Calibre-Medium, sans-serif;
}
@font-face {
@ -197,3 +311,40 @@ h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend{
font-weight:400;
}
/**
* Nav Links to other parts of huggingface.co
*/
div.menu {
position: absolute;
top: 0;
right: 0;
padding-top: 20px;
padding-right: 20px;
z-index: 1000;
}
div.menu a {
font-size: 14px;
letter-spacing: 0.3px;
text-transform: uppercase;
color: white;
-webkit-font-smoothing: antialiased;
background: linear-gradient(0deg, #6671ffb8, #9a66ffb8 50%);
padding: 10px 16px 6px 16px;
border-radius: 3px;
margin-left: 12px;
position: relative;
}
div.menu a:active {
top: 1px;
}
@media (min-width: 768px) and (max-width: 1750px) {
.wy-breadcrumbs {
margin-top: 32px;
}
}
@media (max-width: 768px) {
div.menu {
display: none;
}
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

Binary image changed (before: 14 KiB, after: 7.6 KiB)

docs/source/benchmarks.rst (new file, 361 lines)
View File

@ -0,0 +1,361 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Benchmarks
=======================================================================================================================
Let's take a look at how 🤗 Transformer models can be benchmarked, at some best practices, and at the benchmarks that are already available.
A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found :prefix_link:`here
<notebooks/05-benchmark.ipynb>`.
How to benchmark 🤗 Transformer models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` allow flexible
benchmarking of 🤗 Transformer models. The benchmark classes let us measure the `peak memory usage` and `required time`
for both `inference` and `training`.
.. note::
Here, `inference` is defined as a single forward pass, and `training` as a single forward pass and
backward pass.
The benchmark classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` expect an
object of type :class:`~transformers.PyTorchBenchmarkArguments` and
:class:`~transformers.TensorFlowBenchmarkArguments`, respectively, for instantiation.
:class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments` are data
classes and contain all relevant configurations for their corresponding benchmark class. The following example shows
how a BERT model of type `bert-base-uncased` can be benchmarked.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
>>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = PyTorchBenchmark(args)
>>> ## TENSORFLOW CODE
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
>>> args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = TensorFlowBenchmark(args)
Here, three arguments are given to the benchmark argument data classes, namely ``models``, ``batch_sizes``, and
``sequence_lengths``. The argument ``models`` is required and expects a :obj:`list` of model identifiers from the
`model hub <https://huggingface.co/models>`__. The :obj:`list` arguments ``batch_sizes`` and ``sequence_lengths`` define
the size of the ``input_ids`` on which the model is benchmarked. There are many more parameters that can be configured
via the benchmark argument data classes. For more detail on these, one can directly consult the files
``src/transformers/benchmark/benchmark_args_utils.py``, ``src/transformers/benchmark/benchmark_args.py`` (for PyTorch)
and ``src/transformers/benchmark/benchmark_args_tf.py`` (for Tensorflow). Alternatively, running the following shell
commands from the repository root will print out a descriptive list of all configurable parameters for PyTorch and
Tensorflow respectively.
.. code-block:: bash
## PYTORCH CODE
python examples/benchmarking/run_benchmark.py --help
## TENSORFLOW CODE
python examples/benchmarking/run_benchmark_tf.py --help
An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
.. code-block::
>>> ## PYTORCH CODE
>>> results = benchmark.run()
>>> print(results)
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base-uncased 8 8 0.006
bert-base-uncased 8 32 0.006
bert-base-uncased 8 128 0.018
bert-base-uncased 8 512 0.088
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base-uncased 8 8 1227
bert-base-uncased 8 32 1281
bert-base-uncased 8 128 1307
bert-base-uncased 8 512 1539
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 08:58:43.371351
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
>>> ## TENSORFLOW CODE
>>> results = benchmark.run()
>>> print(results)
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base-uncased 8 8 0.005
bert-base-uncased 8 32 0.008
bert-base-uncased 8 128 0.022
bert-base-uncased 8 512 0.105
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base-uncased 8 8 1330
bert-base-uncased 8 32 1330
bert-base-uncased 8 128 1330
bert-base-uncased 8 512 1770
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:26:35.617317
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
By default, the `time` and the `required memory` for `inference` are benchmarked. In the example output above the first
two sections show the results corresponding to `inference time` and `inference memory`. In addition, all relevant
information about the computing environment, `e.g.` the GPU type, the system, the library versions, etc., is printed
out in the third section under `ENVIRONMENT INFORMATION`. This information can optionally be saved in a `.csv` file
when adding the argument :obj:`save_to_csv=True` to :class:`~transformers.PyTorchBenchmarkArguments` and
:class:`~transformers.TensorFlowBenchmarkArguments` respectively. In this case, every section is saved in a separate
`.csv` file. The path to each `.csv` file can optionally be defined via the argument data classes.
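As a minimal sketch (reusing the arguments from above; :obj:`save_to_csv` is the only addition, and the default output file locations can be checked in ``benchmark_args_utils.py``):

.. code-block::

    >>> ## PYTORCH CODE
    >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
    >>> args = PyTorchBenchmarkArguments(
    ...     models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32], save_to_csv=True
    ... )
    >>> benchmark = PyTorchBenchmark(args)
    >>> results = benchmark.run()  # each result section is also written to its own .csv file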
Instead of benchmarking pre-trained models via their model identifier, `e.g.` `bert-base-uncased`, the user can
alternatively benchmark an arbitrary configuration of any available model class. In this case, a :obj:`list` of
configurations must be inserted with the benchmark args as follows.
.. code-block::
>>> ## PYTORCH CODE
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
>>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)
>>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
>>> benchmark.run()
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base 8 8 0.006
bert-base 8 32 0.006
bert-base 8 128 0.018
bert-base 8 512 0.088
bert-384-hid 8 8 0.006
bert-384-hid 8 32 0.006
bert-384-hid 8 128 0.011
bert-384-hid 8 512 0.054
bert-6-lay 8 8 0.003
bert-6-lay 8 32 0.004
bert-6-lay 8 128 0.009
bert-6-lay 8 512 0.044
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base 8 8 1277
bert-base 8 32 1281
bert-base 8 128 1307
bert-base 8 512 1539
bert-384-hid 8 8 1005
bert-384-hid 8 32 1027
bert-384-hid 8 128 1035
bert-384-hid 8 512 1255
bert-6-lay 8 8 1097
bert-6-lay 8 32 1101
bert-6-lay 8 128 1127
bert-6-lay 8 512 1359
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: PyTorch
- use_torchscript: False
- framework_version: 1.4.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:35:25.143267
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
>>> ## TENSORFLOW CODE
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
>>> args = TensorFlowBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)
>>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
>>> benchmark.run()
==================== INFERENCE - SPEED - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Time in s
--------------------------------------------------------------------------------
bert-base 8 8 0.005
bert-base 8 32 0.008
bert-base 8 128 0.022
bert-base 8 512 0.106
bert-384-hid 8 8 0.005
bert-384-hid 8 32 0.007
bert-384-hid 8 128 0.018
bert-384-hid 8 512 0.064
bert-6-lay 8 8 0.002
bert-6-lay 8 32 0.003
bert-6-lay 8 128 0.011
bert-6-lay 8 512 0.074
--------------------------------------------------------------------------------
==================== INFERENCE - MEMORY - RESULT ====================
--------------------------------------------------------------------------------
Model Name Batch Size Seq Length Memory in MB
--------------------------------------------------------------------------------
bert-base 8 8 1330
bert-base 8 32 1330
bert-base 8 128 1330
bert-base 8 512 1770
bert-384-hid 8 8 1330
bert-384-hid 8 32 1330
bert-384-hid 8 128 1330
bert-384-hid 8 512 1540
bert-6-lay 8 8 1330
bert-6-lay 8 32 1330
bert-6-lay 8 128 1330
bert-6-lay 8 512 1540
--------------------------------------------------------------------------------
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
Again, `inference time` and `required memory` for `inference` are measured, but this time for customized configurations
of the :obj:`BertModel` class. This feature can be especially helpful when deciding which configuration the model
should be trained with.
Benchmark best practices
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section lists a couple of best practices one should be aware of when benchmarking a model.
- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
specifies on which device the code should be run by setting the ``CUDA_VISIBLE_DEVICES`` environment variable in the
shell, `e.g.` ``export CUDA_VISIBLE_DEVICES=0`` before running the code.
- The option :obj:`no_multi_processing` should only be set to :obj:`True` for testing and debugging. To ensure accurate
memory measurement it is recommended to run each memory benchmark in a separate process by making sure
:obj:`no_multi_processing` is set to :obj:`False`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary
heavily between different GPU devices, library versions, etc., so that benchmark results on their own are not very
useful for the community.
Sharing your benchmark
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously all available core models (10 at the time) were benchmarked for `inference time`, across many different
settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were
done across CPUs (except for TensorFlow XLA) and GPUs.
The approach is detailed in the `following blogpost
<https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2>`__ and the results are
available `here
<https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing>`__.
With the new `benchmark` tools, it is easier than ever to share your benchmark results with the community
:prefix_link:`here <examples/benchmarking/README.md>`.

View File

@ -1,18 +1,38 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BERTology
-----------------------------------------------------------------------------------------------------------------------
There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT
(that some call "BERTology"). Some good examples of this field are:
* BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick:
https://arxiv.org/abs/1905.05950
* Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
* What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D.
Manning: https://arxiv.org/abs/1906.04341
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to
help people access the inner representations, mainly adapted from the great work of Paul Michel
(https://arxiv.org/abs/1905.10650):
* accessing all the hidden-states of BERT/GPT/GPT-2,
* accessing all the attention weights for each head of BERT/GPT/GPT-2,
* retrieving heads output values and gradients to be able to compute head importance score and prune head as explained
in https://arxiv.org/abs/1905.10650.
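As a minimal sketch of the first two features (the output sizes assume ``bert-base-uncased``, which has 12 layers):

.. code-block::

    >>> from transformers import BertModel, BertTokenizer
    >>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    >>> model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True, output_attentions=True)
    >>> outputs = model(**tokenizer("Hello world", return_tensors="pt"))
    >>> len(outputs.hidden_states), len(outputs.attentions)  # embeddings + 12 layers, one attention map per layer
    (13, 12)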
To help you understand and use these features, we have added a specific example script: :prefix_link:`bertology.py
<examples/research_projects/bertology/run_bertology.py>` which extracts information from and prunes a model pre-trained
on GLUE.

View File

@ -14,20 +14,23 @@
#
import os
import sys
sys.path.insert(0, os.path.abspath('../../src'))
# -- Project information -----------------------------------------------------
project = u'transformers'
copyright = u'2020, The Hugging Face Team, Licensed under the Apache License, Version 2.0'
author = u'huggingface'
# The short X.Y version
version = u''
# The full version, including alpha/beta/rc tags
release = u'4.2.0'
# Prefix link to point to master, comment this during version release and uncomment below line
extlinks = {'prefix_link': ('https://github.com/huggingface/transformers/blob/master/%s', '')}
# Prefix link to always point to corresponding version, uncomment this during version release
# extlinks = {'prefix_link': ('https://github.com/huggingface/transformers/blob/v'+ release + '/%s', '')}
# -- General configuration ---------------------------------------------------
@ -40,11 +43,13 @@ release = u'1.2.0'
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.extlinks',
'sphinx.ext.coverage',
'sphinx.ext.napoleon',
'recommonmark',
'sphinx.ext.viewcode',
'sphinx_markdown_tables',
'sphinx_copybutton'
]
# Add any paths that contain templates here, relative to this directory.
@ -74,6 +79,9 @@ exclude_patterns = [u'_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
# Remove the prompt when copying examples
copybutton_prompt_text = r">>> |\.\.\. "
copybutton_prompt_is_regexp = True
# -- Options for HTML output -------------------------------------------------
@ -105,6 +113,12 @@ html_static_path = ['_static']
#
# html_sidebars = {}
# This must be the name of an image file (path relative to the configuration
# directory) that is the favicon of the docs. Modern browsers use this as
# the icon for tabs, windows and bookmarks. It should be a Windows-style
# icon file (.ico).
html_favicon = 'favicon.ico'
# -- Options for HTMLHelp output ---------------------------------------------
@ -181,8 +195,8 @@ epub_title = project
epub_exclude_files = ['search.html']
def setup(app):
app.add_css_file('css/huggingface.css')
app.add_css_file('css/code-snippets.css')
app.add_js_file('js/custom.js')
# -- Extension configuration -------------------------------------------------

docs/source/contributing.md (new symbolic link)
View File

@ -0,0 +1 @@
../../CONTRIBUTING.md

View File

@ -1,18 +1,51 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Converting Tensorflow Checkpoints
=======================================================================================================================
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints into
models that can be loaded using the ``from_pretrained`` methods of the library.
.. note::
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
transformers >= 2.3.0 installation.
The documentation below reflects the **transformers-cli convert** command format.
BERT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google
<https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the
:prefix_link:`convert_bert_original_tf_checkpoint_to_pytorch.py
<src/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py>` script.
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated
configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights
from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that
can be imported using ``torch.load()`` (see examples in `run_bert_extract_features.py
<https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_extract_features.py>`_\ ,
`run_bert_classifier.py
<https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_classifier.py>`_ and
`run_bert_squad.py <https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/examples/run_bert_squad.py>`_\
).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\
``bert_config.json``\ ) and the vocabulary file (\ ``vocab.txt``\ ) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (\ ``pip install
tensorflow``\ ). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained ``BERT-Base Uncased`` model:
@ -20,75 +53,109 @@ Here is an example of the conversion process for a pre-trained ``BERT-Base Uncas
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/bert#pre-trained-models>`__.
ALBERT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Convert TensorFlow model checkpoints of ALBERT to PyTorch using the
:prefix_link:`convert_albert_original_tf_checkpoint_to_pytorch.py
<src/transformers/convert_albert_original_tf_checkpoint_to_pytorch.py>` script.
The CLI takes as input a TensorFlow checkpoint (three files starting with ``model.ckpt-best``\ ) and the accompanying
configuration file (\ ``albert_config.json``\ ), then creates and saves a PyTorch model. To run this conversion you
will need to have TensorFlow and PyTorch installed.
Here is an example of the conversion process for the pre-trained ``ALBERT Base`` model:
.. code-block:: shell
export ALBERT_BASE_DIR=/path/to/albert/albert_base
transformers-cli convert --model_type albert \
--tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
--config $ALBERT_BASE_DIR/albert_config.json \
--pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/albert#pre-trained-models>`__.
OpenAI GPT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint
is saved in the same format as the OpenAI pretrained model (see `here
<https://github.com/openai/finetune-transformer-lm>`__\ )
.. code-block:: shell
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
transformers-cli convert --model_type gpt \
--tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT_CONFIG] \
[--finetuning_task_name OPENAI_GPT_FINETUNED_TASK]
OpenAI GPT-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see `here
<https://github.com/openai/gpt-2>`__\ ):
.. code-block:: shell
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
Transformer-XL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here
<https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__\ ):
.. code-block:: shell
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
transformers-cli convert --model_type transfo_xl \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config TRANSFO_XL_CONFIG] \
[--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
XLNet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLNet model:
.. code-block:: shell
export XLNET_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
export XLNET_CONFIG_PATH=/path/to/xlnet/config
transformers-cli convert --model_type xlnet \
--tf_checkpoint $XLNET_CHECKPOINT_PATH \
--config $XLNET_CONFIG_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--finetuning_task_name XLNET_FINETUNED_TASK]
XLM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLM model:
.. code-block:: shell
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
transformers-cli convert --model_type xlm \
--tf_checkpoint $XLM_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config XLM_CONFIG] \
[--finetuning_task_name XLM_FINETUNED_TASK]
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Fine-tuning with custom datasets
=======================================================================================================================
.. note::
The datasets used in this tutorial are available and can be more easily accessed using the `🤗 NLP library
<https://github.com/huggingface/nlp>`_. We do not use this library to access the datasets here since this tutorial is
meant to illustrate how to work with your own data. A brief introduction can be found at the end of the tutorial in
the section ":ref:`nlplib`".
This tutorial will take you through several examples of using 🤗 Transformers models with your own datasets. The guide
shows one of many valid workflows for using these models and is meant to be illustrative rather than definitive. We
show examples of reading in several data formats, preprocessing the data for several types of tasks, and then preparing
the data into PyTorch/TensorFlow ``Dataset`` objects which can easily be used either with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow.
We include several examples, each of which demonstrates a different type of common downstream task:
- :ref:`seq_imdb`
- :ref:`tok_ner`
- :ref:`qa_squad`
- :ref:`resources`
.. _seq_imdb:
Sequence Classification with IMDb Reviews
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`IMDb <https://huggingface.co/datasets/imdb>`_), and
can be alternatively downloaded with the 🤗 NLP library with ``load_dataset("imdb")``.
In this example, we'll show how to download, tokenize, and train a model on the IMDb reviews dataset. This task takes
the text of a review and requires the model to predict whether the sentiment of the review is positive or negative.
Let's start by downloading the dataset from the `Large Movie Review Dataset
<http://ai.stanford.edu/~amaas/data/sentiment/>`_ webpage.
.. code-block:: bash
wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
tar -xf aclImdb_v1.tar.gz
This data is organized into ``pos`` and ``neg`` folders with one text file per example. Let's write a function that can
read this in.
.. code-block:: python
from pathlib import Path
def read_imdb_split(split_dir):
split_dir = Path(split_dir)
texts = []
labels = []
for label_dir in ["pos", "neg"]:
for text_file in (split_dir/label_dir).iterdir():
texts.append(text_file.read_text())
labels.append(0 if label_dir == "neg" else 1)
return texts, labels
train_texts, train_labels = read_imdb_split('aclImdb/train')
test_texts, test_labels = read_imdb_split('aclImdb/test')
We now have a train and test dataset, but let's also create a validation set which we can use for evaluation and
tuning without tainting our test set results. Sklearn has a convenient utility for creating such splits:
.. code-block:: python
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)
Alright, we've read in our dataset. Now let's tackle tokenization. We'll eventually train a classifier using
pre-trained DistilBert, so let's use the DistilBert tokenizer.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
Now we can simply pass our texts to the tokenizer. We'll pass ``truncation=True`` and ``padding=True``, which will
ensure that all of our sequences are padded to the same length and are truncated to be no longer than the model's
maximum input length. This will allow us to feed batches of sequences into the model at the same time.
.. code-block:: python
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
Now, let's turn our labels and encodings into a Dataset object. In PyTorch, this is done by subclassing a
``torch.utils.data.Dataset`` object and implementing ``__len__`` and ``__getitem__``. In TensorFlow, we pass our input
encodings and labels to the ``from_tensor_slices`` constructor method. We put the data in this format so that the data
can be easily batched such that each key in the batch encoding corresponds to a named parameter of the
:meth:`~transformers.DistilBertForSequenceClassification.forward` method of the model we will train.
.. code-block:: python
## PYTORCH CODE
import torch
class IMDbDataset(torch.utils.data.Dataset):
def __init__(self, encodings, labels):
self.encodings = encodings
self.labels = labels
def __getitem__(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
item['labels'] = torch.tensor(self.labels[idx])
return item
def __len__(self):
return len(self.labels)
train_dataset = IMDbDataset(train_encodings, train_labels)
val_dataset = IMDbDataset(val_encodings, val_labels)
test_dataset = IMDbDataset(test_encodings, test_labels)
## TENSORFLOW CODE
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
dict(train_encodings),
train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
dict(val_encodings),
val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
dict(test_encodings),
test_labels
))
Now that our datasets are ready, we can fine-tune a model either with the 🤗
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow. See :doc:`training
<training>`.
.. _ft_trainer:
Fine-tuning with Trainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The steps above prepared the datasets in the way the trainer expects. Now all we need to do is create a model
to fine-tune, define the :class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments` and
instantiate a :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`.
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments
training_args = TrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total number of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
logging_steps=10,
)
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
trainer = Trainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=val_dataset # evaluation dataset
)
trainer.train()
## TENSORFLOW CODE
from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments
training_args = TFTrainingArguments(
output_dir='./results', # output directory
num_train_epochs=3, # total number of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
logging_steps=10,
)
with training_args.strategy.scope():
model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
trainer = TFTrainer(
model=model, # the instantiated 🤗 Transformers model to be trained
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=val_dataset # evaluation dataset
)
trainer.train()
.. _ft_native:
Fine-tuning with native PyTorch/TensorFlow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We can also train with native PyTorch or TensorFlow:
.. code-block:: python
## PYTORCH CODE
import torch
from torch.utils.data import DataLoader
from transformers import DistilBertForSequenceClassification, AdamW
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
model.to(device)
model.train()
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
for batch in train_loader:
optim.zero_grad()
input_ids = batch['input_ids'].to(device)
attention_mask = batch['attention_mask'].to(device)
labels = batch['labels'].to(device)
outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
loss = outputs[0]
loss.backward()
optim.step()
model.eval()
## TENSORFLOW CODE
import tensorflow as tf
from transformers import TFDistilBertForSequenceClassification
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)  # the batch size is set on the dataset, not in fit
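Since the PyTorch loop above ends with ``model.eval()`` but never actually evaluates, here is a minimal evaluation
sketch reusing the objects defined above (the batch size and the accuracy metric are illustrative choices, not part of
the recipe itself):

.. code-block:: python

    ## PYTORCH CODE
    val_loader = DataLoader(val_dataset, batch_size=64)
    correct, total = 0, 0
    with torch.no_grad():
        for batch in val_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            outputs = model(input_ids, attention_mask=attention_mask)
            predictions = outputs[0].argmax(dim=-1)  # logits come first when no labels are passed
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    print(correct / total)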
.. _tok_ner:
Token Classification with W-NUT Emerging Entities
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`WNUT-17 <https://huggingface.co/datasets/wnut_17>`_),
and can be alternatively downloaded with the 🤗 NLP library with ``load_dataset("wnut_17")``.
Next we will look at token classification. Rather than classifying an entire sequence, this task classifies token by
token. We'll demonstrate how to do this with `Named Entity Recognition
<http://nlpprogress.com/english/named_entity_recognition.html>`_, which involves identifying tokens which correspond to
a predefined set of "entities". Specifically, we'll use the `W-NUT Emerging and Rare entities
<http://noisy-text.github.io/2017/emerging-rare-entities.html>`_ corpus. The data is given as a collection of
pre-tokenized documents where each token is assigned a tag.
Let's start by downloading the data.
.. code-block:: bash
wget http://noisy-text.github.io/2017/files/wnut17train.conll
In this case, we'll just download the train set, which is a single text file. Each line of the file contains either (1)
a word and tag separated by a tab, or (2) a blank line indicating the end of a document. Let's write a function to read
this in. We'll take in the file path and return ``token_docs`` which is a list of lists of token strings, and
``tag_docs`` which is a list of lists of tag strings.
.. code-block:: python
from pathlib import Path
import re
def read_wnut(file_path):
file_path = Path(file_path)
raw_text = file_path.read_text().strip()
raw_docs = re.split(r'\n\t?\n', raw_text)
token_docs = []
tag_docs = []
for doc in raw_docs:
tokens = []
tags = []
for line in doc.split('\n'):
token, tag = line.split('\t')
tokens.append(token)
tags.append(tag)
token_docs.append(tokens)
tag_docs.append(tags)
return token_docs, tag_docs
texts, tags = read_wnut('wnut17train.conll')
Just to see what this data looks like, let's take a look at a segment of the first document.
.. code-block:: python
>>> print(texts[0][10:17], tags[0][10:17], sep='\n')
['for', 'two', 'weeks', '.', 'Empire', 'State', 'Building']
['O', 'O', 'O', 'O', 'B-location', 'I-location', 'I-location']
``location`` is an entity type, ``B-`` indicates the beginning of an entity, and ``I-`` indicates consecutive positions
of the same entity ("Empire State Building" is considered one entity). ``O`` indicates the token does not correspond to
any entity.
Now that we've read the data in, let's create a train/validation split:
.. code-block:: python
from sklearn.model_selection import train_test_split
train_texts, val_texts, train_tags, val_tags = train_test_split(texts, tags, test_size=.2)
Next, let's create encodings for our tokens and tags. For the tags, we can start by just creating a simple mapping
which we'll use in a moment:
.. code-block:: python
unique_tags = set(tag for doc in tags for tag in doc)
tag2id = {tag: id for id, tag in enumerate(unique_tags)}
id2tag = {id: tag for tag, id in tag2id.items()}
To encode the tokens, we'll use a pre-trained DistilBert tokenizer. We can tell the tokenizer that we're dealing with
ready-split tokens rather than full sentence strings by passing ``is_split_into_words=True``. We'll also pass
``padding=True`` and ``truncation=True`` to pad the sequences to be the same length. Lastly, we can tell the model to
return information about the tokens which are split by the wordpiece tokenization process, which we will need in a
moment.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')
train_encodings = tokenizer(train_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
val_encodings = tokenizer(val_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
Great, so now our tokens are nicely encoded in the format that they need to be in to feed them into our DistilBert
model below.
Now we arrive at a common obstacle with using pre-trained models for token-level classification: many of the tokens in
the W-NUT corpus are not in DistilBert's vocabulary. Bert and many models like it use a method called WordPiece
Tokenization, meaning that single words are split into multiple tokens such that each token is likely to be in the
vocabulary. For example, DistilBert's tokenizer would split the Twitter handle ``@huggingface`` into the tokens ``['@',
'hugging', '##face']``. This is a problem for us because we have exactly one tag per token. If the tokenizer splits a
token into multiple sub-tokens, then we will end up with a mismatch between our tokens and our labels.
One way to handle this is to only train on the tag labels for the first subtoken of a split token. We can do this in 🤗
Transformers by setting the labels we wish to ignore to ``-100``. In the example above, if the label for
``@huggingface`` is ``3`` (indexing ``B-corporation``), we would set the labels of ``['@', 'hugging', '##face']`` to
``[3, -100, -100]``.
Let's write a function to do this. This is where we will use the ``offset_mapping`` from the tokenizer as mentioned
above. For each sub-token returned by the tokenizer, the offset mapping gives us a tuple indicating the sub-token's
start position and end position relative to the original token it was split from. That means that if the first position
in the tuple is anything other than ``0``, we will set its corresponding label to ``-100``. While we're at it, we can
also set labels to ``-100`` if the second position of the offset mapping is ``0``, since this means it must be a
special token like ``[PAD]`` or ``[CLS]``.
.. note::
Due to a recently fixed bug, -1 must be used instead of -100 when using TensorFlow in 🤗 Transformers <= 3.02.
.. code-block:: python
import numpy as np
def encode_tags(tags, encodings):
labels = [[tag2id[tag] for tag in doc] for doc in tags]
encoded_labels = []
for doc_labels, doc_offset in zip(labels, encodings.offset_mapping):
# create an array filled with -100 (the label index to ignore)
doc_enc_labels = np.ones(len(doc_offset), dtype=int) * -100
arr_offset = np.array(doc_offset)
# set labels whose first offset position is 0 and the second is not 0
doc_enc_labels[(arr_offset[:,0] == 0) & (arr_offset[:,1] != 0)] = doc_labels
encoded_labels.append(doc_enc_labels.tolist())
return encoded_labels
train_labels = encode_tags(train_tags, train_encodings)
val_labels = encode_tags(val_tags, val_encodings)
The hard part is now done. Just as in the sequence classification example above, we can create a dataset object:
.. code-block:: python
## PYTORCH CODE
import torch
class WNUTDataset(torch.utils.data.Dataset):
def __init__(self, encodings, labels):
self.encodings = encodings
self.labels = labels
def __getitem__(self, idx):
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
item['labels'] = torch.tensor(self.labels[idx])
return item
def __len__(self):
return len(self.labels)
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
val_encodings.pop("offset_mapping")
train_dataset = WNUTDataset(train_encodings, train_labels)
val_dataset = WNUTDataset(val_encodings, val_labels)
## TENSORFLOW CODE
import tensorflow as tf
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
val_encodings.pop("offset_mapping")
train_dataset = tf.data.Dataset.from_tensor_slices((
dict(train_encodings),
train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
dict(val_encodings),
val_labels
))
Now load in a token classification model and specify the number of labels:
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForTokenClassification
model = DistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
## TENSORFLOW CODE
from transformers import TFDistilBertForTokenClassification
model = TFDistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
The data and model are both ready to go. You can train the model either with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow, exactly as in the
sequence classification example above.
- :ref:`ft_trainer`
- :ref:`ft_native`
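For instance, a minimal :class:`~transformers.Trainer` setup mirroring the sequence classification example could look
like the sketch below, reusing the ``model``, ``train_dataset`` and ``val_dataset`` objects defined above (the
hyperparameters are illustrative):

.. code-block:: python

    ## PYTORCH CODE
    from transformers import Trainer, TrainingArguments

    training_args = TrainingArguments(
        output_dir='./results',          # output directory
        num_train_epochs=3,              # total number of training epochs
        per_device_train_batch_size=16,  # batch size per device during training
        per_device_eval_batch_size=64,   # batch size for evaluation
        weight_decay=0.01,               # strength of weight decay
        logging_dir='./logs',            # directory for storing logs
    )

    trainer = Trainer(
        model=model,                     # the token classification model loaded above
        args=training_args,
        train_dataset=train_dataset,     # WNUTDataset defined above
        eval_dataset=val_dataset,
    )
    trainer.train()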
.. _qa_squad:
Question Answering with SQuAD 2.0
-----------------------------------------------------------------------------------------------------------------------
.. note::
This dataset can be explored in the Hugging Face model hub (`SQuAD V2
<https://huggingface.co/datasets/squad_v2>`_), and can be alternatively downloaded with the 🤗 NLP library with
``load_dataset("squad_v2")``.
Question answering comes in many forms. In this example, we'll look at the particular type of extractive QA that
involves answering a question about a passage by highlighting the segment of the passage that answers the question.
This involves fine-tuning a model which predicts a start position and an end position in the passage. We will use the
`Stanford Question Answering Dataset (SQuAD) 2.0 <https://rajpurkar.github.io/SQuAD-explorer/>`_.
We will start by downloading the data:
.. code-block:: bash
mkdir squad
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json -O squad/train-v2.0.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json -O squad/dev-v2.0.json
Each split is in a structured json file with a number of questions and answers for each passage (or context). We'll
take this apart into parallel lists of contexts, questions, and answers (note that the contexts here are repeated since
there are multiple questions per context):
.. code-block:: python
import json
from pathlib import Path
def read_squad(path):
path = Path(path)
with open(path, 'rb') as f:
squad_dict = json.load(f)
contexts = []
questions = []
answers = []
for group in squad_dict['data']:
for passage in group['paragraphs']:
context = passage['context']
for qa in passage['qas']:
question = qa['question']
for answer in qa['answers']:
contexts.append(context)
questions.append(question)
answers.append(answer)
return contexts, questions, answers
train_contexts, train_questions, train_answers = read_squad('squad/train-v2.0.json')
val_contexts, val_questions, val_answers = read_squad('squad/dev-v2.0.json')
The contexts and questions are just strings. The answers are dicts containing the subsequence of the passage with the
correct answer as well as an integer indicating the character at which the answer begins. In order to train a model on
this data we need (1) the tokenized context/question pairs, and (2) integers indicating at which *token* positions the
answer begins and ends.
First, let's get the *character* position at which the answer ends in the passage (we are given the starting position).
Sometimes SQuAD answers are off by one or two characters, so we will also adjust for that.
.. code-block:: python
def add_end_idx(answers, contexts):
for answer, context in zip(answers, contexts):
gold_text = answer['text']
start_idx = answer['answer_start']
end_idx = start_idx + len(gold_text)
# sometimes squad answers are off by a character or two; fix this
if context[start_idx:end_idx] == gold_text:
answer['answer_end'] = end_idx
elif context[start_idx-1:end_idx-1] == gold_text:
answer['answer_start'] = start_idx - 1
answer['answer_end'] = end_idx - 1 # When the gold label is off by one character
elif context[start_idx-2:end_idx-2] == gold_text:
answer['answer_start'] = start_idx - 2
answer['answer_end'] = end_idx - 2 # When the gold label is off by two characters
add_end_idx(train_answers, train_contexts)
add_end_idx(val_answers, val_contexts)
Now ``train_answers`` and ``val_answers`` include the character end positions and the corrected start positions. Next,
let's tokenize our context/question pairs. 🤗 Tokenizers can accept parallel lists of sequences and encode them together
as sequence pairs.
.. code-block:: python
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True)
val_encodings = tokenizer(val_contexts, val_questions, truncation=True, padding=True)
Next we need to convert our character start/end positions to token start/end positions. When using 🤗 Fast Tokenizers,
we can use the built-in :func:`~transformers.BatchEncoding.char_to_token` method.
.. code-block:: python
def add_token_positions(encodings, answers):
start_positions = []
end_positions = []
for i in range(len(answers)):
start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
end_positions.append(encodings.char_to_token(i, answers[i]['answer_end']))
# if start position is None, the answer passage has been truncated
if start_positions[-1] is None:
start_positions[-1] = tokenizer.model_max_length
# if end position is None, the 'char_to_token' function points to the space before the correct token -> add + 1
if end_positions[-1] is None:
end_positions[-1] = encodings.char_to_token(i, answers[i]['answer_end'] + 1)
encodings.update({'start_positions': start_positions, 'end_positions': end_positions})
add_token_positions(train_encodings, train_answers)
add_token_positions(val_encodings, val_answers)
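As a quick, optional sanity check (with a hypothetical example index ``i``), we can decode the tokens between a start
and end position and compare them to the annotated answer text:

.. code-block:: python

    i = 0  # hypothetical example index
    start = train_encodings['start_positions'][i]
    end = train_encodings['end_positions'][i]
    print(tokenizer.decode(train_encodings['input_ids'][i][start:end + 1]))
    print(train_answers[i]['text'])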
Our data is ready. Let's just put it in a PyTorch/TensorFlow dataset so that we can easily use it for training. In
PyTorch, we define a custom ``Dataset`` class. In TensorFlow, we pass a tuple of ``(inputs_dict, labels_dict)`` to the
``from_tensor_slices`` method.
.. code-block:: python
## PYTORCH CODE
import torch
class SquadDataset(torch.utils.data.Dataset):
def __init__(self, encodings):
self.encodings = encodings
def __getitem__(self, idx):
return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
def __len__(self):
return len(self.encodings.input_ids)
train_dataset = SquadDataset(train_encodings)
val_dataset = SquadDataset(val_encodings)
## TENSORFLOW CODE
import tensorflow as tf
train_dataset = tf.data.Dataset.from_tensor_slices((
{key: train_encodings[key] for key in ['input_ids', 'attention_mask']},
{key: train_encodings[key] for key in ['start_positions', 'end_positions']}
))
val_dataset = tf.data.Dataset.from_tensor_slices((
{key: val_encodings[key] for key in ['input_ids', 'attention_mask']},
{key: val_encodings[key] for key in ['start_positions', 'end_positions']}
))
Now we can use a DistilBert model with a QA head for training:
.. code-block:: python
## PYTORCH CODE
from transformers import DistilBertForQuestionAnswering
model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
## TENSORFLOW CODE
from transformers import TFDistilBertForQuestionAnswering
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
The data and model are both ready to go. You can train the model with
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` exactly as in the sequence classification example
above. If using native PyTorch, replace ``labels`` with ``start_positions`` and ``end_positions`` in the training
example. If using Keras's ``fit``, we need to make a minor modification to handle this example since it involves
multiple model outputs.
- :ref:`ft_trainer`
.. code-block:: python
## PYTORCH CODE
import torch
from torch.utils.data import DataLoader
from transformers import AdamW
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
model.train()
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
for batch in train_loader:
optim.zero_grad()
input_ids = batch['input_ids'].to(device)
attention_mask = batch['attention_mask'].to(device)
start_positions = batch['start_positions'].to(device)
end_positions = batch['end_positions'].to(device)
outputs = model(input_ids, attention_mask=attention_mask, start_positions=start_positions, end_positions=end_positions)
loss = outputs[0]
loss.backward()
optim.step()
model.eval()
## TENSORFLOW CODE
import tensorflow as tf
# Keras will expect a tuple when dealing with labels
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))
# Keras will assign a separate loss for each output and add them together. So we'll just use the standard CE loss
# instead of using the built-in model.compute_loss, which expects a dict of outputs and averages the two terms.
# Note that this means the loss will be 2x what it would be when using TFTrainer, since we're adding instead of averaging.
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.distilbert.return_dict = False # if using 🤗 Transformers >3.02, make sure outputs are tuples
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)  # the batch size is set on the dataset, not in fit
.. _resources:
Additional Resources
-----------------------------------------------------------------------------------------------------------------------
- `How to train a new language model from scratch using Transformers and Tokenizers
<https://huggingface.co/blog/how-to-train>`_. Blog post showing the steps to load in Esperanto data and train a
masked language model from scratch.
- :doc:`Preprocessing <preprocessing>`. Docs page on data preprocessing.
- :doc:`Training <training>`. Docs page on training and fine-tuning.
.. _nlplib:
Using the 🤗 NLP Datasets & Metrics library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This tutorial demonstrates how to read in datasets from various raw text formats and prepare them for training with 🤗
Transformers so that you can do the same thing with your own custom datasets. However, we recommend users use the `🤗
NLP library <https://github.com/huggingface/nlp>`_ for working with the 150+ datasets included in the `hub
<https://huggingface.co/datasets>`_, including the three datasets used in this tutorial. As a very brief overview, we
will show how to use the NLP library to download and prepare the IMDb dataset from the first example, :ref:`seq_imdb`.
Start by downloading the dataset:
.. code-block:: python
from nlp import load_dataset
train = load_dataset("imdb", split="train")
Each dataset has multiple columns corresponding to different features. Let's see what our columns are.
.. code-block:: python
>>> print(train.column_names)
['label', 'text']
Great. Now let's tokenize the text. We can do this using the ``map`` method. We'll also rename the ``label`` column to
``labels`` to match the model's input arguments.
.. code-block:: python
train = train.map(lambda batch: tokenizer(batch["text"], truncation=True, padding=True), batched=True)
train.rename_column_("label", "labels")
Lastly, we can use the ``set_format`` method to determine which columns, and in what data format, we want to access
dataset elements.
.. code-block:: python
## PYTORCH CODE
>>> train.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
>>> {key: val.shape for key, val in train[0].items()}
{'labels': torch.Size([]), 'input_ids': torch.Size([512]), 'attention_mask': torch.Size([512])}
## TENSORFLOW CODE
>>> train.set_format("tensorflow", columns=["input_ids", "attention_mask", "labels"])
>>> {key: val.shape for key, val in train[0].items()}
{'labels': TensorShape([]), 'input_ids': TensorShape([512]), 'attention_mask': TensorShape([512])}
We now have a fully-prepared dataset. Check out `the 🤗 NLP docs <https://huggingface.co/nlp/processing.html>`_ for a
more thorough introduction.
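As a final illustration, this formatted dataset plugs straight into the native PyTorch training loop from the first
example. A minimal sketch, with the model and hyperparameters taken from that example:

.. code-block:: python

    ## PYTORCH CODE
    import torch
    from torch.utils.data import DataLoader
    from transformers import AdamW, DistilBertForSequenceClassification

    train_loader = DataLoader(train, batch_size=16, shuffle=True)  # the formatted dataset from above
    model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
    optim = AdamW(model.parameters(), lr=5e-5)

    model.train()
    for batch in train_loader:
        optim.zero_grad()
        outputs = model(batch['input_ids'], attention_mask=batch['attention_mask'], labels=batch['labels'])
        loss = outputs[0]
        loss.backward()
        optim.step()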
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Glossary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
General terms
-----------------------------------------------------------------------------------------------------------------------
- autoencoding models: see MLM
- autoregressive models: see CLM
- CLM: causal language modeling, a pretraining task where the model reads the texts in order and has to predict the
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
tokens at a certain timestep.
- MLM: masked language modeling, a pretraining task where the model sees a corrupted version of the texts, usually done
by masking some tokens randomly, and has to predict the original text.
- multimodal: a task that combines texts with another kind of inputs (for instance images).
- NLG: natural language generation, all tasks related to generating text (for instance talking with transformers,
translation)
- NLP: natural language processing, a generic way to say "deal with texts".
- NLU: natural language understanding, all tasks related to understanding what is in a text (for instance classifying
the whole text, individual words)
- pretrained model: a model that has been pretrained on some data (for instance all of Wikipedia). Pretraining methods
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
masking some words and trying to predict them (see MLM).
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
summarization models (such as :doc:`Bart </model_doc/bart>` or :doc:`T5 </model_doc/t5>`).
- token: a part of a sentence, usually a word, but can also be a subword (uncommon words are often split into
subwords) or a punctuation symbol.
Model inputs
-----------------------------------------------------------------------------------------------------------------------
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
.. _input-ids:
Input IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence = "A Titan RTX has 24GB of VRAM"
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
.. code-block::
>>> tokenized_sequence = tokenizer.tokenize(sequence)
The tokens are either words or subwords. Here for instance, "VRAM" wasn't in the model vocabulary, so it's been split
into "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash
prefix is added for "RA" and "M":
.. code-block::
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding
the sentence to the tokenizer, which leverages the Rust implementation of `huggingface/tokenizers
<https://github.com/huggingface/tokenizers>`__ for peak performance.
.. code-block::
>>> inputs = tokenizer(sequence)
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
token indices are under the key "input_ids":
.. code-block::
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them) which are special
IDs the model sometimes uses.
If we decode the previous sequence of ids,
.. code-block::
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
we will see
.. code-block::
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
because this is the way a :class:`~transformers.BertModel` is going to expect its inputs.
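If for some reason we don't want the special tokens, we can ask the tokenizer to skip them (shown purely for
illustration, via the ``add_special_tokens`` argument):

.. code-block::

    >>> encoded_sequence_no_special = tokenizer(sequence, add_special_tokens=False)["input_ids"]
    >>> print(tokenizer.decode(encoded_sequence_no_special))
    A Titan RTX has 24GB of VRAM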
.. _attention-mask:
Attention mask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The attention mask is an optional argument used when batching sequences together. This argument indicates to the model
which tokens should be attended to, and which should not.
For example, consider these two sequences:
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
The encoded versions have different lengths:
.. code-block::
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length
of the second one, or the second one needs to be truncated down to the length of the first one.
In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask
it to pad like this:
.. code-block::
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
.. code-block::
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating the
position of the padded indices so that the model does not attend to them. For the :class:`~transformers.BertTokenizer`,
:obj:`1` indicates a value that should be attended to, while :obj:`0` indicates a padded value. This attention mask is
in the dictionary returned by the tokenizer under the key "attention_mask":
.. code-block::
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
.. _token-type-ids:
Token Type IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some models' purpose is to do sequence classification or question answering. These require two different sequences to
be joined in a single "input_ids" entry, which usually is performed with the help of special tokens, such as the
classifier (``[CLS]``) and separator (``[SEP]``) tokens. For example, the BERT model builds its two sequence input as
such:
.. code-block::
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to ``tokenizer`` as two
arguments (and not a list, like before) like this:
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"
>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
which will return:
.. code-block::
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
This is enough for some models to understand where one sequence ends and where another begins. However, other models,
such as BERT, also deploy token type IDs (also called segment IDs). They are represented as a binary mask identifying
the two types of sequence in the model.
The tokenizer returns this mask as the "token_type_ids" entry:
.. code-block::
>>> encoded_dict['token_type_ids']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
The first sequence, the "context" used for the question, has all its tokens represented by a :obj:`0`, whereas the
second sequence, corresponding to the "question", has all its tokens represented by a :obj:`1`.
Some models, like :class:`~transformers.XLNetModel`, use an additional token represented by a :obj:`2`.
.. _position-ids:
Position IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contrary to RNNs that have the position of each token embedded within them, transformers are unaware of the position of
each token. Therefore, the position IDs (``position_ids``) are used by the model to identify each token's position in
the list of tokens.
They are an optional parameter. If no ``position_ids`` are passed to the model, the IDs are automatically created as
absolute positional embeddings.
Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models use
other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
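As an illustration, explicit position IDs can also be passed alongside the input IDs; the sketch below simply
reproduces the default absolute positions for a BERT model:

.. code-block::

    >>> import torch
    >>> from transformers import BertModel, BertTokenizer

    >>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    >>> model = BertModel.from_pretrained("bert-base-cased")

    >>> inputs = tokenizer("A Titan RTX has 24GB of VRAM", return_tensors="pt")
    >>> # equivalent to what the model would create automatically if position_ids were omitted
    >>> position_ids = torch.arange(inputs["input_ids"].size(1)).unsqueeze(0)
    >>> outputs = model(**inputs, position_ids=position_ids)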
.. _labels:
Labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels
should be the expected prediction of the model: it will use the standard loss in order to compute the loss between its
predictions and the expected value (the label).
These labels are different according to the model head, for example:
- For sequence classification models (e.g., :class:`~transformers.BertForSequenceClassification`), the model expects a
tensor of dimension :obj:`(batch_size)` with each value of the batch corresponding to the expected label of the
entire sequence.
- For token classification models (e.g., :class:`~transformers.BertForTokenClassification`), the model expects a tensor
of dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each individual
token.
- For masked language modeling (e.g., :class:`~transformers.BertForMaskedLM`), the model expects a tensor of dimension
:obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each individual token: the
labels being the token ID for the masked token, and values to be ignored for the rest (usually -100).
- For sequence to sequence tasks (e.g., :class:`~transformers.BartForConditionalGeneration`,
:class:`~transformers.MBartForConditionalGeneration`), the model expects a tensor of dimension :obj:`(batch_size,
tgt_seq_length)` with each value corresponding to the target sequences associated with each input sequence. During
training, both `BART` and `T5` will make the appropriate `decoder_input_ids` and decoder attention masks internally.
They usually do not need to be supplied. This does not apply to models leveraging the Encoder-Decoder framework. See
the documentation of each model for more information on each specific model's labels.
The base models (e.g., :class:`~transformers.BertModel`) do not accept labels, as these are the base transformer
models, simply outputting features.
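For instance, here is a minimal sketch of passing labels to a sequence classification model so that it computes the
loss itself (the label value is arbitrary):

.. code-block::

    >>> import torch
    >>> from transformers import BertForSequenceClassification, BertTokenizer

    >>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    >>> model = BertForSequenceClassification.from_pretrained("bert-base-cased")

    >>> inputs = tokenizer("A Titan RTX has 24GB of VRAM", return_tensors="pt")
    >>> labels = torch.tensor([1])  # one expected label per sequence in the batch
    >>> outputs = model(**inputs, labels=labels)
    >>> loss = outputs[0]  # the loss comes first when labels are provided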
.. _decoder-input-ids:
Decoder input IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder. These
inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually built in a
way specific to each model.
Most encoder-decoder models (BART, T5) create their :obj:`decoder_input_ids` on their own from the :obj:`labels`. In
such models, passing the :obj:`labels` is the preferred way to handle training.
Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
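A minimal sketch with BART (the checkpoint name is only illustrative), letting the model build its own
:obj:`decoder_input_ids` from the :obj:`labels`:

.. code-block::

    >>> from transformers import BartForConditionalGeneration, BartTokenizer

    >>> tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
    >>> model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

    >>> article = "PG&E stated it scheduled the blackouts in response to forecasts for high winds."
    >>> summary = "PG&E scheduled the blackouts."
    >>> inputs = tokenizer(article, return_tensors="pt")
    >>> labels = tokenizer(summary, return_tensors="pt")["input_ids"]

    >>> # no decoder_input_ids are passed: the model creates them by shifting the labels
    >>> outputs = model(input_ids=inputs["input_ids"], labels=labels)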
.. _feed-forward-chunking:
Feed Forward Chunking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In each residual attention block in transformers the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g., for
``bert-base-uncased``).
For an input of size ``[batch_size, sequence_length]``, the memory required to store the intermediate feed forward
embeddings ``[batch_size, sequence_length, config.intermediate_size]`` can account for a large fraction of the memory
use. The authors of `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ noticed that since the
computation is independent of the ``sequence_length`` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers ``[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n``
individually and concat them afterward to ``[batch_size, sequence_length, config.hidden_size]`` with ``n =
sequence_length``, which trades increased computation time against reduced memory use, but yields a mathematically
**equivalent** result.
For models employing the function :func:`~.transformers.apply_chunking_to_forward`, the ``chunk_size`` defines the
number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time
complexity. If ``chunk_size`` is set to 0, no feed forward chunking is done.
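As an illustration, the chunk size is exposed on the configuration as :obj:`chunk_size_feed_forward` (assuming a model
whose layers call :func:`~.transformers.apply_chunking_to_forward`, such as BERT or Reformer):

.. code-block::

    >>> from transformers import BertConfig, BertModel

    >>> # chunk_size_feed_forward=0 (the default) disables chunking; a positive value
    >>> # computes that many output embeddings at a time to trade time for memory
    >>> config = BertConfig(chunk_size_feed_forward=64)
    >>> model = BertModel(config)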
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Transformers
=======================================================================================================================
State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
Language Generation (NLG) with over 32 pretrained models in 100+ languages and deep interoperability between
TensorFlow 2.0 and PyTorch.
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
Features
-----------------------------------------------------------------------------------------------------------------------
- As easy to use as pytorch-transformers
- As powerful and concise as Keras
- High performance on NLU and NLG tasks
- Low barrier to entry for educators and practitioners
State-of-the-art NLP for everyone:
- Deep learning researchers
- Hands-on practitioners
- AI/ML/NLP teachers and educators
Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining
- Practitioners can reduce compute time and production costs
- 8 architectures with over 30 pretrained models, some in more than 100 languages
Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code
- Deep interoperability between TensorFlow 2.0 and PyTorch models
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, production
Experimental support for Flax with a few models right now, expected to grow in the coming months.
`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
`organizations <https://huggingface.co/organizations>`__.
Current number of checkpoints: |checkpoints|
.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen
Contents
-----------------------------------------------------------------------------------------------------------------------
The documentation is organized in five parts:
- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
and a glossary.
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
transformers models.
- The last three sections contain the documentation of each public class and function, grouped in:
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
- **MODELS** for the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** for the classes and functions we use internally.
The library currently contains PyTorch, Tensorflow and Flax implementations, pretrained model weights, usage scripts
and conversion utilities for the following models:
..
This list is updated automatically from the README with `make fix-copies`. Do not update manually!
1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
<https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
Pre-training for Natural Language Generation, Translation, and Comprehension
<https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BARThez <model_doc/barthez>` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained
French Sequence-to-Sequence Model <https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P.
Tixier, Michalis Vazirgiannis.
4. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
Kenton Lee and Kristina Toutanova.
5. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
Narayan, Aliaksei Severyn.
6. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
7. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building an
open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
8. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
9. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
Lav R. Varshney, Caiming Xiong and Richard Socher.
10. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft Research) released with the paper `DeBERTa: Decoding-enhanced
BERT with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
Weizhu Chen.
11. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
12. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
`DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
version of DistilBERT.
13. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
14. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
15. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
16. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
17. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
and Ilya Sutskever.
18. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
Luan, Dario Amodei** and Ilya Sutskever**.
19. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
20. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
<https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
21. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
22. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
by Hao Tan and Mohit Bansal.
23. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
Translator Team.
24. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
25. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
Jianfeng Lu, Tie-Yan Liu.
26. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
27. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
Mohammad Saleh and Peter J. Liu.
28. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
29. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
30. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper `RoBERTa: A Robustly
Optimized BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal,
Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
31. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
Krishna, and Kurt W. Keutzer.
32. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
33. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
Francesco Piccinno and Julian Martin Eisenschlos.
34. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
35. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
36. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
37. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
Zettlemoyer and Veselin Stoyanov.
38. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
.. _bigtable:
The table below represents the current support in the library for each of those models, whether they have a Python
tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they have support in
PyTorch, TensorFlow and/or Flax.
..
This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!
.. rst-class:: center-aligned-table
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
+=============================+================+================+=================+====================+==============+
| ALBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BART | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Blenderbot | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BlenderbotSmall | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeBERTa | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Encoder decoder | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LayoutLM | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Marian | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Pegasus | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RAG | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| T5 | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| TAPAS | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| mBART | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| mT5 | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
.. toctree::
:maxdepth: 2
:caption: Get started
quicktour
installation
philosophy
glossary
.. toctree::
:maxdepth: 2
:caption: Using 🤗 Transformers
task_summary
model_summary
preprocessing
training
model_sharing
tokenizer_summary
multilingual
.. toctree::
:maxdepth: 2
:caption: Advanced guides
pretrained_models
examples
custom_datasets
notebooks
converting_tensorflow_models
migration
contributing
testing
serialization
.. toctree::
:maxdepth: 2
:caption: Research
bertology
torchscript
perplexity
benchmarks
.. toctree::
:maxdepth: 2
:caption: Main Classes
main_classes/callback
main_classes/configuration
main_classes/logging
main_classes/model
main_classes/optimizer_schedules
main_classes/output
main_classes/pipelines
main_classes/processors
main_classes/tokenizer
main_classes/trainer
.. toctree::
:maxdepth: 2
:caption: Models
model_doc/albert
model_doc/auto
model_doc/bart
model_doc/barthez
model_doc/bert
model_doc/bertweet
model_doc/bertgeneration
model_doc/blenderbot
model_doc/blenderbot_small
model_doc/camembert
model_doc/ctrl
model_doc/deberta
model_doc/dialogpt
model_doc/distilbert
model_doc/dpr
model_doc/electra
model_doc/encoderdecoder
model_doc/flaubert
model_doc/fsmt
model_doc/funnel
model_doc/herbert
model_doc/layoutlm
model_doc/led
model_doc/longformer
model_doc/lxmert
model_doc/marian
model_doc/mbart
model_doc/mobilebert
model_doc/mpnet
model_doc/mt5
model_doc/gpt
model_doc/gpt2
model_doc/pegasus
model_doc/phobert
model_doc/prophetnet
model_doc/rag
model_doc/reformer
model_doc/retribert
model_doc/roberta
model_doc/squeezebert
model_doc/t5
model_doc/tapas
model_doc/transformerxl
model_doc/xlm
model_doc/xlmprophetnet
model_doc/xlmroberta
model_doc/xlnet
.. toctree::
:maxdepth: 2
:caption: Internal Helpers
internal/modeling_utils
internal/pipelines_utils
internal/tokenization_utils
internal/trainer_utils
internal/generation_utils

docs/source/installation.md Normal file

@ -0,0 +1,138 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Installation
🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Create a virtual environment with the version of Python you're going
to use and activate it.
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
must install it from source.
## Installation with pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available),
[PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or
[Flax installation page](https://github.com/google/flax#quick-install)
regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch have been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
Alternatively, for CPU-support only, you can install 🤗 Transformers and PyTorch in one line with:
```bash
pip install transformers[torch]
```
or 🤗 Transformers and TensorFlow 2.0 in one line with:
```bash
pip install transformers[tf-cpu]
```
or 🤗 Transformers and Flax in one line with:
```bash
pip install transformers[flax]
```
To check 🤗 Transformers is properly installed, run the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
```
It should download a pretrained model then print something like
```bash
[{'label': 'POSITIVE', 'score': 0.9998704791069031}]
```
(Note that TensorFlow will print additional stuff before that last statement.)
## Installing from source
To install from source, clone the repository and install with the following commands:
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```
Again, you can run
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```
to check 🤗 Transformers is properly installed.
## With conda
Since Transformers version v4.0.0, we now have a conda channel: `huggingface`.
🤗 Transformers can be installed using conda as follows:
```bash
conda install -c huggingface transformers
```
Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.
## Caching models
This library provides pretrained models that will be downloaded and cached locally. Unless you specify a location with
`cache_dir=...` when you use methods like `from_pretrained`, these models will automatically be downloaded in the
folder given by the shell environment variable ``TRANSFORMERS_CACHE``. The default value for it will be the Hugging
Face cache home followed by ``/transformers/``. This is (by order of priority):
* shell environment variable ``HF_HOME``
* shell environment variable ``XDG_CACHE_HOME`` + ``/huggingface/``
* default: ``~/.cache/huggingface/``
So if you don't have any specific environment variable set, the cache directory will be at
``~/.cache/huggingface/transformers/``.
**Note:** If you have set a shell environment variable for one of the predecessors of this library
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
environment variable for ``TRANSFORMERS_CACHE``.
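For instance (a minimal sketch; the model name and path below are just illustrations), you can also bypass the cache location for a single call by passing `cache_dir` explicitly:
```python
from transformers import BertModel
# download (or reuse) the weights in an explicit directory instead of the default cache
model = BertModel.from_pretrained("bert-base-uncased", cache_dir="/data/hf-cache")
```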
### Note on model downloads (Continuous Integration or large-scale deployments)
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through
your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way
faster, and cheaper. Feel free to contact us privately if you need any help.
## Do you want to run a Transformer model on a mobile device?
You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`,
`DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pretraining or fine-tuning models in PyTorch or
TensorFlow 2.0 to productizing them in CoreML, or prototype a model or an app in CoreML then research its
hyperparameters or architecture from PyTorch or TensorFlow 2.0. Super exciting!


@ -1,71 +0,0 @@
Installation
================================================
Transformers is tested on Python 2.7 and 3.5+ (examples are tested only on python 3.5+) and PyTorch 1.1.0
With pip
^^^^^^^^
PyTorch Transformers can be installed using pip as follows:
.. code-block:: bash
pip install transformers
From source
^^^^^^^^^^^
To install from source, clone the repository and install with:
.. code-block:: bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install [--editable] .
Tests
^^^^^
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the `tests folder <https://github.com/huggingface/transformers/tree/master/transformers/tests>`_ and examples tests in the `examples folder <https://github.com/huggingface/transformers/tree/master/examples>`_.
Tests can be run using `pytest` (install pytest if needed with `pip install pytest`).
Run all the tests from the root of the cloned repository with the commands:
.. code-block:: bash
python -m pytest -sv ./transformers/tests/
python -m pytest -sv ./examples/
OpenAI GPT original tokenization workflow
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you want to reproduce the original tokenization process of the ``OpenAI GPT`` paper, you will need to install ``ftfy`` (use version 4.4.3 if you are using Python 2) and ``SpaCy`` :
.. code-block:: bash
pip install spacy ftfy==4.4.3
python -m spacy download en
If you don't install ``ftfy`` and ``SpaCy``\ , the ``OpenAI GPT`` tokenizer will default to tokenize using BERT's ``BasicTokenizer`` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
Note on model downloads (Continuous Integration or large-scale deployments)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you expect to be downloading large volumes of models (more than 1,000) from our hosted bucket (for instance through your CI setup, or a large-scale production deployment), please cache the model files on your end. It will be way faster, and cheaper. Feel free to contact us privately if you need any help.
Do you want to run a Transformer model on a mobile device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You should check out our `swift-coreml-transformers <https://github.com/huggingface/swift-coreml-transformers>`_ repo.
It contains an example of a conversion script from a Pytorch trained Transformer model (here, ``GPT-2``) to a CoreML model that runs on iOS devices.
It also contains an implementation of BERT for Question answering.
At some point in the future, you'll be able to seamlessly move from pre-training or fine-tuning models in PyTorch to productizing them in CoreML,
or prototype a model or an app in CoreML then research its hyperparameters or architecture from PyTorch. Super exciting!


@ -0,0 +1,168 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for Generation
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by :meth:`~transformers.PreTrainedModel.generate`,
:meth:`~transformers.PreTrainedModel.greedy_search`, :meth:`~transformers.PreTrainedModel.sample`,
:meth:`~transformers.PreTrainedModel.beam_search`, :meth:`~transformers.PreTrainedModel.beam_sample`, and
:meth:`~transformers.PreTrainedModel.group_beam_search`.
Most of those are only useful if you are studying the code of the generate methods in the library.
Generate Outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The output of :meth:`~transformers.PreTrainedModel.generate` is an instance of a subclass of
:class:`~transformers.file_utils.ModelOutput`. This output is a data structure containing all the information returned
by :meth:`~transformers.PreTrainedModel.generate`, but that can also be used as a tuple or a dictionary.
Here's an example:
.. code-block::
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
The ``generation_output`` object is a :class:`~transformers.generation_utils.GreedySearchDecoderOnlyOutput`. As we can
see in the documentation of that class below, this means it has the following attributes:
- ``sequences``: the generated sequences of tokens
- ``scores`` (optional): the prediction scores of the language modelling head, for each generation step
- ``hidden_states`` (optional): the hidden states of the model, for each generation step
- ``attentions`` (optional): the attention weights of the model, for each generation step
Here we have the ``scores`` since we passed along ``output_scores=True``, but we don't have ``hidden_states`` and
``attentions`` because we didn't pass ``output_hidden_states=True`` or ``output_attentions=True``.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get ``None``. Here for instance ``generation_output.scores`` are all the generated prediction scores of the
language modeling head, and ``generation_output.attentions`` is ``None``.
When using our ``generation_output`` object as a tuple, it only keeps the attributes that don't have ``None`` values.
Here, for instance, it has two elements, ``sequences`` then ``scores``, so
.. code-block::
generation_output[:2]
will return the tuple ``(generation_output.sequences, generation_output.scores)``.
When using our ``generation_output`` object as a dictionary, it only keeps the attributes that don't have ``None``
values. Here, for instance, it has two keys that are ``sequences`` and ``scores``.
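For instance, continuing the example above (a small sketch to make the indexing concrete):
.. code-block:: python
sequences = generation_output.sequences
# dictionary-style and tuple-style access return the same tensor
assert generation_output["sequences"] is sequences
assert generation_output[0] is sequences
# attributes that were not returned are simply None
assert generation_output.attentions is None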
We document here all output types.
GreedySearchOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.GreedySearchDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.GreedySearchEncoderDecoderOutput
:members:
SampleOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.SampleDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.SampleEncoderDecoderOutput
:members:
BeamSearchOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.BeamSearchDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.BeamSearchEncoderDecoderOutput
:members:
BeamSampleOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.BeamSampleDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.BeamSampleEncoderDecoderOutput
:members:
LogitsProcessor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A :class:`~transformers.LogitsProcessor` can be used to modify the prediction scores of a language model head for
generation.
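As a minimal sketch (the token ids and vocabulary size below are made up), a list of processors can be applied to a
batch of next-token scores directly:
.. code-block:: python
import torch
from transformers import LogitsProcessorList, MinLengthLogitsProcessor
# forbid the EOS token while the current sequence is shorter than min_length
processors = LogitsProcessorList([MinLengthLogitsProcessor(min_length=10, eos_token_id=50256)])
input_ids = torch.tensor([[464, 3290]])  # hypothetical prompt token ids
scores = torch.randn(1, 50257)  # hypothetical next-token logits
scores = processors(input_ids, scores)  # the EOS logit is now -inf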
.. autoclass:: transformers.LogitsProcessor
:members: __call__
.. autoclass:: transformers.LogitsProcessorList
:members: __call__
.. autoclass:: transformers.LogitsWarper
:members: __call__
.. autoclass:: transformers.MinLengthLogitsProcessor
:members: __call__
.. autoclass:: transformers.TemperatureLogitsWarper
:members: __call__
.. autoclass:: transformers.RepetitionPenaltyLogitsProcessor
:members: __call__
.. autoclass:: transformers.TopPLogitsWarper
:members: __call__
.. autoclass:: transformers.TopKLogitsWarper
:members: __call__
.. autoclass:: transformers.NoRepeatNGramLogitsProcessor
:members: __call__
.. autoclass:: transformers.NoBadWordsLogitsProcessor
:members: __call__
.. autoclass:: transformers.PrefixConstrainedLogitsProcessor
:members: __call__
.. autoclass:: transformers.HammingDiversityLogitsProcessor
:members: __call__
BeamSearch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BeamScorer
:members: process, finalize
.. autoclass:: transformers.BeamSearchScorer
:members: process, finalize
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.top_k_top_p_filtering
.. autofunction:: transformers.tf_top_k_top_p_filtering


@ -0,0 +1,98 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Custom Layers and Utilities
-----------------------------------------------------------------------------------------------------------------------
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of those are only useful if you are studying the code of the models in the library.
PyTorch custom modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_utils.Conv1D
.. autoclass:: transformers.modeling_utils.PoolerStartLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerEndLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerAnswerClass
:members: forward
.. autoclass:: transformers.modeling_utils.SquadHeadOutput
.. autoclass:: transformers.modeling_utils.SQuADHead
:members: forward
.. autoclass:: transformers.modeling_utils.SequenceSummary
:members: forward
PyTorch Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.apply_chunking_to_forward
.. autofunction:: transformers.modeling_utils.find_pruneable_heads_and_indices
.. autofunction:: transformers.modeling_utils.prune_layer
.. autofunction:: transformers.modeling_utils.prune_conv1d_layer
.. autofunction:: transformers.modeling_utils.prune_linear_layer
TensorFlow custom layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFConv1D
.. autoclass:: transformers.modeling_tf_utils.TFSharedEmbeddings
:members: call
.. autoclass:: transformers.modeling_tf_utils.TFSequenceSummary
:members: call
TensorFlow loss functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFCausalLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMultipleChoiceLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFQuestionAnsweringLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFSequenceClassificationLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFTokenClassificationLoss
:members:
TensorFlow Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.modeling_tf_utils.get_initializer
.. autofunction:: transformers.modeling_tf_utils.keras_serializable
.. autofunction:: transformers.modeling_tf_utils.shape_list


@ -0,0 +1,52 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for pipelines
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions the library provides for pipelines.
Most of those are only useful if you are studying the code of the pipelines in the library.
Argument handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.ArgumentHandler
.. autoclass:: transformers.pipelines.ZeroShotClassificationArgumentHandler
.. autoclass:: transformers.pipelines.QuestionAnsweringArgumentHandler
Data format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.PipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.CsvPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.JsonPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.PipedPipelineDataFormat
:members:
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.pipelines.get_framework
.. autoclass:: transformers.pipelines.PipelineException


@ -0,0 +1,51 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for Tokenizers
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by the tokenizers, mainly the class
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that implements the common methods between
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast` and the mixin
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.
Most of those are only useful if you are studying the code of the tokenizers in the library.
PreTrainedTokenizerBase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.PreTrainedTokenizerBase
:special-members: __call__
:members:
SpecialTokensMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.SpecialTokensMixin
:members:
Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.ExplicitEnum
.. autoclass:: transformers.tokenization_utils_base.PaddingStrategy
.. autoclass:: transformers.tokenization_utils_base.TensorType
.. autoclass:: transformers.tokenization_utils_base.TruncationStrategy
.. autoclass:: transformers.tokenization_utils_base.CharSpan
.. autoclass:: transformers.tokenization_utils_base.TokenSpan
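For example (a small sketch), the string shortcuts accepted by the tokenizers map to these enums, which can also be
passed directly:
.. code-block:: python
from transformers import BertTokenizer
from transformers.tokenization_utils_base import PaddingStrategy, TensorType
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# equivalent to padding="max_length", return_tensors="pt"
encoded = tokenizer("Hello world", padding=PaddingStrategy.MAX_LENGTH, max_length=8, return_tensors=TensorType.PYTORCH)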


@ -0,0 +1,48 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for Trainer
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by :class:`~transformers.Trainer`.
Most of those are only useful if you are studying the code of the Trainer in the library.
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EvalPrediction
.. autoclass:: transformers.EvaluationStrategy
.. autofunction:: transformers.set_seed
.. autofunction:: transformers.torch_distributed_zero_first
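For example (a minimal sketch), fixing all the relevant random seeds before training is a one-liner:
.. code-block:: python
from transformers import set_seed
set_seed(42)  # seeds the random, numpy and torch generators in one call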
Callbacks internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.trainer_callback.CallbackHandler
Distributed Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.trainer_pt_utils.DistributedTensorGatherer
:members:
Trainer Argument Parser
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HfArgumentParser


@ -0,0 +1,89 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Callbacks
-----------------------------------------------------------------------------------------------------------------------
Callbacks are objects that can customize the behavior of the training loop in the PyTorch
:class:`~transformers.Trainer` (this feature is not yet implemented in TensorFlow). They can inspect the training loop
state (for progress reporting, logging on TensorBoard or other ML platforms...) and make decisions (like early
stopping).
Callbacks are "read only" pieces of code: apart from the :class:`~transformers.TrainerControl` object they return, they
cannot change anything in the training loop. For customizations that require changes in the training loop, you should
subclass :class:`~transformers.Trainer` and override the methods you need (see :doc:`trainer` for examples).
By default a :class:`~transformers.Trainer` will use the following callbacks:
- :class:`~transformers.DefaultFlowCallback` which handles the default behavior for logging, saving and evaluation.
- :class:`~transformers.PrinterCallback` or :class:`~transformers.ProgressCallback` to display progress and print the
logs (the first one is used if you deactivate tqdm through the :class:`~transformers.TrainingArguments`, otherwise
it's the second one).
- :class:`~transformers.integrations.TensorBoardCallback` if tensorboard is accessible (either through PyTorch >= 1.4
or tensorboardX).
- :class:`~transformers.integrations.WandbCallback` if `wandb <https://www.wandb.com/>`__ is installed.
- :class:`~transformers.integrations.CometCallback` if `comet_ml <https://www.comet.ml/site/>`__ is installed.
- :class:`~transformers.integrations.MLflowCallback` if `mlflow <https://www.mlflow.org/>`__ is installed.
- :class:`~transformers.integrations.AzureMLCallback` if `azureml-sdk <https://pypi.org/project/azureml-sdk/>`__ is
installed.
The main class that implements callbacks is :class:`~transformers.TrainerCallback`. It gets the
:class:`~transformers.TrainingArguments` used to instantiate the :class:`~transformers.Trainer`, can access that
Trainer's internal state via :class:`~transformers.TrainerState`, and can take some actions on the training loop via
:class:`~transformers.TrainerControl`.
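As a minimal sketch (this callback is illustrative, not part of the library), a custom callback only overrides the
events it cares about:
.. code-block:: python
from transformers import TrainerCallback
class PrintEpochCallback(TrainerCallback):
    """Illustrative callback that logs the epoch number at the end of every epoch."""
    def on_epoch_end(self, args, state, control, **kwargs):
        print(f"Finished epoch {state.epoch}")
It would then be passed to the trainer, e.g. ``Trainer(..., callbacks=[PrintEpochCallback()])``.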
Available Callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is the list of the available :class:`~transformers.TrainerCallback` in the library:
.. autoclass:: transformers.integrations.CometCallback
:members: setup
.. autoclass:: transformers.DefaultFlowCallback
.. autoclass:: transformers.PrinterCallback
.. autoclass:: transformers.ProgressCallback
.. autoclass:: transformers.EarlyStoppingCallback
.. autoclass:: transformers.integrations.TensorBoardCallback
.. autoclass:: transformers.integrations.WandbCallback
:members: setup
.. autoclass:: transformers.integrations.MLflowCallback
:members: setup
.. autoclass:: transformers.integrations.AzureMLCallback
TrainerCallback
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerCallback
:members:
TrainerState
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerState
:members:
TrainerControl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerControl
:members:


@ -1,10 +1,25 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Configuration
-----------------------------------------------------------------------------------------------------------------------
The base class :class:`~transformers.PretrainedConfig` implements the common methods for loading/saving a configuration
either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded
from HuggingFace's AWS S3 repository).
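For example (a minimal sketch; the model name and directory are arbitrary):
.. code-block:: python
from transformers import BertConfig
# load a pretrained configuration, tweak it, save it locally and reload it
config = BertConfig.from_pretrained("bert-base-uncased")
config.num_labels = 4
config.save_pretrained("./my-bert-config")  # writes config.json in that directory
reloaded = BertConfig.from_pretrained("./my-bert-config")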
PretrainedConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PretrainedConfig
:members:


@ -0,0 +1,70 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Logging
-----------------------------------------------------------------------------------------------------------------------
🤗 Transformers has a centralized logging system, so that you can set up the verbosity of the library easily.
Currently the default verbosity of the library is ``WARNING``.
To change the level of verbosity, just use one of the direct setters. For instance, here is how to change the verbosity
to the INFO level.
.. code-block:: python
import transformers
transformers.logging.set_verbosity_info()
You can also use the environment variable ``TRANSFORMERS_VERBOSITY`` to override the default verbosity. You can set it
to one of the following: ``debug``, ``info``, ``warning``, ``error``, ``critical``. For example:
.. code-block:: bash
TRANSFORMERS_VERBOSITY=error ./myprogram.py
All the methods of this logging module are documented below; the main ones are
:func:`transformers.logging.get_verbosity` to get the current level of verbosity in the logger and
:func:`transformers.logging.set_verbosity` to set the verbosity to the level of your choice. In order (from the least
verbose to the most verbose), those levels (with their corresponding int values in parentheses) are:
- :obj:`transformers.logging.CRITICAL` or :obj:`transformers.logging.FATAL` (int value, 50): only reports the most
critical errors.
- :obj:`transformers.logging.ERROR` (int value, 40): only reports errors.
- :obj:`transformers.logging.WARNING` or :obj:`transformers.logging.WARN` (int value, 30): only reports errors and
warnings. This is the default level used by the library.
- :obj:`transformers.logging.INFO` (int value, 20): reports errors, warnings and basic information.
- :obj:`transformers.logging.DEBUG` (int value, 10): reports all information.
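For example (a minimal sketch), the generic setter and getter work with these levels directly:
.. code-block:: python
import transformers
transformers.logging.set_verbosity(transformers.logging.INFO)
assert transformers.logging.get_verbosity() == 20  # INFO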
Base setters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.logging.set_verbosity_error
.. autofunction:: transformers.logging.set_verbosity_warning
.. autofunction:: transformers.logging.set_verbosity_info
.. autofunction:: transformers.logging.set_verbosity_debug
Other functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.logging.get_verbosity
.. autofunction:: transformers.logging.set_verbosity
.. autofunction:: transformers.logging.get_logger
.. autofunction:: transformers.logging.enable_explicit_format
.. autofunction:: transformers.logging.reset_format


@ -1,21 +1,75 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Models
-----------------------------------------------------------------------------------------------------------------------
The base classes :class:`~transformers.PreTrainedModel`, :class:`~transformers.TFPreTrainedModel`, and
:class:`~transformers.FlaxPreTrainedModel` implement the common methods for loading/saving a model either from a local
file or directory, or from a pretrained model configuration provided by the library (downloaded from HuggingFace's AWS
S3 repository).
:class:`~transformers.PreTrainedModel` and :class:`~transformers.TFPreTrainedModel` also implement a few methods which
are common among all the models to:
- resize the input token embeddings when new tokens are added to the vocabulary
- prune the attention heads of the model.
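For instance (a minimal sketch; the model and token are arbitrary), resizing the embeddings after adding a new token
works the same way for every model:
.. code-block:: python
from transformers import BertForMaskedLM, BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer.add_tokens(["[NEW_TOKEN]"])
# grow the input (and tied output) embeddings to match the new vocabulary size
model.resize_token_embeddings(len(tokenizer))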
The other methods that are common to each model are defined in :class:`~transformers.modeling_utils.ModuleUtilsMixin`
(for the PyTorch models) and :class:`~transformers.modeling_tf_utils.TFModuleUtilsMixin` (for the TensorFlow models),
or, for text generation, :class:`~transformers.generation_utils.GenerationMixin` (for the PyTorch models) and
:class:`~transformers.generation_tf_utils.TFGenerationMixin` (for the TensorFlow models).
PreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedModel
:members:
ModuleUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_utils.ModuleUtilsMixin
:members:
TFPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFPreTrainedModel
:members:
TFModelUtilsMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFModelUtilsMixin
:members:
FlaxPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxPreTrainedModel
:members:
Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.generation_utils.GenerationMixin
:members:
.. autoclass:: transformers.generation_tf_utils.TFGenerationMixin
:members:


@ -1,45 +1,70 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Optimization
-----------------------------------------------------------------------------------------------------------------------
The ``.optimization`` module provides:
- an optimizer with weight decay fixed that can be used to fine-tune models,
- several schedules in the form of schedule objects that inherit from ``_LRSchedule``, and
- a gradient accumulation class to accumulate the gradients of multiple batches.
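A short sketch of how these pieces typically fit together (``model`` is any PyTorch module; the hyper-parameters are
arbitrary):
.. code-block:: python
from transformers import AdamW, get_linear_schedule_with_warmup
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=100, num_training_steps=1000)
# in the training loop, after loss.backward():
optimizer.step()
scheduler.step()
optimizer.zero_grad()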
AdamW (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamW
:members:
AdaFactor (PyTorch)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Adafactor
AdamWeightDecay (TensorFlow)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AdamWeightDecay
.. autofunction:: transformers.create_optimizer
Schedules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Learning Rate Schedules (PyTorch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.SchedulerType
.. autofunction:: transformers.get_scheduler
.. autofunction:: transformers.get_constant_schedule
.. autofunction:: transformers.get_constant_schedule_with_warmup
.. image:: /imgs/warmup_constant_schedule.png
:target: /imgs/warmup_constant_schedule.png
:alt:
.. autofunction:: transformers.get_cosine_schedule_with_warmup
.. image:: /imgs/warmup_cosine_schedule.png
:target: /imgs/warmup_cosine_schedule.png
:alt:
.. autofunction:: transformers.get_cosine_with_hard_restarts_schedule_with_warmup
.. image:: /imgs/warmup_cosine_hard_restarts_schedule.png
:target: /imgs/warmup_cosine_hard_restarts_schedule.png
@@ -47,9 +72,26 @@ Learning Rate Schedules
.. autofunction:: transformers.get_linear_schedule_with_warmup
.. image:: /imgs/warmup_linear_schedule.png
:target: /imgs/warmup_linear_schedule.png
:alt:
.. autofunction:: transformers.get_polynomial_decay_schedule_with_warmup
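As a practical illustration, here is a minimal sketch pairing :class:`~transformers.AdamW` with one of these schedules (the ``model``, ``dataloader``, learning rate and step counts are assumptions for illustration):

.. code-block:: python

    from transformers import AdamW, get_linear_schedule_with_warmup

    optimizer = AdamW(model.parameters(), lr=3e-5, weight_decay=0.01)
    # Warm up linearly over 100 steps, then decay linearly to 0 over the remaining 900
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=100, num_training_steps=1000)

    for batch in dataloader:
        loss = model(**batch).loss  # assumes the batch contains labels so the model returns a loss
        loss.backward()
        optimizer.step()
        scheduler.step()  # the schedule is stepped once per optimizer step
        optimizer.zero_grad()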
Warmup (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.WarmUp
:members:
Gradient Strategies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GradientAccumulator (TensorFlow)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.GradientAccumulator


@@ -0,0 +1,301 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Model outputs
-----------------------------------------------------------------------------------------------------------------------
PyTorch models have outputs that are instances of subclasses of :class:`~transformers.file_utils.ModelOutput`. Those
are data structures containing all the information returned by the model, but that can also be used as tuples or
dictionaries.
Let's see how this looks in an example:
.. code-block::
    from transformers import BertTokenizer, BertForSequenceClassification
    import torch

    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
    outputs = model(**inputs, labels=labels)
The ``outputs`` object is a :class:`~transformers.modeling_outputs.SequenceClassifierOutput`. As we can see in the
documentation of that class below, it has an optional ``loss``, a ``logits``, an optional ``hidden_states`` and an
optional ``attentions`` attribute. Here we have the ``loss`` since we passed along ``labels``, but we don't have
``hidden_states`` and ``attentions`` because we didn't pass ``output_hidden_states=True`` or
``output_attentions=True``.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get ``None``. Here for instance ``outputs.loss`` is the loss computed by the model, and ``outputs.attentions`` is
``None``.
When considering our ``outputs`` object as a tuple, it only considers the attributes that don't have ``None`` values.
Here for instance, it has two elements, ``loss`` then ``logits``, so
.. code-block::
    outputs[:2]
will return the tuple ``(outputs.loss, outputs.logits)``.
When considering our ``outputs`` object as a dictionary, it only considers the attributes that don't have ``None``
values. Here for instance, it has two keys that are ``loss`` and ``logits``.
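To summarize, here is a minimal sketch of the three access patterns on the ``outputs`` object from the example above:

.. code-block:: python

    loss = outputs.loss          # attribute access
    logits = outputs["logits"]   # dictionary access
    logits = outputs[1]          # tuple access; index 0 is ``loss`` here since labels were passed
    assert outputs.attentions is None  # not requested, so the attribute is None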
We document here the generic model outputs that are used by more than one model type. Specific output types are
documented on their corresponding model page.
ModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.ModelOutput
:members:
BaseModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutput
:members:
BaseModelOutputWithPooling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPooling
:members:
BaseModelOutputWithCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithCrossAttentions
:members:
BaseModelOutputWithPoolingAndCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions
:members:
BaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPast
:members:
BaseModelOutputWithPastAndCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.BaseModelOutputWithPastAndCrossAttentions
:members:
Seq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqModelOutput
:members:
CausalLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.CausalLMOutput
:members:
CausalLMOutputWithCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.CausalLMOutputWithCrossAttentions
:members:
CausalLMOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.CausalLMOutputWithPast
:members:
MaskedLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.MaskedLMOutput
:members:
Seq2SeqLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqLMOutput
:members:
NextSentencePredictorOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.NextSentencePredictorOutput
:members:
SequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.SequenceClassifierOutput
:members:
Seq2SeqSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqSequenceClassifierOutput
:members:
MultipleChoiceModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.MultipleChoiceModelOutput
:members:
TokenClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.TokenClassifierOutput
:members:
QuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.QuestionAnsweringModelOutput
:members:
Seq2SeqQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_outputs.Seq2SeqQuestionAnsweringModelOutput
:members:
TFBaseModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutput
:members:
TFBaseModelOutputWithPooling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPooling
:members:
TFBaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPast
:members:
TFSeq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqModelOutput
:members:
TFCausalLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFCausalLMOutput
:members:
TFCausalLMOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFCausalLMOutputWithPast
:members:
TFMaskedLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFMaskedLMOutput
:members:
TFSeq2SeqLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqLMOutput
:members:
TFNextSentencePredictorOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFNextSentencePredictorOutput
:members:
TFSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSequenceClassifierOutput
:members:
TFSeq2SeqSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqSequenceClassifierOutput
:members:
TFMultipleChoiceModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput
:members:
TFTokenClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFTokenClassifierOutput
:members:
TFQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput
:members:
TFSeq2SeqQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput
:members:


@@ -0,0 +1,148 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Pipelines
-----------------------------------------------------------------------------------------------------------------------
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
:doc:`task summary <../task_summary>` for examples of use.
There are two categories of pipeline abstractions to be aware of:
- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines.
- The other task-specific pipelines:
- :class:`~transformers.ConversationalPipeline`
- :class:`~transformers.FeatureExtractionPipeline`
- :class:`~transformers.FillMaskPipeline`
- :class:`~transformers.QuestionAnsweringPipeline`
- :class:`~transformers.SummarizationPipeline`
- :class:`~transformers.TextClassificationPipeline`
- :class:`~transformers.TextGenerationPipeline`
- :class:`~transformers.TokenClassificationPipeline`
- :class:`~transformers.TranslationPipeline`
- :class:`~transformers.ZeroShotClassificationPipeline`
- :class:`~transformers.Text2TextGenerationPipeline`
- :class:`~transformers.TableQuestionAnsweringPipeline`
The pipeline abstraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any other
pipeline but requires an additional argument which is the `task`.
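For instance, here is a minimal sketch of the abstraction in action (when no model is specified, a default checkpoint for the task is downloaded):

.. code-block:: python

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    result = classifier("This library makes inference very easy!")
    print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]; the exact score depends on the model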
.. autofunction:: transformers.pipeline
The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ConversationalPipeline
=======================================================================================================================
.. autoclass:: transformers.Conversation
.. autoclass:: transformers.ConversationalPipeline
:special-members: __call__
:members:
FeatureExtractionPipeline
=======================================================================================================================
.. autoclass:: transformers.FeatureExtractionPipeline
:special-members: __call__
:members:
FillMaskPipeline
=======================================================================================================================
.. autoclass:: transformers.FillMaskPipeline
:special-members: __call__
:members:
NerPipeline
=======================================================================================================================
.. autoclass:: transformers.NerPipeline
See :class:`~transformers.TokenClassificationPipeline` for all details.
QuestionAnsweringPipeline
=======================================================================================================================
.. autoclass:: transformers.QuestionAnsweringPipeline
:special-members: __call__
:members:
SummarizationPipeline
=======================================================================================================================
.. autoclass:: transformers.SummarizationPipeline
:special-members: __call__
:members:
TableQuestionAnsweringPipeline
=======================================================================================================================
.. autoclass:: transformers.TableQuestionAnsweringPipeline
:special-members: __call__
TextClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.TextClassificationPipeline
:special-members: __call__
:members:
TextGenerationPipeline
=======================================================================================================================
.. autoclass:: transformers.TextGenerationPipeline
:special-members: __call__
:members:
Text2TextGenerationPipeline
=======================================================================================================================
.. autoclass:: transformers.Text2TextGenerationPipeline
:special-members: __call__
:members:
TokenClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.TokenClassificationPipeline
:special-members: __call__
:members:
TranslationPipeline
=======================================================================================================================
.. autoclass:: transformers.TranslationPipeline
:special-members: __call__
:members:
ZeroShotClassificationPipeline
=======================================================================================================================
.. autoclass:: transformers.ZeroShotClassificationPipeline
:special-members: __call__
:members:
Parent class: :obj:`Pipeline`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Pipeline
:members:


@@ -1,58 +1,172 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Processors
-----------------------------------------------------------------------------------------------------------------------
This library includes processors for several traditional tasks. These processors can be used to process a dataset into
examples that can be fed to a model.
Processors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All processors follow the same architecture which is that of the
:class:`~transformers.data.processors.utils.DataProcessor`. The processor returns a list of
:class:`~transformers.data.processors.utils.InputExample`. These
:class:`~transformers.data.processors.utils.InputExample` can be converted to
:class:`~transformers.data.processors.utils.InputFeatures` in order to be fed to the model.
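As a quick illustration, here is a minimal sketch of building an :class:`~transformers.data.processors.utils.InputExample` by hand (the field values are purely illustrative):

.. code-block:: python

    from transformers.data.processors.utils import InputExample

    example = InputExample(
        guid="train-1",
        text_a="A man is eating food.",
        text_b="A person is eating.",
        label="entailment",
    )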
.. autoclass:: transformers.data.processors.utils.DataProcessor
:members:
.. autoclass:: transformers.data.processors.utils.InputExample
:members:
.. autoclass:: transformers.data.processors.utils.InputFeatures
:members:
GLUE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`General Language Understanding Evaluation (GLUE) <https://gluebenchmark.com/>`__ is a benchmark that evaluates the
performance of models across a diverse set of existing NLU tasks. It was released together with the paper `GLUE: A
multi-task benchmark and analysis platform for natural language understanding
<https://openreview.net/pdf?id=rJ4km2R5t7>`__
This library hosts a total of 10 processors for the following tasks: MRPC, MNLI, MNLI (mismatched), CoLA, SST2, STSB,
QQP, QNLI, RTE and WNLI.
Those processors are:
- :class:`~transformers.data.processors.utils.MrpcProcessor`
- :class:`~transformers.data.processors.utils.MnliProcessor`
- :class:`~transformers.data.processors.utils.MnliMismatchedProcessor`
- :class:`~transformers.data.processors.utils.Sst2Processor`
- :class:`~transformers.data.processors.utils.StsbProcessor`
- :class:`~transformers.data.processors.utils.QqpProcessor`
- :class:`~transformers.data.processors.utils.QnliProcessor`
- :class:`~transformers.data.processors.utils.RteProcessor`
- :class:`~transformers.data.processors.utils.WnliProcessor`
Additionally, the following method can be used to load values from a data file and convert them to a list of
:class:`~transformers.data.processors.utils.InputExample`.
.. automethod:: transformers.data.processors.glue.glue_convert_examples_to_features
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
An example using these processors is given in the `run_glue.py
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_glue.py>`__ script.
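A condensed sketch of the same flow, assuming the ``tensorflow_datasets`` package is installed (the tokenizer checkpoint and sequence length are illustrative choices):

.. code-block:: python

    import tensorflow_datasets as tfds
    from transformers import BertTokenizer, glue_convert_examples_to_features

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    data = tfds.load("glue/mrpc")
    # Convert the raw examples into model-ready features for the MRPC task
    train_dataset = glue_convert_examples_to_features(data["train"], tokenizer, max_length=128, task="mrpc")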
XNLI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`The Cross-Lingual NLI Corpus (XNLI) <https://www.nyu.edu/projects/bowman/xnli/>`__ is a benchmark that evaluates the
quality of cross-lingual text representations. XNLI is crowd-sourced dataset based on `MultiNLI
<http://www.nyu.edu/projects/bowman/multinli/>`: pairs of text are labeled with textual entailment annotations for 15
different languages (including both high-resource language such as English and low-resource languages such as Swahili).
It was released together with the paper `XNLI: Evaluating Cross-lingual Sentence Representations
<https://arxiv.org/abs/1809.05053>`__
This library hosts the processor to load the XNLI data:
- :class:`~transformers.data.processors.utils.XnliProcessor`
Please note that since the gold labels are available on the test set, evaluation is performed on the test set.
An example using these processors is given in the `run_xnli.py
<https://github.com/huggingface/pytorch-transformers/blob/master/examples/text-classification/run_xnli.py>`__ script.
SQuAD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`The Stanford Question Answering Dataset (SQuAD) <https://rajpurkar.github.io/SQuAD-explorer/>`__ is a benchmark that
evaluates the performance of models on question answering. Two versions are available, v1.1 and v2.0. The first version
(v1.1) was released together with the paper `SQuAD: 100,000+ Questions for Machine Comprehension of Text
<https://arxiv.org/abs/1606.05250>`__. The second version (v2.0) was released alongside the paper `Know What You Don't
Know: Unanswerable Questions for SQuAD <https://arxiv.org/abs/1806.03822>`__.
This library hosts a processor for each of the two versions:
Processors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Those processors are:
- :class:`~transformers.data.processors.utils.SquadV1Processor`
- :class:`~transformers.data.processors.utils.SquadV2Processor`
They both inherit from the abstract class :class:`~transformers.data.processors.utils.SquadProcessor`.
.. autoclass:: transformers.data.processors.squad.SquadProcessor
:members:
Additionally, the following method can be used to convert SQuAD examples into
:class:`~transformers.data.processors.utils.SquadFeatures` that can be used as model inputs.
.. automethod:: transformers.data.processors.squad.squad_convert_examples_to_features
These processors, as well as the aforementioned method, can be used with files containing the data as well as with the
`tensorflow_datasets` package. Examples are given below.
Example usage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example using the processors as well as the conversion method using data files:
.. code-block::
    from transformers import SquadV1Processor, SquadV2Processor, squad_convert_examples_to_features

    # ``tokenizer`` and the data-dir / length / stride variables are assumed to be defined earlier

    # Loading a V2 processor
    processor = SquadV2Processor()
    examples = processor.get_dev_examples(squad_v2_data_dir)

    # Loading a V1 processor
    processor = SquadV1Processor()
    examples = processor.get_dev_examples(squad_v1_data_dir)

    features = squad_convert_examples_to_features(
        examples=examples,
        tokenizer=tokenizer,
        max_seq_length=max_seq_length,
        doc_stride=doc_stride,
        max_query_length=max_query_length,
        is_training=not evaluate,
    )
Using `tensorflow_datasets` is as easy as using a data file:
.. code-block::
    import tensorflow_datasets as tfds

    # tensorflow_datasets only handles SQuAD V1
    tfds_examples = tfds.load("squad")
    examples = SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=evaluate)
    features = squad_convert_examples_to_features(
        examples=examples,
        tokenizer=tokenizer,
        max_seq_length=max_seq_length,
        doc_stride=doc_stride,
        max_query_length=max_query_length,
        is_training=not evaluate,
    )
Another example using these processors is given in the :prefix_link:`run_squad.py
<examples/question-answering/run_squad.py>` script.


@@ -1,16 +1,72 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Tokenizer
-----------------------------------------------------------------------------------------------------------------------
A tokenizer is in charge of preparing the inputs for a model. The library contains tokenizers for all the models. Most
of the tokenizers are available in two flavors: a full python implementation and a "Fast" implementation based on the
Rust library `tokenizers <https://github.com/huggingface/tokenizers>`__. The "Fast" implementations allow:
1. a significant speed-up in particular when doing batched tokenization and
2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
index of the token comprising a given character or the span of characters corresponding to a given token). Currently
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa
and XLNet models).
The base classes :class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast`
implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and
"Fast" tokenizers either from a local file or directory or from a pretrained tokenizer provided by the library
(downloaded from HuggingFace's AWS S3 repository). They both rely on
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that contains the common methods, and
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast` thus implement the main
methods for using all the tokenizers:
- Tokenizing (splitting strings in sub-word token strings), converting tokens strings to ids and back, and
encoding/decoding (i.e., tokenizing and converting to integers).
- Adding new tokens to the vocabulary in a way that is independent of the underlying structure (BPE, SentencePiece...).
- Managing special tokens (like mask, beginning-of-sentence, etc.): adding them, assigning them to attributes in the
tokenizer for easy access and making sure they are not split during tokenization.
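A minimal sketch of these methods on a concrete tokenizer (the checkpoint name and added token are illustrative):

.. code-block:: python

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    tokens = tokenizer.tokenize("Hello world!")               # ['hello', 'world', '!']
    ids = tokenizer.convert_tokens_to_ids(tokens)             # token strings -> vocabulary ids
    text = tokenizer.decode(tokenizer.encode("Hello world!")) # round-trip, with special tokens added
    tokenizer.add_tokens(["new_tok"])                         # extend the vocabulary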
:class:`~transformers.BatchEncoding` holds the output of the tokenizer's encoding methods (``__call__``,
``encode_plus`` and ``batch_encode_plus``) and is derived from a Python dictionary. When the tokenizer is a pure python
tokenizer, this class behaves just like a standard python dictionary and holds the various model inputs computed by
these methods (``input_ids``, ``attention_mask``...). When the tokenizer is a "Fast" tokenizer (i.e., backed by
HuggingFace `tokenizers library <https://github.com/huggingface/tokenizers>`__), this class provides in addition
several advanced alignment methods which can be used to map between the original string (character and words) and the
token space (e.g., getting the index of the token comprising a given character or the span of characters corresponding
to a given token).
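For instance, here is a minimal sketch of those alignment methods, assuming a "Fast" tokenizer:

.. code-block:: python

    from transformers import BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    encoding = tokenizer("Hello world!")

    print(encoding["input_ids"])      # standard dictionary access to the model inputs
    print(encoding.tokens())          # the token strings
    print(encoding.words())           # index of the word each token comes from
    print(encoding.char_to_token(1))  # index of the token containing character 1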
PreTrainedTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizer
:special-members: __call__
:members:
PreTrainedTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PreTrainedTokenizerFast
:special-members: __call__
:members:
BatchEncoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BatchEncoding
:members:


@@ -0,0 +1,533 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Trainer
-----------------------------------------------------------------------------------------------------------------------
The :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` classes provide an API for feature-complete
training in most standard use cases. It's used in most of the :doc:`example scripts <../examples>`.
Before instantiating your :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`, create a
:class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments` to access all the points of
customization during training.
The API supports distributed training on multiple GPUs/TPUs, mixed precision through `NVIDIA Apex
<https://github.com/NVIDIA/apex>`__ for PyTorch and :obj:`tf.keras.mixed_precision` for TensorFlow.
Both :class:`~transformers.Trainer` and :class:`~transformers.TFTrainer` contain the basic training loop supporting the
previous features. To inject custom behavior you can subclass them and override the following methods:
- **get_train_dataloader**/**get_train_tfdataset** -- Creates the training DataLoader (PyTorch) or TF Dataset.
- **get_eval_dataloader**/**get_eval_tfdataset** -- Creates the evaluation DataLoader (PyTorch) or TF Dataset.
- **get_test_dataloader**/**get_test_tfdataset** -- Creates the test DataLoader (PyTorch) or TF Dataset.
- **log** -- Logs information on the various objects watching training.
- **create_optimizer_and_scheduler** -- Sets up the optimizer and learning rate scheduler if they were not passed at
init.
- **compute_loss** - Computes the loss on a batch of training inputs.
- **training_step** -- Performs a training step.
- **prediction_step** -- Performs an evaluation/test step.
- **run_model** (TensorFlow only) -- Basic pass through the model.
- **evaluate** -- Runs an evaluation loop and returns metrics.
- **predict** -- Returns predictions (with metrics if labels are available) on a test set.
Here is an example of how to customize :class:`~transformers.Trainer` using a custom loss function:
.. code-block:: python
    from transformers import Trainer

    class MyTrainer(Trainer):
        def compute_loss(self, model, inputs):
            labels = inputs.pop("labels")
            outputs = model(**inputs)
            logits = outputs[0]
            # my_custom_loss is any user-defined function returning a scalar loss
            return my_custom_loss(logits, labels)
Another way to customize the training loop behavior for the PyTorch :class:`~transformers.Trainer` is to use
:doc:`callbacks <callback>` that can inspect the training loop state (for progress reporting, logging on TensorBoard or
other ML platforms...) and take decisions (like early stopping).
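For instance, here is a minimal sketch wiring early stopping into the loop via :class:`~transformers.EarlyStoppingCallback` (the model and datasets are assumed to be defined elsewhere):

.. code-block:: python

    from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

    args = TrainingArguments(
        output_dir="output_dir",
        evaluation_strategy="epoch",        # the callback acts on regular evaluations
        load_best_model_at_end=True,        # required by the callback
        metric_for_best_model="eval_loss",
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
    )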
Trainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Trainer
:members:
Seq2SeqTrainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Seq2SeqTrainer
:members: evaluate, predict
TFTrainer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainer
:members:
TrainingArguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainingArguments
:members:
Seq2SeqTrainingArguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Seq2SeqTrainingArguments
:members:
TFTrainingArguments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFTrainingArguments
:members:
Trainer Integrations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The :class:`~transformers.Trainer` has been extended to support libraries that may dramatically improve your training
time and fit much bigger models.
Currently it supports third party solutions, `DeepSpeed <https://github.com/microsoft/DeepSpeed>`__ and `FairScale
<https://github.com/facebookresearch/fairscale/>`__, which implement parts of the paper `ZeRO: Memory Optimizations
Toward Training Trillion Parameter Models, by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He
<https://arxiv.org/abs/1910.02054>`__.
This support is new and experimental as of this writing. You will need at least two GPUs to benefit from these
features.
FairScale
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By integrating `FairScale <https://github.com/facebookresearch/fairscale/>`__ the :class:`~transformers.Trainer`
provides support for the following features from `the ZeRO paper <https://arxiv.org/abs/1910.02054>`__:
1. Optimizer State Sharding
2. Gradient Sharding
To deploy this feature:
1. Install the library via PyPI:
.. code-block:: bash
pip install fairscale
or find more details on the `FairScale GitHub page
<https://github.com/facebookresearch/fairscale/#installation>`__.
2. Add ``--sharded_ddp`` to the command line arguments, and make sure you have added the distributed launcher ``-m
torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
For example, here is how you could use it for ``finetune_trainer.py`` with 2 GPUs:
.. code-block:: bash
cd examples/seq2seq
python -m torch.distributed.launch --nproc_per_node=2 ./finetune_trainer.py \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation \
--fp16 --sharded_ddp
Notes:
- This feature requires distributed training (so multiple GPUs).
- It is not implemented for TPUs.
- It works with ``--fp16`` too, to make things even faster.
- One of the main benefits of enabling ``--sharded_ddp`` is that it uses a lot less GPU memory, so you should be able
to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
significantly shorter training time.
DeepSpeed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`DeepSpeed <https://github.com/microsoft/DeepSpeed>`__ implements everything described in the `ZeRO paper
<https://arxiv.org/abs/1910.02054>`__, except ZeRO's stage 3, "Parameter Partitioning (Pos+g+p)". Currently it provides
full support for:
1. Optimizer State Partitioning (ZeRO stage 1)
2. Gradient Partitioning (ZeRO stage 2)
To deploy this feature:
1. Install the library via PyPI:
.. code-block:: bash
pip install deepspeed
or find more details on the `DeepSpeed GitHub page <https://github.com/microsoft/deepspeed#installation>`__.
2. Adjust the :class:`~transformers.Trainer` command line arguments as follows:
1. replace ``python -m torch.distributed.launch`` with ``deepspeed``.
2. add a new argument ``--deepspeed ds_config.json``, where ``ds_config.json`` is the DeepSpeed configuration file
as documented `here <https://www.deepspeed.ai/docs/config-json/>`__. The file naming is up to you.
Therefore, if your original command line looked like the following:
.. code-block:: bash
python -m torch.distributed.launch --nproc_per_node=2 your_program.py <normal cl args>
Now it should be:
.. code-block:: bash
deepspeed --num_gpus=2 your_program.py <normal cl args> --deepspeed ds_config.json
Unlike ``torch.distributed.launch``, where you have to specify how many GPUs to use with ``--nproc_per_node``, with
the ``deepspeed`` launcher you don't have to use the corresponding ``--num_gpus`` if you want all of your GPUs used.
The full details on how to configure various nodes and GPUs can be found `here
<https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node>`__.
Here is an example of running ``finetune_trainer.py`` under DeepSpeed deploying all available GPUs:
.. code-block:: bash
cd examples/seq2seq
deepspeed ./finetune_trainer.py --deepspeed ds_config.json \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation
Note that in the DeepSpeed documentation you are likely to see ``--deepspeed --deepspeed_config ds_config.json`` -
i.e. two DeepSpeed-related arguments, but for the sake of simplicity, and since there are already so many arguments
to deal with, we combined the two into a single argument.
Before you can deploy DeepSpeed, let's discuss its configuration.
**Configuration:**
For the complete guide to the DeepSpeed configuration options that can be used in its configuration file please refer
to the `following documentation <https://www.deepspeed.ai/docs/config-json/>`__.
While you always have to supply the DeepSpeed configuration file, you can configure the DeepSpeed integration in
several ways:
1. Supply most of the configuration inside the file, and just use a few required command line arguments. This is the
recommended way as it puts most of the configuration params in one place.
2. Supply just the ZeRO configuration params inside the file, and configure the rest using the normal
:class:`~transformers.Trainer` command line arguments.
3. Any variation of the first two ways.
To get an idea of what a DeepSpeed configuration file looks like, here is one that activates ZeRO stage 2 features,
enables FP16, and uses the AdamW optimizer and the WarmupLR scheduler:
.. code-block:: json
{
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"cpu_offload": true
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": 3e-5,
"betas": [ 0.8, 0.999 ],
"eps": 1e-8,
"weight_decay": 3e-7
}
},
"zero_allow_untested_optimizer": true,
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 3e-5,
"warmup_num_steps": 500
}
}
}
If you already have a command line that you have been using with :class:`transformers.Trainer` args, you can continue
using those and the :class:`~transformers.Trainer` will automatically convert them into the corresponding DeepSpeed
configuration at run time. For example, you could use the following configuration file:
.. code-block:: json
{
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"cpu_offload": true
}
}
and the following command line arguments:
.. code-block:: bash
--learning_rate 3e-5 --warmup_steps 500 --adam_beta1 0.8 --adam_beta2 0.999 --adam_epsilon 1e-8 \
--weight_decay 3e-7 --lr_scheduler_type constant_with_warmup --fp16 --fp16_backend amp
to achieve the same configuration as provided by the longer json file in the first example.
When you execute the program, DeepSpeed will log the configuration it received from the :class:`~transformers.Trainer`
to the console, so you can see exactly what final configuration was passed to it.
**Shared Configuration:**
Some configuration information is required by both the :class:`~transformers.Trainer` and DeepSpeed to function
correctly. Therefore, to prevent conflicting definitions, which could lead to hard-to-detect errors, we chose to
configure those via the :class:`~transformers.Trainer` command line arguments.
Therefore, the following DeepSpeed configuration params shouldn't be used with the :class:`~transformers.Trainer`:
* ``train_batch_size``
* ``train_micro_batch_size_per_gpu``
* ``gradient_accumulation_steps``
as these will be automatically derived from the run time environment and the following 2 command line arguments:
.. code-block:: bash
--per_device_train_batch_size 8 --gradient_accumulation_steps 2
which are always required to be supplied. (The effective DeepSpeed ``train_batch_size`` is derived as the number of
GPUs times ``--per_device_train_batch_size`` times ``--gradient_accumulation_steps``.)
Of course, you will need to adjust the values in this example to your situation.
**ZeRO:**
The ``zero_optimization`` section of the configuration file is the most important part (`docs
<https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training>`__), since that is where you define
which ZeRO stages you want to enable and how to configure them.
.. code-block:: json
{
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"cpu_offload": true
}
}
Notes:
- enabling ``cpu_offload`` should reduce GPU RAM usage (it requires ``"stage": 2``)
- ``"overlap_comm": true`` trades off increased GPU RAM usage to lower all-reduce latency. ``overlap_comm`` uses 4.5x
the ``allgather_bucket_size`` and ``reduce_bucket_size`` values. So if they are set to 5e8, this requires a 9GB
footprint (``5e8 x 2Bytes x 2 x 4.5``). Therefore, if you have a GPU with 8GB or less RAM, to avoid getting
OOM-errors you will need to reduce those parameters to about ``2e8``, which would require 3.6GB.
This section has to be configured exclusively via DeepSpeed configuration - the :class:`~transformers.Trainer` provides
no equivalent command line arguments.
**Optimizer:**
DeepSpeed's main optimizers are Adam, OneBitAdam, and Lamb. These have been thoroughly tested with ZeRO and are thus
recommended. DeepSpeed can, however, also import other optimizers from ``torch``. The full documentation is `here
<https://www.deepspeed.ai/docs/config-json/#optimizer-parameters>`__.
If you don't configure the ``optimizer`` entry in the configuration file, the :class:`~transformers.Trainer` will
automatically set it to ``AdamW`` and will use the supplied values or the defaults for the following command line
arguments: ``--learning_rate``, ``--adam_beta1``, ``--adam_beta2``, ``--adam_epsilon`` and ``--weight_decay``.
Here is an example of the pre-configured ``optimizer`` entry for AdamW:
.. code-block:: json
{
"zero_allow_untested_optimizer": true,
"optimizer": {
"type": "AdamW",
"params": {
"lr": 0.001,
"betas": [0.8, 0.999],
"eps": 1e-8,
"weight_decay": 3e-7
}
}
}
Since AdamW isn't on the list of optimizers tested with DeepSpeed/ZeRO, we have to add the
``zero_allow_untested_optimizer`` flag.
If you want to use one of the officially supported optimizers, configure it explicitly in the configuration file, and
make sure to adjust the values, e.g., if you use Adam you will want ``weight_decay`` around ``0.01``.
**Scheduler:**
DeepSpeed supports LRRangeTest, OneCycle, WarmupLR and WarmupDecayLR LR schedulers. The full documentation is `here
<https://www.deepspeed.ai/docs/config-json/#scheduler-parameters>`__.
If you don't configure the ``scheduler`` entry in the configuration file, the :class:`~transformers.Trainer` will use
the value of ``--lr_scheduler_type`` to configure it. Currently the :class:`~transformers.Trainer` supports only 2 LR
schedulers that are also supported by DeepSpeed:
* ``WarmupLR`` via ``--lr_scheduler_type constant_with_warmup``
* ``WarmupDecayLR`` via ``--lr_scheduler_type linear``. This is also the default value for ``--lr_scheduler_type``,
therefore, if you don't configure the scheduler, this is the one that will be configured by default.
In either case, the values of ``--learning_rate`` and ``--warmup_steps`` will be used for the configuration.
In other words, if you don't use the configuration file to set the ``scheduler`` entry, provide either:
.. code-block:: bash
--lr_scheduler_type constant_with_warmup --learning_rate 3e-5 --warmup_steps 500
or
.. code-block:: bash
--lr_scheduler_type linear --learning_rate 3e-5 --warmup_steps 500
with the desired values. If you don't pass these arguments, reasonable default values will be used instead.
In the case of WarmupDecayLR, ``total_num_steps`` is set either via the ``--max_steps`` command line argument or, if
it is not provided, derived automatically at run time based on the environment, the size of the dataset and other
command line arguments.
Here is an example of the pre-configured ``scheduler`` entry for WarmupLR (``constant_with_warmup`` in the
:class:`~transformers.Trainer` API):
.. code-block:: json
{
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": 0,
"warmup_max_lr": 0.001,
"warmup_num_steps": 1000
}
}
}
**Automatic Mixed Precision:**
You can work with FP16 in one of the following ways:
1. PyTorch native amp, as documented `here <https://www.deepspeed.ai/docs/config-json/#fp16-training-options>`__.
2. NVIDIA's apex, as documented `here
<https://www.deepspeed.ai/docs/config-json/#automatic-mixed-precision-amp-training-options>`__.
If you want to use an equivalent of the PyTorch native amp, you can either configure the ``fp16`` entry in the
configuration file, or use the following command line arguments: ``--fp16 --fp16_backend amp``.
Here is an example of the ``fp16`` configuration:
.. code-block:: json
{
"fp16": {
"enabled": true,
"loss_scale": 0,
"loss_scale_window": 1000,
"hysteresis": 2,
"min_loss_scale": 1
}
}
If you want to use NVIDIA's apex instead, you can either configure the ``amp`` entry in the configuration file, or
use the following command line arguments: ``--fp16 --fp16_backend apex --fp16_opt_level O1``.
Here is an example of the ``amp`` configuration:
.. code-block:: json
{
"amp": {
"enabled": true,
"opt_level": "O1"
}
}
**Gradient Clipping:**
If you don't configure the ``gradient_clipping`` entry in the configuration file, the :class:`~transformers.Trainer`
will use the value of the ``--max_grad_norm`` command line argument to set it.
Here is an example of the ``gradient_clipping`` configuration:
.. code-block:: json
{
"gradient_clipping": 1.0,
}
**Notes:**
* DeepSpeed works with the PyTorch :class:`~transformers.Trainer` but not TF :class:`~transformers.TFTrainer`.
* While DeepSpeed has a pip installable PyPI package, it is highly recommended that it gets installed from `source
<https://github.com/microsoft/deepspeed#installation>`__ to best match your hardware, and also if you need to enable
certain features, like 1-bit Adam, which aren't available in the PyPI distribution.
* You don't have to use the :class:`~transformers.Trainer` to use DeepSpeed with HuggingFace ``transformers`` - you can
use any model with your own trainer, and you will have to adapt the latter according to `the DeepSpeed integration
instructions <https://www.deepspeed.ai/getting-started/#writing-deepspeed-models>`__.
**Main DeepSpeed Resources:**
- `github <https://github.com/microsoft/deepspeed>`__
- `Usage docs <https://www.deepspeed.ai/getting-started/>`__
- `API docs <https://deepspeed.readthedocs.io/en/latest/index.html>`__
Finally, please remember that the HuggingFace :class:`~transformers.Trainer` only integrates DeepSpeed; therefore, if
you have any problems or questions regarding DeepSpeed usage, please file an issue with `DeepSpeed GitHub
<https://github.com/microsoft/DeepSpeed/issues>`__.


@@ -1,17 +1,211 @@
<!---
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Migrating from previous packages
## Migrating from transformers `v3.x` to `v4.x`
A couple of changes were introduced when switching from version 3 to version 4. Below is a summary of the expected
changes:
#### 1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.
The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set.
This introduces two breaking changes:
- The handling of overflowing tokens between the python and rust tokenizers is different.
- The rust tokenizers do not accept integers in the encoding methods.
##### How to obtain the same behavior as v3.x in v4.x
- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](https://huggingface.co/transformers/main_classes/pipelines.html?highlight=textclassification#tokenclassificationpipeline).
- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the `use_fast` flag by setting it to `False`:
In version `v3.x`:
```py
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
```
to obtain the same in version `v4.x`:
```py
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", use_fast=False)
```
#### 2. SentencePiece is removed from the required dependencies
The requirement on the SentencePiece dependency has been lifted from the `setup.py`. This is done so that we may have a channel on anaconda cloud without relying on `conda-forge`. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard `transformers` installation.
This includes the **slow** versions of:
- `XLNetTokenizer`
- `AlbertTokenizer`
- `CamembertTokenizer`
- `MBartTokenizer`
- `PegasusTokenizer`
- `T5Tokenizer`
- `ReformerTokenizer`
- `XLMRobertaTokenizer`
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should install `sentencepiece` additionally:
In version `v3.x`:
```bash
pip install transformers
```
to obtain the same in version `v4.x`:
```bash
pip install transformers[sentencepiece]
```
or
```bash
pip install transformers sentencepiece
```
#### 3. The architecture of the repo has been updated so that each model resides in its folder
The past and foreseeable addition of new models means that the number of files in the directory `src/transformers` keeps growing and becomes harder to navigate and understand. We made the choice to put each model and the files accompanying it in their own sub-directories.
This is a breaking change as importing intermediary layers using a model's module directly needs to be done via a different path.
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should update the path used to access the layers.
In version `v3.x`:
```py
from transformers.modeling_bert import BertLayer
```
to obtain the same in version `v4.x`:
```py
from transformers.models.bert.modeling_bert import BertLayer
```
#### 4. Switching the `return_dict` argument to `True` by default
The [`return_dict` argument](https://huggingface.co/transformers/main_classes/output.html) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documenting: keys can be used to retrieve values, while it still behaves like a tuple, so users may retrieve objects by index or by slice.
This is a breaking change: unlike the tuples, the returned object cannot be unpacked, so `value0, value1 = outputs` will no longer work.
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should specify the `return_dict` argument to `False`, either in the model configuration or during the forward pass.
In version `v3.x`:
```py
model = BertModel.from_pretrained("bert-base-cased")
outputs = model(**inputs)
```
to obtain the same in version `v4.x`:
```py
model = BertModel.from_pretrained("bert-base-cased")
outputs = model(**inputs, return_dict=False)
```
or
```py
model = BertModel.from_pretrained("bert-base-cased", return_dict=False)
outputs = model(**inputs)
```
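If you keep the new default instead, the returned object is usable both ways. A minimal sketch (the checkpoint name is only illustrative; any model behaves the same):
```py
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
inputs = tokenizer("Hello world", return_tensors="pt")

outputs = model(**inputs)  # return_dict=True is the v4.x default
# The same value is reachable by attribute, by key, or by index:
assert outputs.last_hidden_state is outputs["last_hidden_state"]
assert outputs[0] is outputs.last_hidden_state
```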
#### 5. Removed some deprecated attributes
Attributes that were deprecated have been removed if they had been deprecated for at least a month. The full list of deprecated attributes can be found in [#8604](https://github.com/huggingface/transformers/pull/8604).
Here is a list of these attributes/methods/arguments and what their replacements should be (a short before/after sketch follows these lists):
In several models, the labels become consistent with the other models:
- `masked_lm_labels` becomes `labels` in `AlbertForMaskedLM` and `AlbertForPreTraining`.
- `masked_lm_labels` becomes `labels` in `BertForMaskedLM` and `BertForPreTraining`.
- `masked_lm_labels` becomes `labels` in `DistilBertForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `ElectraForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `LongformerForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `MobileBertForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `RobertaForMaskedLM`.
- `lm_labels` becomes `labels` in `BartForConditionalGeneration`.
- `lm_labels` becomes `labels` in `GPT2DoubleHeadsModel`.
- `lm_labels` becomes `labels` in `OpenAIGPTDoubleHeadsModel`.
- `lm_labels` becomes `labels` in `T5ForConditionalGeneration`.
In several models, the caching mechanism becomes consistent with the other models:
- `decoder_cached_states` becomes `past_key_values` in all BART-like, FSMT and T5 models.
- `decoder_past_key_values` becomes `past_key_values` in all BART-like, FSMT and T5 models.
- `past` becomes `past_key_values` in all CTRL models.
- `past` becomes `past_key_values` in all GPT-2 models.
Regarding the tokenizer classes:
- The tokenizer attribute `max_len` becomes `model_max_length`.
- The tokenizer attribute `return_lengths` becomes `return_length`.
- The tokenizer encoding argument `is_pretokenized` becomes `is_split_into_words`.
Regarding the `Trainer` class:
- The `Trainer` argument `tb_writer` is removed in favor of the callback `TensorBoardCallback(tb_writer=...)`.
- The `Trainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`.
- The `Trainer` attribute `data_collator` should be a callable.
- The `Trainer` method `_log` is deprecated in favor of `log`.
- The `Trainer` method `_training_step` is deprecated in favor of `training_step`.
- The `Trainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`.
- The `Trainer` method `is_local_master` is deprecated in favor of `is_local_process_zero`.
- The `Trainer` method `is_world_master` is deprecated in favor of `is_world_process_zero`.
Regarding the `TFTrainer` class:
- The `TFTrainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`.
- The `TFTrainer` method `_log` is deprecated in favor of `log`.
- The `TFTrainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`.
- The `TFTrainer` method `_setup_wandb` is deprecated in favor of `setup_wandb`.
- The `TFTrainer` method `_run_model` is deprecated in favor of `run_model`.
Regarding the `TrainingArguments` class:
- The `TrainingArguments` argument `evaluate_during_training` is deprecated in favor of `evaluation_strategy`.
Regarding the Transfo-XL model:
- The Transfo-XL configuration attribute `tie_weight` becomes `tie_words_embeddings`.
- The Transfo-XL modeling method `reset_length` becomes `reset_memory_length`.
Regarding pipelines:
- The `FillMaskPipeline` argument `topk` becomes `top_k`.
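To make a few of these renames concrete, here is a hedged before/after sketch (the checkpoint name and argument values are illustrative only; `top_k` is passed at pipeline creation here):
```py
from transformers import BertForMaskedLM, BertTokenizer, pipeline

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

# Tokenizer renames: is_pretokenized -> is_split_into_words, return_lengths -> return_length
enc = tokenizer(["Hello", "world"], is_split_into_words=True, return_length=True, return_tensors="pt")
print(tokenizer.model_max_length)  # replaces the removed tokenizer.max_len

# Model label renames: masked_lm_labels / lm_labels -> labels
outputs = model(input_ids=enc["input_ids"], labels=enc["input_ids"])

# Pipeline renames: topk -> top_k
fill_mask = pipeline("fill-mask", model="bert-base-cased", top_k=3)
```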
## Migrating from pytorch-transformers to 🤗 Transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to 🤗 Transformers.
### Positional order of some models' keyword inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195), the specific order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword names for keyword arguments, e.g. `model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for keyword arguments, e.g. `model(input_ids, attention_mask, token_type_ids)`, you may have to double check the exact order of input arguments.
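In practice, the robust pattern in both libraries is to pass everything except `input_ids` by keyword. A one-line sketch (assuming `model` and the tensors are already defined):
```py
# Safe regardless of how the positional order changed between releases:
outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
```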
## Migrating from pytorch-pretrained-bert
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to 🤗 Transformers.
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to 🤗 Transformers is that the models' forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to 🤗 Transformers conversion example for a `BertForSequenceClassification` classification model:
```python
# Let's load our model
model = BertForSequenceClassification.from_pretrained('bert-base-uncased')

# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)

# Now just use this line in 🤗 Transformers to extract the loss from the output tuple:
outputs = model(input_ids, labels=labels)
loss = outputs[0]

# In 🤗 Transformers you can also have access to the logits:
loss, logits = outputs[:2]

# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```
Here is a conversion example from `BertAdam` with a linear warmup and decay schedule:
```python
# Parameters:
lr = 1e-3
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps)  # 0.1

### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, num_training_steps=num_training_steps)
### and used like this:
for batch in train_data:
    loss = model(batch)
    loss.backward()
    optimizer.step()

### In 🤗 Transformers, optimizer and schedules are split and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps)  # PyTorch scheduler
### and used like this:
for batch in train_data:
    loss = model(batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
    optimizer.step()
    scheduler.step()
```
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
ALBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ALBERT model was proposed in `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
<https://arxiv.org/abs/1909.11942>`__ by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma,
Radu Soricut. It presents two parameter-reduction techniques to lower memory consumption and increase the training
speed of BERT:
- Splitting the embedding matrix into two smaller matrices.
- Using repeating layers split among groups.
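Both techniques surface directly in :class:`~transformers.AlbertConfig`; below is a minimal sketch (the hyperparameter values are illustrative, not those of an official checkpoint):
.. code-block:: python

    from transformers import AlbertConfig, AlbertModel

    # Factorized embeddings: embedding_size (E) is much smaller than hidden_size (H),
    # and all 12 layers share weights within a single hidden group.
    config = AlbertConfig(embedding_size=128, hidden_size=768,
                          num_hidden_layers=12, num_hidden_groups=1)
    model = AlbertModel(config)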
The abstract from the paper is the following:
*Increasing model size when pretraining natural language representations often results in improved performance on
downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations,
longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction
techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows
that our proposed methods lead to models that scale much better compared to the original BERT. We also use a
self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks
with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and
SQuAD benchmarks while having fewer parameters compared to BERT-large.*
Tips:
- ALBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather
than the left.
- ALBERT uses repeating layers, which results in a small memory footprint; however, the computational cost remains
similar to a BERT-like architecture with the same number of hidden layers, as it has to iterate through the same
number of (repeating) layers.
The original code can be found `here <https://github.com/google-research/ALBERT>`__.
AlbertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertConfig
:members:
AlbertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
AlbertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertTokenizerFast
:members:
Albert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput
:members:
.. autoclass:: transformers.models.albert.modeling_tf_albert.TFAlbertForPreTrainingOutput
:members:
AlbertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertModel
:members: forward
AlbertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForPreTraining
:members: forward
AlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMaskedLM
:members: forward
AlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForSequenceClassification
:members: forward
AlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForMultipleChoice
:members:
AlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForTokenClassification
:members: forward
AlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AlbertForQuestionAnswering
:members: forward
TFAlbertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertModel
:members: call
TFAlbertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForPreTraining
:members: call
TFAlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMaskedLM
:members: call
TFAlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForSequenceClassification
:members: call
TFAlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForMultipleChoice
:members: call
TFAlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForTokenClassification
:members: call
TFAlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAlbertForQuestionAnswering
:members: call
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Auto Classes
-----------------------------------------------------------------------------------------------------------------------
In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you
are supplying to the :obj:`from_pretrained()` method. AutoClasses are here to do this job for you so that you
automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary.
Instantiating one of :class:`~transformers.AutoConfig`, :class:`~transformers.AutoModel`, and
:class:`~transformers.AutoTokenizer` will directly create a class of the relevant architecture. For instance
.. code-block:: python
model = AutoModel.from_pretrained('bert-base-cased')
will create a model that is an instance of :class:`~transformers.BertModel`.
There is one class of :obj:`AutoModel` for each task, and for each backend (PyTorch or TensorFlow).
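As a slightly fuller sketch, the matching config, tokenizer and model can all be retrieved from the same checkpoint name:
.. code-block:: python

    from transformers import AutoConfig, AutoModel, AutoTokenizer

    # Each Auto class resolves 'bert-base-cased' to the BERT architecture:
    config = AutoConfig.from_pretrained('bert-base-cased')
    tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
    model = AutoModel.from_pretrained('bert-base-cased')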
AutoConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoConfig
:members:
AutoTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoTokenizer
:members:
AutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModel
:members:
AutoModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForPreTraining
:members:
AutoModelForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForCausalLM
:members:
AutoModelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForMaskedLM
:members:
AutoModelForSeq2SeqLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForSeq2SeqLM
:members:
AutoModelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForSequenceClassification
:members:
AutoModelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForMultipleChoice
:members:
AutoModelForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForNextSentencePrediction
:members:
AutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForTokenClassification
:members:
AutoModelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForQuestionAnswering
:members:
AutoModelForTableQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.AutoModelForTableQuestionAnswering
:members:
TFAutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModel
:members:
TFAutoModelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForPreTraining
:members:
TFAutoModelForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForCausalLM
:members:
TFAutoModelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForMaskedLM
:members:
TFAutoModelForSeq2SeqLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForSeq2SeqLM
:members:
TFAutoModelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForSequenceClassification
:members:
TFAutoModelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForMultipleChoice
:members:
TFAutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForTokenClassification
:members:
TFAutoModelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFAutoModelForQuestionAnswering
:members:
FlaxAutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxAutoModel
:members:
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BART
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@patrickvonplaten
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Bart model was proposed in `BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,
Translation, and Comprehension <https://arxiv.org/abs/1910.13461>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan
Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer on 29 Oct, 2019.
According to the abstract,
- Bart uses a standard seq2seq/machine translation architecture with a bidirectional encoder (like BERT) and a
left-to-right decoder (like GPT).
- The pretraining task involves randomly shuffling the order of the original sentences and a novel in-filling scheme,
where spans of text are replaced with a single mask token.
- BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It
matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, achieves new
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
Examples
_______________________________________________________________________________________________________________________
- Examples and scripts for fine-tuning BART and other models for sequence to sequence tasks can be found in
:prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
- An example of how to train :class:`~transformers.BartForConditionalGeneration` with a Hugging Face :obj:`datasets`
object can be found in this `forum discussion
<https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904>`__.
- `Distilled checkpoints <https://huggingface.co/models?search=distilbart>`__ are described in this `paper
<https://arxiv.org/abs/2010.13002>`__.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Bart doesn't use :obj:`token_type_ids` for sequence classification. Use :class:`~transformers.BartTokenizer` or
:meth:`~transformers.BartTokenizer.encode` to get the proper splitting.
- The forward pass of :class:`~transformers.BartModel` will create the ``decoder_input_ids`` if they are not passed.
This is different from some other modeling APIs. A typical use case of this feature is mask filling.
- Model predictions are intended to be identical to the original implementation when
:obj:`force_bos_token_to_be_generated=True`. This only works, however, if the string you pass to
:func:`fairseq.encode` starts with a space.
- :meth:`~transformers.BartForConditionalGeneration.generate` should be used for conditional generation tasks like
summarization; see the example in that method's docstring and the sketch below.
- Models that load the `facebook/bart-large-cnn` weights will not have a :obj:`mask_token_id`, or be able to perform
mask-filling tasks.
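As a hedged illustration of the notes above, a summarization sketch with the :obj:`facebook/bart-large-cnn` checkpoint (the input text and generation hyperparameters are illustrative):
.. code-block:: python

    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

    ARTICLE = "PG&E stated it scheduled the blackouts in response to forecasts for high winds."
    inputs = tokenizer([ARTICLE], max_length=1024, truncation=True, return_tensors="pt")

    # generate() creates the decoder_input_ids and runs beam search internally
    summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))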
Mask Filling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The :obj:`facebook/bart-base` and :obj:`facebook/bart-large` checkpoints can be used to fill multi-token masks.
.. code-block::
from transformers import BartForConditionalGeneration, BartTokenizer
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", force_bos_token_to_be_generated=True)
tok = BartTokenizer.from_pretrained("facebook/bart-large")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
batch = tok(example_english_phrase, return_tensors='pt')
generated_ids = model.generate(batch['input_ids'])
assert tok.batch_decode(generated_ids, skip_special_tokens=True) == ['UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria']
BartConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartConfig
:members:
BartTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartTokenizer
:members:
BartTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartTokenizerFast
:members:
BartModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartModel
:members: forward
BartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForConditionalGeneration
:members: forward
BartForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForSequenceClassification
:members: forward
BartForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BartForQuestionAnswering
:members: forward
TFBartModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBartModel
:members: call
TFBartForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBartForConditionalGeneration
:members: call
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BARThez
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
<https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct,
2020.
The abstract of the paper:
*Inductive transfer learning, enabled by self-supervised learning, have taken the entire Natural Language Processing
(NLP) field by storm, with models such as BERT and BART setting new state of the art on countless natural language
understanding tasks. While there are some notable exceptions, most of the available models and research have been
conducted for the English language. In this work, we introduce BARThez, the first BART model for the French language
(to the best of our knowledge). BARThez was pretrained on a very large monolingual French corpus from past research
that we adapted to suit BART's perturbation schemes. Unlike already existing BERT-based French language models such as
CamemBERT and FlauBERT, BARThez is particularly well-suited for generative tasks, since not only its encoder but also
its decoder is pretrained. In addition to discriminative tasks from the FLUE benchmark, we evaluate BARThez on a novel
summarization dataset, OrangeSum, that we release with this paper. We also continue the pretraining of an already
pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez,
provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.*
The Authors' code can be found `here <https://github.com/moussaKam/BARThez>`__.
Examples
_______________________________________________________________________________________________________________________
- BARThez can be fine-tuned on sequence-to-sequence tasks in a similar way as BART, check:
:prefix_link:`examples/seq2seq/ <examples/seq2seq/README.md>`.
BarthezTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BarthezTokenizer
:members:
BarthezTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BarthezTokenizerFast
:members:
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BERT model was proposed in `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
<https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It's a
bidirectional transformer pretrained using a combination of masked language modeling objective and next sentence
prediction on a large corpus comprising the Toronto Book Corpus and Wikipedia.
The abstract from the paper is the following:
*We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations
from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional
representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result,
the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models
for a wide range of tasks, such as question answering and language inference, without substantial task-specific
architecture modifications.*
*BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural
language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*
Tips:
- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
- BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
The original code can be found `here <https://github.com/google-research/bert>`__.
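As a quick, hedged sketch of the MLM objective in practice (the checkpoint and the expected completion are illustrative):
.. code-block:: python

    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")

    inputs = tokenizer(f"The capital of France is {tokenizer.mask_token}.", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Take the highest-scoring token at the [MASK] position
    mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    print(tokenizer.decode(logits[0, mask_index].argmax(-1)))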
BertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertConfig
:members:
BertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
BertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizerFast
:members:
Bert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.bert.modeling_bert.BertForPreTrainingOutput
:members:
.. autoclass:: transformers.models.bert.modeling_tf_bert.TFBertForPreTrainingOutput
:members:
BertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertModel
:members: forward
BertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForPreTraining
:members: forward
BertLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertLMHeadModel
:members: forward
BertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMaskedLM
:members: forward
BertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForNextSentencePrediction
:members: forward
BertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForSequenceClassification
:members: forward
BertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForMultipleChoice
:members: forward
BertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForTokenClassification
:members: forward
BertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertForQuestionAnswering
:members: forward
TFBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertModel
:members: call
TFBertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForPreTraining
:members: call
TFBertLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertLMHeadModel
:members: call
TFBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMaskedLM
:members: call
TFBertForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForNextSentencePrediction
:members: call
TFBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForSequenceClassification
:members: call
TFBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForMultipleChoice
:members: call
TFBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForTokenClassification
:members: call
TFBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBertForQuestionAnswering
:members: call
FlaxBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertModel
:members: __call__
FlaxBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaxBertForMaskedLM
:members: __call__
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
BertGeneration
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using
:class:`~transformers.EncoderDecoderModel` as proposed in `Leveraging Pre-trained Checkpoints for Sequence Generation
Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
The abstract from the paper is the following:
*Unsupervised pretraining of large neural models has recently revolutionized Natural Language Processing. By
warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple
benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language
Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We
developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT,
GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both
encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation,
Text Summarization, Sentence Splitting, and Sentence Fusion.*
Usage:
- The model can be used in combination with the :class:`~transformers.EncoderDecoderModel` to leverage two pretrained
BERT checkpoints for subsequent fine-tuning.
.. code-block::
# leverage checkpoints for Bert2Bert model...
# use BERT's cls token as BOS token and sep token as EOS token
encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
# add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
decoder = BertGenerationDecoder.from_pretrained("bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102)
bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)
# create tokenizer...
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
input_ids = tokenizer('This is a long article to summarize', add_special_tokens=False, return_tensors="pt").input_ids
labels = tokenizer('This is a short summary', return_tensors="pt").input_ids
# train...
loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
loss.backward()
- Pretrained :class:`~transformers.EncoderDecoderModel` checkpoints are also directly available in the model hub, e.g.:
.. code-block::
# instantiate sentence fusion model
sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")
input_ids = tokenizer('This is the first sentence. This is the second sentence.', add_special_tokens=False, return_tensors="pt").input_ids
outputs = sentence_fuser.generate(input_ids)
print(tokenizer.decode(outputs[0]))
Tips:
- :class:`~transformers.BertGenerationEncoder` and :class:`~transformers.BertGenerationDecoder` should be used in
combination with :class:`~transformers.EncoderDecoderModel`.
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
Therefore, no EOS token should be added to the end of the input.
The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
BertGenerationConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationConfig
:members:
BertGenerationTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationTokenizer
:members: save_vocabulary
BertGenerationEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationEncoder
:members: forward
BertGenerationDecoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertGenerationDecoder
:members: forward
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Bertweet
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The BERTweet model was proposed in `BERTweet: A pre-trained language model for English Tweets
<https://www.aclweb.org/anthology/2020.emnlp-demos.2.pdf>`__ by Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen.
The abstract from the paper is the following:
*We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having
the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et
al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al.,
2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks:
Part-of-speech tagging, Named-entity recognition and text classification.*
Example of use:
.. code-block::
import torch
from transformers import AutoModel, AutoTokenizer
bertweet = AutoModel.from_pretrained("vinai/bertweet-base")
# For transformers v4.x+:
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)
# For transformers v3.x:
# tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
# INPUT TWEET IS ALREADY NORMALIZED!
line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"
input_ids = torch.tensor([tokenizer.encode(line)])
with torch.no_grad():
features = bertweet(input_ids) # Models outputs are now tuples
## With TensorFlow 2.0+:
# from transformers import TFAutoModel
# bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
The original code can be found `here <https://github.com/VinAIResearch/BERTweet>`__.
BertweetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertweetTokenizer
:members:
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Blenderbot
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ .
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Blender chatbot model was proposed in `Recipes for building an open-domain chatbot
<https://arxiv.org/pdf/2004.13637.pdf>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu,
Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
The abstract of the paper is the following:
*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that
scaling neural models in the number of parameters and the size of the data they are trained on gives improved results,
we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of
skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to
their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent
persona. We show that large scale models can learn these skills when given appropriate training data and choice of
generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models
and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*
The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- Blenderbot uses a standard `seq2seq model transformer <https://arxiv.org/pdf/1706.03762.pdf>`__ based architecture.
- Available checkpoints can be found in the `model hub <https://huggingface.co/models?search=blenderbot>`__.
- This is the `default` Blenderbot model class. However, some smaller checkpoints, such as
``facebook/blenderbot_small_90M``, have a different architecture and consequently should be used with
`BlenderbotSmall <https://huggingface.co/transformers/master/model_doc/blenderbot_small.html>`__.
Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is an example of model usage:
.. code-block::
>>> from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
>>> mname = 'facebook/blenderbot-400M-distill'
>>> model = BlenderbotForConditionalGeneration.from_pretrained(mname)
>>> tokenizer = BlenderbotTokenizer.from_pretrained(mname)
>>> UTTERANCE = "My friends are cool but they eat too many carbs."
>>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
>>> reply_ids = model.generate(**inputs)
>>> print(tokenizer.batch_decode(reply_ids))
["<s> That's unfortunate. Are they trying to lose weight or are they just trying to be healthier?</s>"]
BlenderbotConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotConfig
:members:
BlenderbotTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotTokenizer
:members: build_inputs_with_special_tokens
BlenderbotModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See :obj:`transformers.BartModel` for arguments to `forward` and `generate`
.. autoclass:: transformers.BlenderbotModel
:members: forward
BlenderbotForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See :obj:`transformers.BartForConditionalGeneration` for arguments to `forward` and `generate`
.. autoclass:: transformers.BlenderbotForConditionalGeneration
:members: forward
TFBlenderbotModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBlenderbotModel
:members: call
TFBlenderbotForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBlenderbotForConditionalGeneration
:members: call
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Blenderbot Small
-----------------------------------------------------------------------------------------------------------------------
Note that :class:`~transformers.BlenderbotSmallModel` and
:class:`~transformers.BlenderbotSmallForConditionalGeneration` are only used in combination with the checkpoint
`facebook/blenderbot-90M <https://huggingface.co/facebook/blenderbot-90M>`__. Larger Blenderbot checkpoints should
instead be used with :class:`~transformers.BlenderbotModel` and
:class:`~transformers.BlenderbotForConditionalGeneration`.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Blender chatbot model was proposed in `Recipes for building an open-domain chatbot
<https://arxiv.org/pdf/2004.13637.pdf>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu,
Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston on 30 Apr 2020.
The abstract of the paper is the following:
*Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that
scaling neural models in the number of parameters and the size of the data they are trained on gives improved results,
we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of
skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to
their partners, and displaying knowledge, empathy and personality appropriately, while maintaining a consistent
persona. We show that large scale models can learn these skills when given appropriate training data and choice of
generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter models, and make our models
and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*
The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__.
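Usage follows the same pattern as the larger Blenderbot checkpoints, only with the ``BlenderbotSmall`` classes. Here is a minimal sketch, assuming the ``facebook/blenderbot-90M`` checkpoint mentioned above (the generated reply will vary with the checkpoint and generation settings):

.. code-block::

    >>> from transformers import BlenderbotSmallTokenizer, BlenderbotSmallForConditionalGeneration
    >>> mname = 'facebook/blenderbot-90M'
    >>> model = BlenderbotSmallForConditionalGeneration.from_pretrained(mname)
    >>> tokenizer = BlenderbotSmallTokenizer.from_pretrained(mname)
    >>> UTTERANCE = "My friends are cool but they eat too many carbs."
    >>> inputs = tokenizer([UTTERANCE], return_tensors='pt')
    >>> reply_ids = model.generate(**inputs)
    >>> print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))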
BlenderbotSmallConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotSmallConfig
:members:
BlenderbotSmallTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotSmallTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
BlenderbotSmallModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotSmallModel
:members: forward
BlenderbotSmallForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BlenderbotSmallForConditionalGeneration
:members: forward
TFBlenderbotSmallModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBlenderbotSmallModel
:members: call
TFBlenderbotSmallForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFBlenderbotSmallForConditionalGeneration
:members: call

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
CamemBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The CamemBERT model was proposed in `CamemBERT: a Tasty French Language Model <https://arxiv.org/abs/1911.03894>`__ by
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la
Clergerie, Djamé Seddah, and Benoît Sagot. It is based on Facebook's RoBERTa model released in 2019 and was trained
on 138GB of French text.
The abstract from the paper is the following:
*Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available
models have either been trained on English data or on the concatenation of data in multiple languages. This makes
practical use of such models --in all languages except English-- very limited. Aiming to address this issue for French,
we release CamemBERT, a French version of the Bi-directional Encoders for Transformers (BERT). We measure the
performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging,
dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art
for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and
downstream applications for French NLP.*
Tips:
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
as well as the information relative to the inputs and outputs.
The original code can be found `here <https://camembert-model.fr/>`__.
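Since the implementation mirrors RoBERTa, a masked-word prediction sketch illustrates typical usage; the ``camembert-base`` checkpoint identifier is an assumption here (other CamemBERT checkpoints from the model hub work the same way):

.. code-block::

    >>> from transformers import CamembertTokenizer, CamembertForMaskedLM
    >>> tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
    >>> model = CamembertForMaskedLM.from_pretrained("camembert-base")
    >>> # CamemBERT uses <mask> as its mask token, like RoBERTa
    >>> inputs = tokenizer("Le camembert est <mask> !", return_tensors="pt")
    >>> logits = model(**inputs).logits
    >>> # locate the masked position and decode the most likely token for it
    >>> mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    >>> print(tokenizer.decode([logits[0, mask_index].argmax().item()]))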
CamembertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertConfig
:members:
CamembertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
CamembertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertTokenizerFast
:members:
CamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertModel
:members:
CamembertForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForCausalLM
:members:
CamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMaskedLM
:members:
CamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForSequenceClassification
:members:
CamembertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForMultipleChoice
:members:
CamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForTokenClassification
:members:
CamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CamembertForQuestionAnswering
:members:
TFCamembertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertModel
:members:
TFCamembertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForMaskedLM
:members:
TFCamembertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForSequenceClassification
:members:
TFCamembertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForMultipleChoice
:members:
TFCamembertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForTokenClassification
:members:
TFCamembertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCamembertForQuestionAnswering
:members:

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
CTRL
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CTRL model was proposed in `CTRL: A Conditional Transformer Language Model for Controllable Generation
<https://arxiv.org/abs/1909.05858>`_ by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and
Richard Socher. It's a causal (unidirectional) transformer pre-trained using language modeling on a very large corpus
of ~140 GB of text data with the first token reserved as a control code (such as Links, Books, Wikipedia etc.).
The abstract from the paper is the following:
*Large-scale language models show promising text generation capabilities, but users cannot easily control particular
aspects of the generated text. We release CTRL, a 1.63 billion-parameter conditional transformer language model,
trained to condition on control codes that govern style, content, and task-specific behavior. Control codes were
derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while
providing more explicit control over text generation. These codes also allow CTRL to predict which parts of the
training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data
via model-based source attribution.*
Tips:
- CTRL makes use of control codes to generate text: it requires generations to be started by certain words, sentences
or links to generate coherent text. Refer to the `original implementation <https://github.com/salesforce/ctrl>`__ for
more information.
- CTRL is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
- CTRL was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
token in a sequence. Leveraging this feature allows CTRL to generate syntactically coherent text as it can be
observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation. See
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument.
The original code can be found `here <https://github.com/salesforce/ctrl>`__.
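To make the control-code mechanism concrete, here is a minimal generation sketch; the ``ctrl`` checkpoint identifier and the choice of the ``Links`` control code are illustrative assumptions:

.. code-block::

    >>> from transformers import CTRLTokenizer, CTRLLMHeadModel
    >>> tokenizer = CTRLTokenizer.from_pretrained("ctrl")
    >>> model = CTRLLMHeadModel.from_pretrained("ctrl")
    >>> # the prompt starts with a control code ("Links") that steers the style of the generation
    >>> inputs = tokenizer("Links Hello, my dog is cute", return_tensors="pt")
    >>> generated = model.generate(**inputs, max_length=30)
    >>> print(tokenizer.decode(generated[0]))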
CTRLConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLConfig
:members:
CTRLTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLTokenizer
:members: save_vocabulary
CTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLModel
:members: forward
CTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLLMHeadModel
:members: forward
CTRLForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.CTRLForSequenceClassification
:members: forward
TFCTRLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLModel
:members: call
TFCTRLLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLLMHeadModel
:members: call
TFCTRLForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFCTRLForSequenceClassification
:members: call

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
DeBERTa
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The DeBERTa model was proposed in `DeBERTa: Decoding-enhanced BERT with Disentangled Attention
<https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's
BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and enhanced mask decoder training with half of the data used in
RoBERTa.
The abstract from the paper is the following:
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural
language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with
disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the
disentangled attention mechanism, where each word is represented using two vectors that encode its content and
position, respectively, and the attention weights among words are computed using disentangled matrices on their
contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to
predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency
of model pretraining and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of
the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9%
(90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
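A minimal usage sketch for extracting contextual representations; the ``microsoft/deberta-base`` checkpoint identifier is an assumption (it is the base checkpoint published on the model hub):

.. code-block::

    >>> from transformers import DebertaTokenizer, DebertaModel
    >>> tokenizer = DebertaTokenizer.from_pretrained("microsoft/deberta-base")
    >>> model = DebertaModel.from_pretrained("microsoft/deberta-base")
    >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    >>> outputs = model(**inputs)
    >>> # hidden states of shape (batch, sequence, hidden), computed with disentangled attention
    >>> last_hidden_states = outputs.last_hidden_state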
DebertaConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaConfig
:members:
DebertaTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
DebertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaModel
:members:
DebertaPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaPreTrainedModel
:members:
DebertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DebertaForSequenceClassification
:members:

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
DialoGPT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
DialoGPT was proposed in `DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
<https://arxiv.org/abs/1911.00536>`_ by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao,
Jianfeng Gao, Jingjing Liu, Bill Dolan. It's a GPT2 Model trained on 147M conversation-like exchanges extracted from
Reddit.
The abstract from the paper is the following:
*We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained
transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning
from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human
both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems
that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline
systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response
generation and the development of more intelligent open-domain dialogue systems.*
Tips:
- DialoGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather
than the left.
- DialoGPT was trained with a causal language modeling (CLM) objective on conversational data and is therefore powerful
at response generation in open-domain dialogue systems.
- DialoGPT enables the user to create a chat bot in just 10 lines of code as shown on `DialoGPT's model card
<https://huggingface.co/microsoft/DialoGPT-medium>`_.
Training:
In order to train or fine-tune DialoGPT, one can use causal language modeling training. To cite the official paper: *We
follow the OpenAI GPT-2 to model a multiturn dialogue session as a long text and frame the generation task as language
modeling. We first concatenate all dialog turns within a dialogue session into a long text x_1,..., x_N (N is the
sequence length), ended by the end-of-text token.* For more information please refer to the original paper.
DialoGPT's architecture is based on the GPT2 model, so one can refer to GPT2's `docstring
<https://huggingface.co/transformers/model_doc/gpt2.html>`_.
The original code can be found `here <https://github.com/microsoft/DialoGPT>`_.
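Putting the modeling recipe above into code, here is a minimal single-turn chat sketch with the ``microsoft/DialoGPT-medium`` checkpoint from the model card (the reply will vary):

.. code-block::

    >>> from transformers import AutoTokenizer, AutoModelForCausalLM
    >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
    >>> model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
    >>> # each dialog turn is ended by the end-of-text token, matching the training setup
    >>> history = tokenizer.encode("Does money buy happiness?" + tokenizer.eos_token, return_tensors="pt")
    >>> reply_ids = model.generate(history, max_length=100, pad_token_id=tokenizer.eos_token_id)
    >>> # decode only the newly generated tokens, i.e. the bot's reply
    >>> print(tokenizer.decode(reply_ids[0, history.shape[-1]:], skip_special_tokens=True))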

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
DistilBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The DistilBERT model was proposed in the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a
distilled version of BERT <https://medium.com/huggingface/distilbert-8cf3380435b5>`__, and the paper `DistilBERT, a
distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__. DistilBERT is a
small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than
`bert-base-uncased` and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE
language understanding benchmark.
The abstract from the paper is the following:
*As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP),
operating these large models in on-the-edge and/or under constrained computational training or inference budgets
remains challenging. In this work, we propose a method to pre-train a smaller general-purpose language representation
model, called DistilBERT, which can then be fine-tuned with good performances on a wide range of tasks like its larger
counterparts. While most prior work investigated the use of distillation for building task-specific models, we leverage
knowledge distillation during the pretraining phase and show that it is possible to reduce the size of a BERT model by
40%, while retaining 97% of its language understanding capabilities and being 60% faster. To leverage the inductive
biases learned by larger models during pretraining, we introduce a triple loss combining language modeling,
distillation and cosine-distance losses. Our smaller, faster and lighter model is cheaper to pre-train and we
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
study.*
Tips:
- DistilBERT doesn't have :obj:`token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`[SEP]`).
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
necessary though, just let us know if you need this option.
The original code can be found `here
<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
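The first tip can be checked directly: encoding a pair of segments yields no :obj:`token_type_ids`. A minimal sketch, assuming the ``distilbert-base-uncased`` checkpoint:

.. code-block::

    >>> from transformers import DistilBertTokenizer, DistilBertModel
    >>> tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
    >>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")
    >>> # the two segments are simply joined with the [SEP] token
    >>> inputs = tokenizer("Is this a question?", "Yes, it is.", return_tensors="pt")
    >>> sorted(inputs.keys())  # no token_type_ids
    ['attention_mask', 'input_ids']
    >>> outputs = model(**inputs)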
DistilBertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertConfig
:members:
DistilBertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizer
:members:
DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizerFast
:members:
DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertModel
:members: forward
DistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMaskedLM
:members: forward
DistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForSequenceClassification
:members: forward
DistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForMultipleChoice
:members: forward
DistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForTokenClassification
:members: forward
DistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertForQuestionAnswering
:members: forward
TFDistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertModel
:members: call
TFDistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMaskedLM
:members: call
TFDistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForSequenceClassification
:members: call
TFDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForMultipleChoice
:members: call
TFDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForTokenClassification
:members: call
TFDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDistilBertForQuestionAnswering
:members: call

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
DPR
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research. It was
introduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__ by
Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih.
The abstract from the paper is the following:
*Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional
sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can
be practically implemented using dense representations alone, where embeddings are learned from a small number of
questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets,
our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% absolute in terms of top-20 passage
retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
benchmarks.*
The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
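A minimal sketch of the dual-encoder side, embedding a question into the dense retrieval space; the ``facebook/dpr-question_encoder-single-nq-base`` identifier is assumed to be the NQ-trained question encoder published on the model hub:

.. code-block::

    >>> from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer
    >>> mname = "facebook/dpr-question_encoder-single-nq-base"
    >>> tokenizer = DPRQuestionEncoderTokenizer.from_pretrained(mname)
    >>> model = DPRQuestionEncoder.from_pretrained(mname)
    >>> inputs = tokenizer("What is the capital of France?", return_tensors="pt")
    >>> # the pooled output is the dense vector compared against passage embeddings at retrieval time
    >>> embedding = model(**inputs).pooler_output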
DPRConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRConfig
:members:
DPRContextEncoderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoderTokenizer
:members:
DPRContextEncoderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoderTokenizerFast
:members:
DPRQuestionEncoderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoderTokenizer
:members:
DPRQuestionEncoderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoderTokenizerFast
:members:
DPRReaderTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReaderTokenizer
:members:
DPRReaderTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReaderTokenizerFast
:members:
DPR specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.dpr.modeling_dpr.DPRContextEncoderOutput
:members:
.. autoclass:: transformers.models.dpr.modeling_dpr.DPRQuestionEncoderOutput
:members:
.. autoclass:: transformers.models.dpr.modeling_dpr.DPRReaderOutput
:members:
DPRContextEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRContextEncoder
:members: forward
DPRQuestionEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRQuestionEncoder
:members: forward
DPRReader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DPRReader
:members: forward
TFDPRContextEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDPRContextEncoder
:members: call
TFDPRQuestionEncoder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDPRQuestionEncoder
:members: call
TFDPRReader
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFDPRReader
:members: call

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
ELECTRA
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The ELECTRA model was proposed in the paper `ELECTRA: Pre-training Text Encoders as Discriminators Rather Than
Generators <https://openreview.net/pdf?id=r1xMH1BtvB>`__. ELECTRA is a new pretraining approach which trains two
transformer models: the generator and the discriminator. The generator's role is to replace tokens in a sequence, and
is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries to
identify which tokens were replaced by the generator in the sequence.
The abstract from the paper is the following:
*Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK]
and then train a model to reconstruct the original tokens. While they produce good results when transferred to
downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a
more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach
corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead
of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that
predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments
demonstrate this new pretraining task is more efficient than MLM because the task is defined over all input tokens
rather than just the small subset that was masked out. As a result, the contextual representations learned by our
approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are
particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained
using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale,
where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when
using the same amount of compute.*
Tips:
- ELECTRA is the pretraining approach, therefore almost no changes are made to the underlying model (BERT). The
only change is the separation of the embedding size and the hidden size: the embedding size is generally smaller,
while the hidden size is larger. An additional projection layer (linear) is used to project the embeddings from their
embedding size to the hidden size. In the case where the embedding size is the same as the hidden size, no projection
layer is used.
- The ELECTRA checkpoints saved using `Google Research's implementation <https://github.com/google-research/electra>`__
contain both the generator and discriminator. The conversion script requires the user to name which model to export
into the correct architecture. Once converted to the HuggingFace format, these checkpoints may be loaded into all
available ELECTRA models, however. This means that the discriminator may be loaded in the
:class:`~transformers.ElectraForMaskedLM` model, and the generator may be loaded in the
:class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
doesn't exist in the generator).
The original code can be found `here <https://github.com/google-research/electra>`__.
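The discriminator's replaced-token-detection head can be probed directly. A minimal sketch, assuming the ``google/electra-small-discriminator`` checkpoint:

.. code-block::

    >>> import torch
    >>> from transformers import ElectraTokenizer, ElectraForPreTraining
    >>> tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
    >>> model = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")
    >>> inputs = tokenizer("The quick brown fox jumps over the lazy dog", return_tensors="pt")
    >>> # one logit per token: a positive score means "predicted to be a replaced token"
    >>> logits = model(**inputs).logits
    >>> predictions = torch.round(torch.sigmoid(logits))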
ElectraConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraConfig
:members:
ElectraTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizer
:members:
ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizerFast
:members:
Electra specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.electra.modeling_electra.ElectraForPreTrainingOutput
:members:
.. autoclass:: transformers.models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
:members:
ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraModel
:members: forward
ElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForPreTraining
:members: forward
ElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForMaskedLM
:members: forward
ElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForSequenceClassification
:members: forward
ElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForMultipleChoice
:members: forward
ElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForTokenClassification
:members: forward
ElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraForQuestionAnswering
:members: forward
TFElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraModel
:members: call
TFElectraForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForPreTraining
:members: call
TFElectraForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForMaskedLM
:members: call
TFElectraForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForSequenceClassification
:members: call
TFElectraForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForMultipleChoice
:members: call
TFElectraForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForTokenClassification
:members: call
TFElectraForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFElectraForQuestionAnswering
:members: call

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Encoder Decoder Models
-----------------------------------------------------------------------------------------------------------------------
The :class:`~transformers.EncoderDecoderModel` can be used to initialize a sequence-to-sequence model with any
pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks
was shown in `Leveraging Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by
Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
After such an :class:`~transformers.EncoderDecoderModel` has been trained/fine-tuned, it can be saved/loaded just like
any other model (see the examples for more information).
An application of this architecture could be to leverage two pretrained :class:`~transformers.BertModel` instances as the encoder
and decoder for a summarization model as was shown in: `Text Summarization with Pretrained Encoders
<https://arxiv.org/abs/1908.08345>`__ by Yang Liu and Mirella Lapata.
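A minimal sketch of warm-starting such a model from two BERT checkpoints (the ``bert-base-uncased`` identifier is used for illustration); note that the decoder's cross-attention weights do not exist in the checkpoint and are freshly initialized, so the model needs fine-tuning before use:

.. code-block::

    >>> from transformers import EncoderDecoderModel
    >>> # encoder: pretrained BERT; decoder: pretrained BERT with cross-attention layers added
    >>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
    >>> model.save_pretrained("bert2bert")
    >>> # after training/fine-tuning, it saves and reloads like any other model
    >>> model = EncoderDecoderModel.from_pretrained("bert2bert")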
EncoderDecoderConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderConfig
:members:
EncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EncoderDecoderModel
:members: forward, from_encoder_decoder_pretrained

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
FlauBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The FlauBERT model was proposed in the paper `FlauBERT: Unsupervised Language Model Pre-training for French
<https://arxiv.org/abs/1912.05372>`__ by Hang Le et al. It's a transformer model pretrained using a masked language
modeling (MLM) objective (like BERT).
The abstract from the paper is the following:
*Language models have become a key step to achieve state-of-the art results in many different Natural Language
Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way
to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their
contextualization at the sentence level. This has been widely demonstrated for English using contextualized
representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al.,
2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and
heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for
Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text
classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the
time they outperform other pretraining approaches. Different versions of FlauBERT as well as a unified evaluation
protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
community for further reproducible experiments in French NLP.*
The original code can be found `here <https://github.com/getalp/Flaubert>`__.
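A minimal sketch of extracting contextual representations for French text; the ``flaubert/flaubert_base_cased`` checkpoint identifier is an assumption (the FlauBERT checkpoints on the model hub follow this naming scheme):

.. code-block::

    >>> from transformers import FlaubertTokenizer, FlaubertModel
    >>> tokenizer = FlaubertTokenizer.from_pretrained("flaubert/flaubert_base_cased")
    >>> model = FlaubertModel.from_pretrained("flaubert/flaubert_base_cased")
    >>> inputs = tokenizer("Le chat mange une pomme.", return_tensors="pt")
    >>> # hidden states of shape (batch, sequence, hidden)
    >>> last_hidden_state = model(**inputs).last_hidden_state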
FlaubertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertConfig
:members:
FlaubertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertTokenizer
:members:
FlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertModel
:members: forward
FlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertWithLMHeadModel
:members: forward
FlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForSequenceClassification
:members: forward
FlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForMultipleChoice
:members: forward
FlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForTokenClassification
:members: forward
FlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnsweringSimple
:members: forward
FlaubertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FlaubertForQuestionAnswering
:members: forward
TFFlaubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertModel
:members: call
TFFlaubertWithLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertWithLMHeadModel
:members: call
TFFlaubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForSequenceClassification
:members: call
TFFlaubertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForMultipleChoice
:members: call
TFFlaubertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForTokenClassification
:members: call
TFFlaubertForQuestionAnsweringSimple
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFlaubertForQuestionAnsweringSimple
:members: call

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
FSMT
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** If you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
@stas00.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FSMT (FairSeq MachineTranslation) models were introduced in `Facebook FAIR's WMT19 News Translation Task Submission
<https://arxiv.org/abs/1907.06616>`__ by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.
The abstract of the paper is the following:
*This paper describes Facebook FAIR's submission to the WMT19 shared news translation task. We participate in two
language pairs and four language directions, English <-> German and English <-> Russian. Following our submission from
last year, our baseline systems are large BPE-based transformer models trained with the Fairseq sequence modeling
toolkit which rely on sampled back-translations. This year we experiment with different bitext data filtering schemes,
as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific
data, then decode using noisy channel model reranking. Our submissions are ranked first in all four directions of the
human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
This system improves upon our WMT'18 submission by 4.5 BLEU points.*
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>`__.
Implementation Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- FSMT uses source and target vocabulary pairs that aren't combined into one, nor does it share embedding tokens
between them. Its tokenizer is very similar to :class:`~transformers.XLMTokenizer` and the main model is derived from
:class:`~transformers.BartModel`.
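A minimal translation sketch, assuming the ``facebook/wmt19-en-de`` checkpoint from the WMT19 submission:

.. code-block::

    >>> from transformers import FSMTTokenizer, FSMTForConditionalGeneration
    >>> mname = "facebook/wmt19-en-de"
    >>> tokenizer = FSMTTokenizer.from_pretrained(mname)
    >>> model = FSMTForConditionalGeneration.from_pretrained(mname)
    >>> inputs = tokenizer("Machine learning is great, isn't it?", return_tensors="pt")
    >>> outputs = model.generate(**inputs)
    >>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))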
FSMTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTConfig
:members:
FSMTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, prepare_seq2seq_batch, save_vocabulary
FSMTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTModel
:members: forward
FSMTForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FSMTForConditionalGeneration
:members: forward

..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Funnel Transformer
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Funnel Transformer model was proposed in the paper `Funnel-Transformer: Filtering out Sequential Redundancy for
Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__. It is a bidirectional transformer model, like
BERT, but with a pooling operation after each block of layers, a bit like in traditional convolutional neural networks
(CNN) in computer vision.
The abstract from the paper is the following:
*With the success of language pretraining, it is highly desirable to develop more efficient architectures of good
scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the
much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only
require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which
gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More
importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further
improve the model capacity. In addition, to perform token-level predictions as required by common pretraining
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on
a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading
comprehension.*
Tips:
- Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
The base model therefore has a final sequence length that is a quarter of the original one. This model can be used
directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other
tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same
sequence length as the input.
- The Funnel Transformer checkpoints are all available with a full version and a base version. The first ones should be
used for :class:`~transformers.FunnelModel`, :class:`~transformers.FunnelForPreTraining`,
:class:`~transformers.FunnelForMaskedLM`, :class:`~transformers.FunnelForTokenClassification` and
:class:`~transformers.FunnelForQuestionAnswering`. The second ones should be used for
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
:class:`~transformers.FunnelForMultipleChoice`.
The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
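To make the checkpoint tip above concrete, here is a minimal sketch contrasting the two flavors (it assumes the
``funnel-transformer/small`` and ``funnel-transformer/small-base`` checkpoints; the shape comments are indicative):

.. code-block::

    from transformers import FunnelTokenizer, FunnelModel, FunnelBaseModel

    tokenizer = FunnelTokenizer.from_pretrained("funnel-transformer/small")
    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

    # Full model: the decoder upsamples the hidden states back to the input length.
    full = FunnelModel.from_pretrained("funnel-transformer/small")
    print(full(**inputs).last_hidden_state.shape)  # same sequence length as the input

    # Base model: no decoder, so the final sequence length is roughly a quarter of the input.
    base = FunnelBaseModel.from_pretrained("funnel-transformer/small-base")
    print(base(**inputs).last_hidden_state.shape)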
FunnelConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelConfig
:members:
FunnelTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
FunnelTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelTokenizerFast
:members:
Funnel specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.funnel.modeling_funnel.FunnelForPreTrainingOutput
:members:
.. autoclass:: transformers.models.funnel.modeling_tf_funnel.TFFunnelForPreTrainingOutput
:members:
FunnelBaseModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelBaseModel
:members: forward
FunnelModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelModel
:members: forward
FunnelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForPreTraining
:members: forward
FunnelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForMaskedLM
:members: forward
FunnelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForSequenceClassification
:members: forward
FunnelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForMultipleChoice
:members: forward
FunnelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForTokenClassification
:members: forward
FunnelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.FunnelForQuestionAnswering
:members: forward
TFFunnelBaseModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelBaseModel
:members: call
TFFunnelModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelModel
:members: call
TFFunnelForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForPreTraining
:members: call
TFFunnelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForMaskedLM
:members: call
TFFunnelForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForSequenceClassification
:members: call
TFFunnelForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForMultipleChoice
:members: call
TFFunnelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForTokenClassification
:members: call
TFFunnelForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFFunnelForQuestionAnswering
:members: call


@ -1,57 +1,146 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
OpenAI GPT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training
<https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional) transformer
pre-trained using language modeling on a large corpus with long range dependencies, the Toronto Book Corpus.
The abstract from the paper is the following:
*Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering,
semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant,
labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to
perform adequately. We demonstrate that large gains on these tasks can be realized by generative pretraining of a
language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In
contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve
effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our
approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms
discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon
the state of the art in 9 out of the 12 tasks studied.*
Tips:
- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
token in a sequence. Leveraging this feature allows GPT to generate syntactically coherent text, as can be
observed in the `run_generation.py` example script.
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
showcasing the generative capabilities of several models. GPT is one of them.
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
Note:
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install ``ftfy``
and ``SpaCy``:
.. code-block:: bash
pip install spacy ftfy==4.4.3
python -m spacy download en
If you don't install ``ftfy`` and ``SpaCy``, the :class:`~transformers.OpenAIGPTTokenizer` will default to tokenize
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
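As a minimal sketch of the CLM tip above, the snippet below generates a short continuation with
:class:`~transformers.OpenAIGPTLMHeadModel` (greedy decoding is used only to keep the example compact):

.. code-block::

    from transformers import OpenAIGPTTokenizer, OpenAIGPTLMHeadModel

    tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
    model = OpenAIGPTLMHeadModel.from_pretrained("openai-gpt")

    input_ids = tokenizer.encode("the history of nlp began", return_tensors="pt")
    # Greedy decoding; sampling can be enabled through the generate() arguments.
    output_ids = model.generate(input_ids, max_length=20, do_sample=False)
    print(tokenizer.decode(output_ids[0]))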
OpenAIGPTConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTConfig
:members:
OpenAIGPTTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizer
:members: save_vocabulary
OpenAIGPTTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizerFast
:members:
OpenAI specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.openai.modeling_openai.OpenAIGPTDoubleHeadsModelOutput
:members:
.. autoclass:: transformers.models.openai.modeling_tf_openai.TFOpenAIGPTDoubleHeadsModelOutput
:members:
OpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTModel
:members: forward
OpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTLMHeadModel
:members: forward
OpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTDoubleHeadsModel
:members: forward
OpenAIGPTForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTForSequenceClassification
:members: forward
TFOpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTModel
:members: call
TFOpenAIGPTLMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTLMHeadModel
:members: call
TFOpenAIGPTDoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTDoubleHeadsModel
:members: call
TFOpenAIGPTForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFOpenAIGPTForSequenceClassification
:members: call


@ -1,57 +1,140 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
OpenAI GPT2
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenAI GPT-2 model was proposed in `Language Models are Unsupervised Multitask Learners
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_ by Alec
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
The abstract from the paper is the following:
*GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1] of 8 million
web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some
text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks
across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than
10X the amount of data.*
Tips:
- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text, as can be
observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
this `past` value prevents the model from re-computing pre-computed values in the context of text generation. See
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
this argument, and the sketch below for a minimal illustration.
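A minimal sketch of this caching pattern (in recent versions of the library the keyword is named
``past_key_values``):

.. code-block::

    import torch
    from transformers import GPT2Tokenizer, GPT2LMHeadModel

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = tokenizer.encode("Transformers are", return_tensors="pt")
    outputs = model(input_ids, use_cache=True)
    past = outputs.past_key_values  # cached key/value pairs for every layer

    # On the next step, only the newly sampled token needs to be fed to the model.
    next_token = torch.argmax(outputs.logits[:, -1:], dim=-1)
    outputs = model(next_token, past_key_values=past)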
`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
GPT2Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Config
:members:
GPT2Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Tokenizer
:members: save_vocabulary
GPT2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2TokenizerFast
:members:
GPT2 specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.gpt2.modeling_gpt2.GPT2DoubleHeadsModelOutput
:members:
.. autoclass:: transformers.models.gpt2.modeling_tf_gpt2.TFGPT2DoubleHeadsModelOutput
:members:
GPT2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Model
:members: forward, parallelize, deparallelize
GPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2LMHeadModel
:members: forward, parallelize, deparallelize
GPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2DoubleHeadsModel
:members: forward
GPT2ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2ForSequenceClassification
:members: forward
TFGPT2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2Model
:members: call
TFGPT2LMHeadModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2LMHeadModel
:members: call
TFGPT2DoubleHeadsModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2DoubleHeadsModel
:members: call
TFGPT2ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFGPT2ForSequenceClassification
:members: call
TFSequenceClassifierOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_outputs.TFSequenceClassifierOutputWithPast
:members:


@ -0,0 +1,71 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
herBERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The herBERT model was proposed in `KLEJ: Comprehensive Benchmark for Polish Language Understanding
<https://www.aclweb.org/anthology/2020.acl-main.111.pdf>`__ by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, and
Ireneusz Gawlik. It is a BERT-based language model trained on Polish corpora, using only the MLM objective with
dynamic whole-word masking.
The abstract from the paper is the following:
*In recent years, a series of Transformer-based models unlocked major improvements in general natural language
understanding (NLU) tasks. Such a fast pace of research would not be possible without general NLU benchmarks, which
allow for a fair comparison of the proposed methods. However, such benchmarks are available only for a handful of
languages. To alleviate this issue, we introduce a comprehensive multi-task benchmark for the Polish language
understanding, accompanied by an online leaderboard. It consists of a diverse set of tasks, adopted from existing
datasets for named entity recognition, question-answering, textual entailment, and others. We also introduce a new
sentiment analysis task for the e-commerce domain, named Allegro Reviews (AR). To ensure a common evaluation scheme and
promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and
applications. Additionally, we release HerBERT, a Transformer-based model trained specifically for the Polish language,
which has the best average performance and obtains the best results for three out of nine tasks. Finally, we provide an
extensive evaluation, including several standard baselines and recently proposed, multilingual Transformer-based
models.*
Examples of use:
.. code-block::
from transformers import HerbertTokenizer, RobertaModel
tokenizer = HerbertTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = RobertaModel.from_pretrained("allegro/herbert-klej-cased-v1")
encoded_input = tokenizer.encode("Kto ma lepszą sztukę, ma lepszy rząd to jasne.", return_tensors='pt')
outputs = model(encoded_input)
# HerBERT can also be loaded using AutoTokenizer and AutoModel:
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("allegro/herbert-klej-cased-tokenizer-v1")
model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
The original code can be found `here <https://github.com/allegro/HerBERT>`__.
HerbertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HerbertTokenizer
:members:
HerbertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HerbertTokenizerFast
:members:


@ -0,0 +1,132 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
LayoutLM
-----------------------------------------------------------------------------------------------------------------------
.. _Overview:
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LayoutLM model was proposed in the paper `LayoutLM: Pre-training of Text and Layout for Document Image
Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and
Ming Zhou. It's a simple but effective pretraining method of text and layout for document image understanding and
information extraction tasks, such as form understanding and receipt understanding. It obtains state-of-the-art results
on several downstream tasks:
- form understanding: the `FUNSD <https://guillaumejaume.github.io/FUNSD/>`__ dataset (a collection of 199 annotated
forms comprising more than 30,000 words).
- receipt understanding: the `SROIE <https://rrc.cvc.uab.es/?ch=13>`__ dataset (a collection of 626 receipts for
training and 347 receipts for testing).
- document image classification: the `RVL-CDIP <https://www.cs.cmu.edu/~aharley/rvl-cdip/>`__ dataset (a collection of
400,000 images belonging to one of 16 classes).
The abstract from the paper is the following:
*Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the
widespread use of pretraining models for NLP applications, they almost exclusively focus on text-level manipulation,
while neglecting layout and style information that is vital for document image understanding. In this paper, we propose
the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is
beneficial for a great number of real-world document image understanding tasks such as information extraction from
scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM.
To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for
document-level pretraining. It achieves new state-of-the-art results in several downstream tasks, including form
understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification
(from 93.07 to 94.42).*
Tips:
- In addition to `input_ids`, :meth:`~transformers.LayoutLMModel.forward` also expects the input :obj:`bbox`, which are
the bounding boxes (i.e. 2D-positions) of the input tokens. These can be obtained using an external OCR engine such
as Google's `Tesseract <https://github.com/tesseract-ocr/tesseract>`__ (there's a `Python wrapper
<https://pypi.org/project/pytesseract/>`__ available). Each bounding box should be in (x0, y0, x1, y1) format, where
(x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1, y1) represents the
position of the lower right corner. Note that one first needs to normalize the bounding boxes to be on a 0-1000
scale. To normalize, you can use the following function:
.. code-block::
def normalize_bbox(bbox, width, height):
return [
int(1000 * (bbox[0] / width)),
int(1000 * (bbox[1] / height)),
int(1000 * (bbox[2] / width)),
int(1000 * (bbox[3] / height)),
]
Here, :obj:`width` and :obj:`height` correspond to the width and height of the original document in which the token
occurs. Those can be obtained using the Python Imaging Library (PIL), for example, as follows:
.. code-block::
from PIL import Image
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
width, height = image.size
- For a demo which shows how to fine-tune :class:`LayoutLMForTokenClassification` on the `FUNSD dataset
<https://guillaumejaume.github.io/FUNSD/>`__ (a collection of annotated forms), see `this notebook
<https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
It includes an inference part, which shows how to use Google's Tesseract on a new document.
The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
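Putting the pieces above together, here is a minimal sketch of a forward pass with normalized boxes (the words and
coordinates are made up; in practice they would come from an OCR engine):

.. code-block::

    import torch
    from transformers import LayoutLMTokenizer, LayoutLMModel

    tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
    model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

    words = ["Hello", "world"]
    boxes = [[637, 773, 693, 782], [698, 773, 733, 782]]  # already on the 0-1000 scale

    # Repeat each word-level box for every wordpiece, and add boxes for [CLS] and [SEP].
    token_boxes = []
    for word, box in zip(words, boxes):
        token_boxes.extend([box] * len(tokenizer.tokenize(word)))
    token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

    encoding = tokenizer(" ".join(words), return_tensors="pt")
    outputs = model(input_ids=encoding["input_ids"], bbox=torch.tensor([token_boxes]))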
LayoutLMConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMConfig
:members:
LayoutLMTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMTokenizer
:members:
LayoutLMTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMTokenizerFast
:members:
LayoutLMModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMModel
:members:
LayoutLMForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMForMaskedLM
:members:
LayoutLMForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMForSequenceClassification
:members:
LayoutLMForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LayoutLMForTokenClassification
:members:


@ -0,0 +1,149 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
LED
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LED model was proposed in `Longformer: The Long-Document Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz
Beltagy, Matthew E. Peters, Arman Cohan.
The abstract from the paper is the following:
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting
long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization
dataset.*
Tips:
- :class:`~transformers.LEDForConditionalGeneration` is an extension of
:class:`~transformers.BartForConditionalGeneration` exchanging the traditional *self-attention* layer with
*Longformer*'s *chunked self-attention* layer. :class:`~transformers.LEDTokenizer` is an alias of
:class:`~transformers.BartTokenizer`.
- LED works very well on long-range *sequence-to-sequence* tasks where the ``input_ids`` largely exceed a length of
1024 tokens.
- LED pads the ``input_ids`` to be a multiple of ``config.attention_window`` if required. Therefore a small speed-up is
gained when :class:`~transformers.LEDTokenizer` is used with the ``pad_to_multiple_of`` argument.
- LED makes use of *global attention* by means of the ``global_attention_mask`` (see
:class:`~transformers.LongformerModel`). For summarization, it is advised to put *global attention* only on the first
``<s>`` token. For question answering, it is advised to put *global attention* on all tokens of the question (see the
sketch after this list).
- To fine-tune LED on all 16384 tokens, it is necessary to enable *gradient checkpointing* by setting
``config.gradient_checkpointing = True``.
- A notebook showing how to evaluate LED can be accessed `here
<https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing>`__.
- A notebook showing how to fine-tune LED can be accessed `here
<https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.
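A minimal summarization sketch following the global attention advice above (it assumes the ``allenai/led-base-16384``
checkpoint):

.. code-block::

    import torch
    from transformers import LEDTokenizer, LEDForConditionalGeneration

    tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
    model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

    inputs = tokenizer("Replace this with a (very) long article ...", return_tensors="pt")
    # Global attention on the first (<s>) token only, as advised for summarization.
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    summary_ids = model.generate(inputs["input_ids"], global_attention_mask=global_attention_mask,
                                 num_beams=4, max_length=64)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))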
LEDConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDConfig
:members:
LEDTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
LEDTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDTokenizerFast
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
LED specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.led.modeling_led.LEDEncoderBaseModelOutput
:members:
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqModelOutput
:members:
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqLMOutput
:members:
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqSequenceClassifierOutput
:members:
.. autoclass:: transformers.models.led.modeling_led.LEDSeq2SeqQuestionAnsweringModelOutput
:members:
.. autoclass:: transformers.models.led.modeling_tf_led.TFLEDEncoderBaseModelOutput
:members:
.. autoclass:: transformers.models.led.modeling_tf_led.TFLEDSeq2SeqModelOutput
:members:
.. autoclass:: transformers.models.led.modeling_tf_led.TFLEDSeq2SeqLMOutput
:members:
LEDModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDModel
:members: forward
LEDForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDForConditionalGeneration
:members: forward
LEDForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDForSequenceClassification
:members: forward
LEDForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LEDForQuestionAnswering
:members: forward
TFLEDModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLEDModel
:members: call
TFLEDForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLEDForConditionalGeneration
:members: call


@ -0,0 +1,238 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Longformer
-----------------------------------------------------------------------------------------------------------------------
**DISCLAIMER:** This model is still a work in progress, if you see something strange, file a `Github Issue
<https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__.
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Longformer model was presented in `Longformer: The Long-Document Transformer
<https://arxiv.org/pdf/2004.05150.pdf>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
The abstract from the paper is the following:
*Transformer-based models are unable to process long sequences due to their self-attention operation, which scales
quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention
mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or
longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local
windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we
evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In
contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our
pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on
WikiHop and TriviaQA.*
Tips:
- Since the Longformer is based on RoBERTa, it doesn't have :obj:`token_type_ids`. You don't need to indicate which
token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
:obj:`</s>`).
The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
Longformer Self Attention
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Longformer self attention employs self attention on both a "local" context and a "global" context. Most tokens only
attend "locally" to each other meaning that each token attends to its :math:`\frac{1}{2} w` previous tokens and
:math:`\frac{1}{2} w` succeeding tokens with :math:`w` being the window length as defined in
:obj:`config.attention_window`. Note that :obj:`config.attention_window` can be of type :obj:`List` to define a
different :math:`w` for each layer. A selected few tokens attend "globally" to all other tokens, as it is
conventionally done for all tokens in :obj:`BertSelfAttention`.
Note that "locally" and "globally" attending tokens are projected by different query, key and value matrices. Also note
that every "locally" attending token not only attends to tokens within its window :math:`w`, but also to all "globally"
attending tokens so that global attention is *symmetric*.
The user can define which tokens attend "locally" and which tokens attend "globally" by setting the tensor
:obj:`global_attention_mask` at run-time appropriately. All Longformer models employ the following logic for
:obj:`global_attention_mask`:
- 0: the token attends "locally",
- 1: the token attends "globally".
For more information, please also refer to the :meth:`~transformers.LongformerModel.forward` method.
Using Longformer self attention, the memory and time complexity of the query-key matmul operation, which usually
represents the memory and time bottleneck, can be reduced from :math:`\mathcal{O}(n_s \times n_s)` to
:math:`\mathcal{O}(n_s \times w)`, with :math:`n_s` being the sequence length and :math:`w` being the average window
size. It is assumed that the number of "globally" attending tokens is insignificant as compared to the number of
"locally" attending tokens.
For more information, please refer to the official `paper <https://arxiv.org/pdf/2004.05150.pdf>`__.
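A minimal sketch of setting :obj:`global_attention_mask` at run-time (here only the first token attends globally; it
assumes the ``allenai/longformer-base-4096`` checkpoint):

.. code-block::

    import torch
    from transformers import LongformerTokenizer, LongformerModel

    tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
    model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

    inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
    # 0 -> the token attends locally, 1 -> the token attends globally.
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    outputs = model(**inputs, global_attention_mask=global_attention_mask)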
Training
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:class:`~transformers.LongformerForMaskedLM` is trained the exact same way :class:`~transformers.RobertaForMaskedLM` is
trained and should be used as follows:
.. code-block::

    from transformers import LongformerForMaskedLM, LongformerTokenizer

    tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
    model = LongformerForMaskedLM.from_pretrained("allenai/longformer-base-4096")
    # The tokenizer is RoBERTa-based, so the mask token is <mask>, not [MASK].
    input_ids = tokenizer.encode("This is a sentence from <mask> training data", return_tensors="pt")
    mlm_labels = tokenizer.encode("This is a sentence from the training data", return_tensors="pt")
    loss = model(input_ids, labels=mlm_labels)[0]
LongformerConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerConfig
:members:
LongformerTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizer
:members:
LongformerTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerTokenizerFast
:members:
Longformer specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerBaseModelOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerBaseModelOutputWithPooling
:members:
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerMaskedLMOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerQuestionAnsweringModelOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerSequenceClassifierOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerMultipleChoiceModelOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_longformer.LongformerTokenClassifierOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerBaseModelOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerBaseModelOutputWithPooling
:members:
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerMaskedLMOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerQuestionAnsweringModelOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerSequenceClassifierOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerMultipleChoiceModelOutput
:members:
.. autoclass:: transformers.models.longformer.modeling_tf_longformer.TFLongformerTokenClassifierOutput
:members:
LongformerModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerModel
:members: forward
LongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMaskedLM
:members: forward
LongformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForSequenceClassification
:members: forward
LongformerForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForMultipleChoice
:members: forward
LongformerForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForTokenClassification
:members: forward
LongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LongformerForQuestionAnswering
:members: forward
TFLongformerModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerModel
:members: call
TFLongformerForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForMaskedLM
:members: call
TFLongformerForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForQuestionAnswering
:members: call
TFLongformerForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForSequenceClassification
:members: call
TFLongformerForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForTokenClassification
:members: call
TFLongformerForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLongformerForMultipleChoice
:members: call


@ -0,0 +1,127 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
LXMERT
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LXMERT model was proposed in `LXMERT: Learning Cross-Modality Encoder Representations from Transformers
<https://arxiv.org/abs/1908.07490>`__ by Hao Tan & Mohit Bansal. It is a series of bidirectional transformer encoders
(one for the vision modality, one for the language modality, and then one to fuse both modalities) pretrained using a
combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked
visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. The pretraining
consists of multiple multi-modal datasets: MSCOCO, Visual-Genome + Visual-Genome Question Answering, VQA 2.0, and GQA.
The abstract from the paper is the following:
*Vision-and-language reasoning requires an understanding of visual concepts, language semantics, and, most importantly,
the alignment and relationships between these two modalities. We thus propose the LXMERT (Learning Cross-Modality
Encoder Representations from Transformers) framework to learn these vision-and-language connections. In LXMERT, we
build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language
encoder, and a cross-modality encoder. Next, to endow our model with the capability of connecting vision and language
semantics, we pre-train the model with large amounts of image-and-sentence pairs, via five diverse representative
pretraining tasks: masked language modeling, masked object prediction (feature regression and label classification),
cross-modality matching, and image question answering. These tasks help in learning both intra-modality and
cross-modality relationships. After fine-tuning from our pretrained parameters, our model achieves the state-of-the-art
results on two visual question answering datasets (i.e., VQA and GQA). We also show the generalizability of our
pretrained cross-modality model by adapting it to a challenging visual-reasoning task, NLVR, and improve the previous
best result by 22% absolute (54% to 76%). Lastly, we demonstrate detailed ablation studies to prove that both our novel
model components and pretraining strategies significantly contribute to our strong results; and also present several
attention visualizations for the different encoders*
Tips:
- Bounding boxes are not required for the visual feature embeddings; any kind of visual-spatial features
will work.
- Both the language hidden states and the visual hidden states that LXMERT outputs are passed through the
cross-modality layer, so they contain information from both modalities. To access a modality that only attends to
itself, select the vision/language hidden states from the first input in the tuple.
- The bidirectional cross-modality encoder attention only returns attention values when the language modality is used
as the input and the vision modality is used as the context vector. Further, while the cross-modality encoder
contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
both self attention outputs are disregarded.
The original code can be found `here <https://github.com/airsplay/lxmert>`__.
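Here is a minimal sketch of a forward pass; the visual inputs below are random placeholders, whereas in practice they
would come from an external object detector (e.g. a Faster R-CNN producing 2048-dimensional ROI features):

.. code-block::

    import torch
    from transformers import LxmertTokenizer, LxmertModel

    tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
    model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

    inputs = tokenizer("How many cats are on the sofa?", return_tensors="pt")
    # Placeholder visual inputs: 36 ROI features (2048-d) and their normalized boxes (4-d).
    visual_feats = torch.randn(1, 36, 2048)
    visual_pos = torch.rand(1, 36, 4)

    outputs = model(**inputs, visual_feats=visual_feats, visual_pos=visual_pos)
    # Both hidden states have passed through the cross-modality layer.
    print(outputs.language_output.shape, outputs.vision_output.shape)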
LxmertConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertConfig
:members:
LxmertTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertTokenizer
:members:
LxmertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertTokenizerFast
:members:
Lxmert specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.models.lxmert.modeling_lxmert.LxmertModelOutput
:members:
.. autoclass:: transformers.models.lxmert.modeling_lxmert.LxmertForPreTrainingOutput
:members:
.. autoclass:: transformers.models.lxmert.modeling_lxmert.LxmertForQuestionAnsweringOutput
:members:
.. autoclass:: transformers.models.lxmert.modeling_tf_lxmert.TFLxmertModelOutput
:members:
.. autoclass:: transformers.models.lxmert.modeling_tf_lxmert.TFLxmertForPreTrainingOutput
:members:
LxmertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertModel
:members: forward
LxmertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertForPreTraining
:members: forward
LxmertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.LxmertForQuestionAnswering
:members: forward
TFLxmertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLxmertModel
:members: call
TFLxmertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TFLxmertForPreTraining
:members: call
