Compare commits

..

144 Commits

Author SHA1 Message Date
c06d0adad0 fix 2023-11-16 11:06:20 +01:00
0092591084 fix 2023-11-16 11:04:54 +01:00
127cfd6e09 fix 2023-11-16 11:04:54 +01:00
ded287fe63 fix 2023-11-16 11:04:54 +01:00
17b8ba5729 Build AMD image (changed version) 2023-11-16 11:04:54 +01:00
3559d45912 Build AMD image (changed version) 2023-11-16 11:04:54 +01:00
fbb5f5bfb1 Build AMD image (changed version) 2023-11-16 11:04:54 +01:00
2891a9fc22 Build AMD image (changed version) 2023-11-16 11:04:53 +01:00
6ee5b62f67 Build AMD image (changed version) 2023-11-16 11:04:53 +01:00
01392a31b3 Build AMD image (changed version) 2023-11-16 11:04:53 +01:00
c35eda1da2 Build AMD image (changed version) 2023-11-16 11:04:53 +01:00
d87e101a81 try build docker images on Kube 2023-11-16 11:04:53 +01:00
1394e08cf0 Support ONNX export for causal LM sequence classifiers (#27450)
support onnx for causal lm sequence classification
2023-11-16 18:56:34 +09:00
06343b0633 translate model.md to chinese (#27518)
* translate model.md to chinese

* apply review suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-15 16:59:03 -08:00
1ac599d90f Fix offload disk for loading derivated model checkpoint into base model (#27253)
* fix

* style

* add test
2023-11-15 14:58:08 -05:00
b71c38a094 Fix bug for T5x to PyTorch convert script with varying encoder and decoder layers (#27448)
* Fix bug in handling varying encoder and decoder layers

This commit resolves an issue where the script failed to convert T5x models to PyTorch models when the number of decoder layers differed from the number of encoder layers.  I've addressed this issue by passing an additional 'num_decoder_layers' parameter to the relevant function.

* Fix bug in handling varying encoder and decoder layers
2023-11-15 19:00:22 +00:00
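A toy illustration of the failure mode described above — this is not the actual conversion script, just a sketch of why threading a separate decoder layer count through the conversion matters when the two stacks differ in depth.

```python
# Toy sketch only: a conversion loop that implicitly reuses the encoder depth for the
# decoder drops (or over-reads) layers on asymmetric T5x checkpoints.
def layer_names(num_encoder_layers, num_decoder_layers=None):
    # Before the fix, the decoder depth was effectively assumed equal to the encoder depth.
    if num_decoder_layers is None:
        num_decoder_layers = num_encoder_layers
    encoder = [f"encoder/layers_{i}" for i in range(num_encoder_layers)]
    decoder = [f"decoder/layers_{i}" for i in range(num_decoder_layers)]
    return encoder, decoder

enc, dec = layer_names(12, num_decoder_layers=6)  # e.g. an asymmetric checkpoint
assert len(enc) == 12 and len(dec) == 6
```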
2e72bbab2c Incorrect setting for num_beams in translation and summarization examples (#27519)
* Remove the torch main_process_first context manager from TF examples

* Correctly set num_beams=1 in our examples, and add a guard in GenerationConfig.validate()

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-15 18:18:54 +00:00
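A minimal sketch of the guard described above, assuming (not verified here) that `GenerationConfig.validate()` warns when beam-search-only options are combined with `num_beams=1`:

```python
from transformers import GenerationConfig

# Beam-search-only options have no effect with a single beam; the added guard in
# validate() is expected to flag this (exact message/behaviour assumed, not quoted).
config = GenerationConfig(num_beams=1, early_stopping=True)
config.validate()
```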
e6522e49a7 Fixing the failure of models without max_position_embeddings attribute. (#27499)
fix max pos issue

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-11-15 18:16:42 +00:00
a0633c4483 Translating en/model_doc docs to Japanese. (#27401)
* update _toctree.yml & add albert-autoformer

* Fixed typo in docs/source/ja/model_doc/audio-spectrogram-transformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Delete duplicated sentence docs/source/ja/model_doc/autoformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Reflect reviews

* delete untranslated models from toctree

* delete all comments

* add abstract translation

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-15 10:13:52 -08:00
a85ea4b19a Fix wav2vec2 params (#27515)
Fix test
2023-11-15 09:24:03 -05:00
48ba1e074f [ PretrainedConfig] Improve messaging (#27438)
* import hf error

* nits

* fixup

* catch the error at the correct place

* style

* improve message a tiny bit

* Update src/transformers/utils/hub.py

Co-authored-by: Lucain <lucainp@gmail.com>

* add a test

---------

Co-authored-by: Lucain <lucainp@gmail.com>
2023-11-15 14:10:39 +01:00
453079c7f8 🚨🚨 Fix beam score calculation issue for decoder-only models (#27351)
* Fix beam score calculation issue for decoder-only models

* Update beam search test and fix code quality issue

* Fix beam_sample, group_beam_search and constrained_beam_search

* Split test for pytorch and TF, add documentation

---------

Co-authored-by: Xin Qiu <xin.qiu@sentient.ai>
2023-11-15 12:49:14 +00:00
3d1a7bf476 [tokenizers] update tokenizers version pin (#27494)
* update `tokenizers` version pin

* force tokenizers>=0.15

* use  0.14

Co-authored-by: Lysandre <lysandre@huggingface.co>

---------

Co-authored-by: Lysandre <lysandre@huggingface.co>
2023-11-15 10:46:02 +01:00
64e21ca2a4 Make some jobs run on the GitHub Actions runners (#27512)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-15 10:43:16 +01:00
1e0e2dd376 [CircleCI] skip test_assisted_decoding_sample for everyone (#27511)
* skip 4 tests

* nits

* style

* wow it's not my day

* skip new failing tests

* style

* skip for NLLB MoE as well

* skip `test_assisted_decoding_sample` for everyone
2023-11-15 10:17:51 +01:00
7ddb21b4db Update spelling mistake (#27506)
"thoroughly" was misspelled as "thouroughly"
2023-11-15 09:50:45 +01:00
72f531ab6b [Table Transformer] Add Transformers-native checkpoints (#26928)
* Improve conversion scripts

* Fix paths

* Fix style
2023-11-15 09:35:53 +01:00
cc0dc24bc9 [Fuyu] Add tests (#27001)
* Add tests

* Add integration test

* More improvements

* Fix tests

* Fix style

* Skip gradient checkpointing tests

* Update script

* Remove scripts

* Remove Fuyu from auto mapping

* Fix integration test

* More improvements

* Remove file

* Add Fuyu to slow documentation tests

* Address comments

* Clarify comment
2023-11-15 09:33:04 +01:00
186c077513 [CI-test_torch] skip test_tf_from_pt_safetensors and test_assisted_decoding_sample (#27508)
* skip 4 tests

* nits

* style

* wow it's not my day

* skip new failing tests

* style

* skip for NLLB MoE as well
2023-11-15 08:39:29 +01:00
2fc33ebead Track the number of tokens seen to metrics (#27274)
* Add tokens seen

* Address comments, add to TrainingArgs

* Update log

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Use self.args

* Fix docstring

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-14 15:31:04 -05:00
303c1d69f3 Update processor mapping for hub snippets (#27477) 2023-11-14 20:05:54 +00:00
067c4a310d Have seq2seq just use gather (#27025)
* Have seq2seq just use gather

* Change

* Reset after

* Make slow

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Clean

* Simplify and just use gather

* Update tests/trainer/test_trainer_seq2seq.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* gather always for seq2seq

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-14 14:54:44 -05:00
250032e974 Minor type annotation fix (#27276)
* Minor type annotation fix

* Trigger Build
2023-11-14 19:09:21 +00:00
a53a0c5159 Generate: GenerationConfig.from_pretrained can return unused kwargs (#27488) 2023-11-14 18:40:57 +00:00
5468ab3555 Update and reorder docs for chat templates (#27443)
* Update and reorder docs for chat templates

* Fix Mistral docstring

* Add section link and small fixes

* Remove unneeded line in Mistral example

* Add comment on saving memory

* Fix generation prompts link

* Fix code block languages
2023-11-14 18:26:13 +00:00
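For readers following along, a short example of the chat-template API those docs cover (the checkpoint name is only an illustration):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
# Render the conversation with the model's own template and append the generation prompt.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```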
fe472b1db4 Generate: fix ExponentialDecayLengthPenalty doctest (#27485)
fix exponential doctest
2023-11-14 18:21:50 +00:00
73bc0c9e88 translate hpo_train.md and perf_hardware.md to chinese (#27431)
* translate

* translate

* update
2023-11-14 09:57:17 -08:00
78f6ed6c70 Revert "[time series] Add PatchTST (#25927)" (#27486)
The model was merged before final review and approval.

This reverts commit 2ac5b9325ed3b54950c6c61fd5838ac6e55a9fe1.
2023-11-14 12:24:00 +00:00
a4616c6767 [Whisper] Fix pipeline test (#27442) 2023-11-14 11:18:26 +00:00
b86c54d9ff Clap processor: remove wasteful np.stack operations (#27454)
remove wasteful np.stack

np.stack on a large 1-D tensor caused ~0.5s of processing time on short audio (<10s), compared to 0.02s for medium-length audio (see the sketch below).
2023-11-14 10:41:12 +00:00
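A small, self-contained illustration of the overhead mentioned above (timings vary by machine; the point is that `np.stack` copies a buffer that was already usable):

```python
import numpy as np

audio = np.random.randn(10 * 48_000).astype(np.float32)  # ~10 s of mono audio

stacked = np.stack([audio])    # allocates and copies the whole buffer just to add an axis
view = audio[np.newaxis, :]    # same shape, no copy

assert stacked.shape == view.shape == (1, audio.shape[0])
assert view.base is audio      # the view shares memory with the original array
```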
4309abedbc Add speecht5 batch generation and fix wrong attention mask when padding (#25943)
* fix speecht5 wrong attention mask when padding

* enable batch generation and add parameter attention_mask

* fix doc

* fix format

* batch postnet inputs, return batched lengths, and stay consistent with the old api

* fix format

* fix format

* fix the format

* fix doc-builder error

* add test, cross attention and docstring

* optimize code based on reviews

* docbuild

* refine

* not skip slow test

* add consistent dropout for batching

* loose atol

* add another test regarding to the consistency of vocoder

* fix format

* refactor

* add return_concrete_lengths as parameter for consistency w/wo batching

* fix review issues

* fix cross_attention issue
2023-11-14 09:54:09 +00:00
ee4fb326c7 Fix M4T weights tying (#27395)
fix seamless m4t weights tying
2023-11-14 09:52:11 +00:00
e107ae364e [CI-test_torch] skip test_tf_from_pt_safetensors for 4 models (#27481)
* skip 4 tests

* nits

* style

* wow it's not my day
2023-11-14 10:34:03 +01:00
d71fa9f618 [Peft] modules_to_save support for peft integration (#27466)
* `modules_to_save` support for peft integration

* Update docs/source/en/peft.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* slightly elaborate test

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-14 10:32:57 +01:00
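As a reference for what `modules_to_save` means, here is a sketch using plain peft (the commit above makes the transformers-side integration honour the option; module names are DistilBERT's and the setup is only an example):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

base = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
config = LoraConfig(
    task_type="SEQ_CLS",
    target_modules=["q_lin", "v_lin"],
    # These modules are trained and saved in full alongside the LoRA weights.
    modules_to_save=["pre_classifier", "classifier"],
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```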
721d1c8ca6 Fix FA2 import + deprecation cycle (#27330)
* put back import

* switch to logger.warnings instead
2023-11-14 09:20:29 +00:00
2ac5b9325e [time series] Add PatchTST (#25927)
* Initial commit of PatchTST model classes

Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>

* Add PatchTSTForPretraining

* update to include classification

Co-authored-by: Phanwadee Sinthong <phsinthong@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>

* clean up auto files

* Add PatchTSTForPrediction

* Fix relative import

* Replace original PatchTSTEncoder with ChannelAttentionPatchTSTEncoder

* temporary adding absolute path + add PatchTSTForForecasting class

* Update base PatchTSTModel + Unittest

* Update ForecastHead to use the config class

* edit cv_random_masking, add mask to model output

* Update configuration_patchtst.py

* add masked_loss to the pretraining

* add PatchEmbeddings

* Update configuration_patchtst.py

* edit loss which considers mask in the pretraining

* remove patch_last option

* Add commits from internal repo

* Update ForecastHead

* Add model weight initialization + unittest

* Update PatchTST unittest to use local import

* PatchTST integration tests for pretraining and prediction

* Added PatchTSTForRegression + update unittest to include label generation

* Revert unrelated model test file

* Combine similar output classes

* update PredictionHead

* Update configuration_patchtst.py

* Add Revin

* small edit to PatchTSTModelOutputWithNoAttention

* Update modeling_patchtst.py

* Updating integration test for forecasting

* Fix unittest after class structure changed

* docstring updates

* change input_size to num_input_channels

* more formatting

* Remove some unused params

* Add a comment for pretrained models

* add channel_attention option

add channel_attention option and remove unused positional encoders.

* Update PatchTST models to use HF's MultiHeadAttention module

* Update paper + github urls

* Fix hidden_state return value

* Update integration test to use PatchTSTForForecasting

* Adding dataclass decorator for model output classes

* Run fixup script

* Rename model repos for integration test

* edit argument explanation

* change individual option to shared_projection

* style

* Rename integration test + import cleanup

* Fix output_hidden_states return value

* removed unused mode

* added std, mean and nops scaler

* add initial distributional loss for prediction

* fix typo in docs

* add generate function

* formatting

* add num_parallel_samples

* Fix a typo

* copy weighted_average function, edit PredictionHead

* edit PredictionHead

* add distribution head to forecasting

* formatting

* Add generate function for forecasting

* Add generate function to prediction task

* formatting

* use argsort

* add past_observed_mask ordering

* fix arguments

* docs

* add back test_model_outputs_equivalence test

* formatting

* cleanup

* formatting

* use ACT2CLS

* formatting

* fix add_start_docstrings decorator

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput and PatchTSTForRegressionOutput.

* fix typos

* add forecast_masking

* fixed tests

* use set_seed

* fix doc test

* formatting

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* better var names

* rename PatchTSTTranspose

* fix argument names and docs string

* remove compute_num_patches and unused class

* remove assert

* renamed to PatchTSTMasking

* use num_labels for classification

* use num_labels

* use default num_labels from super class

* move model_type after docstring

* renamed PatchTSTForMaskPretraining

* bs -> batch_size

* more review fixes

* use hidden_state

* rename encoder layer and block class

* remove commented seed_number

* edit docstring

* Add docstring

* formatting

* use past_observed_mask

* doc suggestion

* make fix-copies

* use Args:

* add docstring

* add docstring

* change some variable names and add PatchTST before some class names

* formatting

* fix argument types

* fix tests

* change x variable to patch_input

* format

* formatting

* fix-copies

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move loss to forward

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* formatting

* fix a bug when pre_norm is set to True

* output_hidden_states is set to False as default

* set pre_norm=True as default

* format docstring

* format

* output_hidden_states is None by default

* add missing docs

* better var names

* docstring: remove default to False in output_hidden_states

* change labels name to target_values in regression task

* format

* fix tests

* change to forecast_mask_ratios and random_mask_ratio

* change mask names

* change future_values to target_values param in the prediction class

* remove nn.Sequential and make PatchTSTBatchNorm class

* black

* fix argument name for prediction

* add output_attentions option

* add output_attentions to PatchTSTEncoder

* formatting

* Add attention output option to all classes

* Remove PatchTSTEncoderBlock

* create PatchTSTEmbedding class

* use config in PatchTSTPatchify

* Use config in PatchTSTMasking class

* add channel_attn_weights

* Add PatchTSTScaler class

* add output_attentions arg to test function

* format

* Update doc with image patchtst.md

* fix-copies

* rename Forecast <-> Prediction

* change name of a few parameters to match with PatchTSMixer.

* Remove *ForForecasting class to match with other time series models.

* make style

* Remove PatchTSTForForecasting in the test

* remove PatchTSTForForecastingOutput class

* change test_forecast_head to test_prediction_head

* style

* fix docs

* fix tests

* change num_labels to num_targets

* Remove PatchTSTTranspose

* remove arguments in PatchTSTMeanScaler

* remove arguments in PatchTSTStdScaler

* add config as an argument to all the scaler classes

* reformat

* Add norm_eps for batchnorm and layernorm

* reformat.

* reformat

* edit docstring

* update docstring

* change variable name pooling to pooling_type

* fix output_hidden_states as tuple

* fix bug when calling PatchTSTBatchNorm

* change stride to patch_stride

* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder

* formatting

* initialize scalers with configs

* edit output_hidden_states

* style

* fix forecast_mask_patches doc string

---------

Co-authored-by: Gift Sinthong <gift.sinthong@ibm.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Vijay Ekambaram <vijaykr.e@gmail.com>
Co-authored-by: Ngoc Diep Do <55230119+diepi@users.noreply.github.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: Ngoc Diep Do <diiepy@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-11-13 19:06:32 +01:00
8017a59091 Fixed typo in pipelines.md documentation (#27455)
Update pipelines.md
2023-11-13 17:50:40 +00:00
eb79b55bf3 Perf torch compile (#27422)
* translate perrf_torch_compile.md

* translate tf_xla.md

* update
2023-11-13 09:46:40 -08:00
7b139023c3 [AWQ ] Addresses TODO for awq tests (#27467)
addresses todo for awq tests
2023-11-13 18:18:41 +01:00
04af4b90d6 Fix Falcon tokenizer loading in pipeline (#27316)
* Improve pipeline tokenizer loading and hope nothing breaks

* Let's try a hacky solution

* Revert the changes to init

* Add a falcon hack to the automapping

* Add a falcon hack to the automapping
2023-11-13 17:01:59 +00:00
1af766e104 Add version check for Jinja (#27403)
* Add version check for Jinja

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-13 17:01:30 +00:00
2422c38de6 Add DINOv2 depth estimation (#26092)
* First draft

* Fix style

* More improvements

* Fix tests

* Fix tests

* Convert checkpoint

* Improve DPTImageProcessor

* Remove scripts, improve conversion script

* Remove print statements

* Fix test

* Improve docstring

* More improvements

* Fix style

* Fix image processor

* Add tests

* Address comments

* Address comments

* Make bias backwards compatible

* Address comment

* Address comment

* Address comment

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Address comments

* Add flag

* Add tests

* Make tests smaller

* Use regular BackboneOutput

* Fix all tests

* Update test

* Convert more checkpoints

* Convert giant checkpoints, add integration test

* Rename size_divisibility to size_divisor

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-13 16:20:42 +00:00
3b59621310 Install python-Levenshtein for nougat in CI image (#27465)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-13 16:38:13 +01:00
2dc29cfc98 Fix docstring for gradient_checkpointing_kwargs (#27470)
Docstring entry for `gradient_checkpointing_kwargs` was
`gradient_checkpointing_args`. This is incorrect.
2023-11-13 15:32:03 +00:00
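A usage sketch for the corrected docstring entry (model name and kwargs are examples only):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
# The kwargs dict is forwarded to torch.utils.checkpoint.checkpoint under the hood.
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
```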
20abdacbef OWLv2: bug fix in post_process_object_detection() when using cuda device (#27468)
* OWLv2: bug fix in post_process_object_detection() when using cuda device

* fix copies issue by fixing original function in owlvit
2023-11-13 15:31:44 +00:00
68ae3be7f5 Fix from_pt flag when loading with safetensors (#27394)
* Fix

* Tests

* Fix
2023-11-13 15:18:19 +01:00
9dc8fe1b32 Default to msgpack for safetensors (#27460)
* Default to msgpack for safetensors

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-13 15:17:01 +01:00
210e38d83f [Llama + Mistral] Add attention dropout (#27315)
* add droppouts

* add the dropout

* add doc in the config

* nits

* fix mistral config

* nits
2023-11-13 14:51:48 +01:00
b97cab7e6d Remove-auth-token (#27060)
* don't use `use_auth_token`internally

* let's use token everywhere

* fixup
2023-11-13 14:20:54 +01:00
8f577dca4f Fixed typo in error message (#27461)
"past key much have a shape" -> "past key must have a shape"
2023-11-13 11:43:01 +00:00
7b998cabee Fix some Wav2Vec2 related models' doctest (#27462)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-13 12:37:46 +01:00
9d87cd2ce2 Fix line ending in utils/not_doctested.txt (#27459)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-13 12:35:51 +01:00
7ee995fd9c Make examples_torch_job faster (#27437)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-10 20:05:05 +01:00
ed115b3473 Normalize floating point cast (#27249)
* Normalize image - cast input images to float32.

This is done if the input image isn't of a floating type. Issues can occur when do_rescale=False is set in an image processor: the image passed to the call is then of type uint8, because of the casting that happens in resize via the PIL image library. Since the mean and std values are cast to match the image dtype, NaNs and infs can appear in the normalized image, as the floating-point values used to divide the image are truncated to 0 (see the sketch below).

The reason the mean and std values were cast is that they previously defaulted to float32; without the cast, an input image of type float16 would be upcast to float32 by the normalization.

* Add tests

* Remove float32 cast
2023-11-10 15:35:27 +00:00
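A minimal reproduction of the failure mode described in that commit: casting float mean/std down to the uint8 image dtype truncates them to 0 and the division blows up, while casting the image to float32 first keeps the result finite.

```python
import numpy as np

image = np.full((2, 2, 3), 128, dtype=np.uint8)
mean, std = np.float32(0.5), np.float32(0.25)

bad = (image - image.dtype.type(mean)) / image.dtype.type(std)  # std truncates to 0 -> inf
good = (image.astype(np.float32) - mean) / std                  # cast the image first

print(np.isfinite(bad).all(), np.isfinite(good).all())  # False True
```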
e1c3ac2551 Add Phi-1 and Phi-1_5 (#26170)
* only dir not even init

* init

* tokenizer removed and reference of codegen added

* modeling file updated a lot remaining app_rotary_emb

* conversion script done

* conversion script fixed, a lot of factoring done and most tests pass

* added token_clf and extractive_QA_head

* integration tests pass

* flash attn tests pass!

* config done

* more docs in modeling file

* some style fix

* style and others

* doc test error fix

* more doc fix

* some attention fixes

* most fixes

* style and other fixes

* docs fix and config

* doc fix

* some comments

* conversion script updated

* conversion script updated

* Revert "conversion script updated"

This reverts commit e92378c54084ec0747041b113083d1746ecb6c7f.

* final comments

* add Phi to language_modeling.md

* edit phi.md file

* rebase and fix

* removed phi-1.5 example

* changed model_type from 'phi'->'mixformer-sequential'

* small change

* small change

* revert small change

* changed mixformer-sequential->phi

* small change

* added phi-1.5 example instead of phi-1

* doc test might pass now

* rebase and small change

* added the dropout layer

* more fixes

* modified .md file

* very very small doc change
2023-11-10 15:28:30 +00:00
00dc856233 At most 2 GPUs for CI (#27435)
At most 2 GPUs

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-10 16:19:06 +01:00
68afca3e69 [AttentionMaskConverter] ]Fix-mask-inf (#27114)
* fix?

* actual fix

* fixups

* add dataclass to the attention mask converter

* refine testing suite

* make sure there are no overflows

* update the test
2023-11-10 15:22:43 +01:00
7e9f10ac94 Add CLVP (#24745)
* init commit

* attention arch done except rotary emb

* rotary emb done

* text encoder working

* outputs matching

* arch first pass done

* make commands done, tests and docs remaining

* all tests passed, only docs remaining

* docs done

* doc-builder fix

* convert script removed(not relevant)

* minor comments done

* added ckpt conversion script

* tokenizer done

* very minor fix of index.md 2

* mostly make fixup related

* all done except fe and rotary emb

* very small change

* removed unidecode dependency

* style changes

* tokenizer removed require_backends

* added require_inflect to tokenizer tests

* removed VOCAB_FILES in tokenizer test

* inflect dependency removed

* added rotary pos emb cache and simplified the apply method

* style

* little doc change

* more comments

* feature extractor added

* added processor

* auto-regressive config added

* added CLVPConditioningEncoder

* comments done except the test one

* weights added successfully (NOT tested)

* tokenizer fix with numbers

* generate outputs matching

* almost all tests passing; integration tests not written

* Integ tests added

* major CUDA error fixed

* docs done

* rebase and multiple fixes

* fixed rebase overwrites

* generate code simplified and tests for AutoRegressive model added

* minor changes

* refectored gpt2 code in clvp file

* weights done and all code refactored

* mostly done except the fast_tokenizer

* doc test fix

* config file's doc fixes

* more config fix

* more comments

* tokenizer comments mostly done

* modeling file mostly refactored and can load modules

* ClvpEncoder tested

* ClvpDecoder, ClvpModel and ClvpForCausalLM tested

* integration and all tests passed

* more fixes

* docs almost done

* ckpt conversion refectored

* style and some failing tests fix

* comments

* temporary output fix but test_assisted_decoding_matches_greedy_search test fails

* majority changes done

* use_cache outputs same now! Along with the assisted_greedy_decoding test fix

* more comments

* more comments

* prepare_inputs_for_generation fixed and _prepare_model_inputs added

* style fix

* clvp.md change

* moved clvpconditionalencoder norms

* add model to new index

* added tokenizer input_ids_with_special_tokens

* small fix

* config mostly done

* added config-tester and changed conversion script

* more comments

* comments

* style fix

* some comments

* tokenizer changed back to prev state

* small comments

* added output hidden states for the main model

* style fix

* comments

* small change

* revert small change

* .

* Update clvp.md

* Update test_modeling_clvp.py

* :)

* some minor change

* new fixes

* remove to_dict from FE
2023-11-10 13:49:10 +00:00
9dd58c53dd update Bark FA2 docs (#27400)
* update Bark FA2 docs

* update benchmark section

* Update bark.md

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* rephrase

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-11-10 13:40:30 +00:00
fd685cfd59 [Quantization] Add str to enum conversion for AWQ (#27320)
* add str to enum conversion

* fixup

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-10 13:45:00 +01:00
184f60dcec add attention_mask and position_ids in assisted model (#26892)
* add attention_mask and position_ids in assisted model

* fix bug

* fix attention mask

* fix attention_mask

* check assist inputs

* check assist input ids length

* fix assist model type

* set assist attention mask device
2023-11-10 11:05:15 +00:00
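For context, this is the assisted-generation entry point those fixes touch; the checkpoints are small examples that share a tokenizer, which assisted decoding requires.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
assistant = AutoModelForCausalLM.from_pretrained("distilgpt2")  # smaller draft model

inputs = tok("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, assistant_model=assistant, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```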
cf32c94135 Run all tests if circleci/create_circleci_config.py is modified (#27413)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-09 22:01:06 +01:00
740cd93590 Fix Owlv2 checkpoint name and a default value in Owlv2VisionConfig (#27402)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-09 21:39:03 +01:00
51a98c40ee remove failing tests and clean FE files (#27414)
* remove failing tests and clean FE files

* remove same similar text from tvlt
2023-11-09 18:35:42 +00:00
e38348ae8f Fix RequestCounter to make it more future-proof (#27406)
* Fix RequestCounter to make it more future-proof

* code quality
2023-11-09 18:53:26 +01:00
c8b6052ff6 Final fix of the accelerate installation issue (#27408)
* fix

* [test-all] commit

* fix

* [test-all] commit

* [test-all] commit

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-09 18:52:29 +01:00
c5037b459e Use editable install for git deps (#27404)
* Use editable install

* Full command
2023-11-09 10:20:12 -05:00
cf2a3f37bf Fix fuyu checkpoint repo in FuyuConfig (#27399)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-09 15:47:46 +01:00
3258ff9330 use pytest.mark directly (#27390)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-09 13:32:54 +01:00
791ec370d1 Adds dvclive callback (#27352)
* dvclive trainer callback

* style fixes

* dvclive link fixes
2023-11-09 12:19:31 +00:00
c5d7754b11 device-agnostic deepspeed testing (#27342) 2023-11-09 12:34:13 +01:00
9999b73968 Skip failing cache call tests (#27393)
* Skip failing cache call tests

* Fixup
2023-11-09 11:03:37 +00:00
bc086a2516 Put doctest options back to pyproject.toml (#27366)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-09 11:50:19 +01:00
e9adb0c9cf Change thresh in test (#27378)
Change thresh
2023-11-09 04:44:36 -05:00
085ea7e56c [CodeLlamaTokenizer] Nit, update __init__ to make sure the AddedTokens are not normalized because they are special (#27359)
* make sure tokens are properly initialized for codellama slow

* add m ore pretrained models

* style

* test more tokenizers checkpoints
2023-11-09 10:15:10 +01:00
7ecd229ba4 Smangrul/fix failing ds ci tests (#27358)
* fix failing DeepSpeed CI tests due to `safetensors` being default

* debug

* remove debug statements

* resolve comments

* Update test_deepspeed.py
2023-11-09 11:47:24 +05:30
ced9fd86f5 translate debugging.md to chinese (#27374)
* update

* update
2023-11-08 14:04:06 -08:00
0e402e1478 Update deprecated torch.range in test_modeling_ibert.py (#27355)
* Update deprecated torch.range

* Remove comment
2023-11-08 20:58:36 +01:00
a5bee89c9d Add Flash Attention 2 support to Bark (#27364)
* change handmade attention mask to _prepare_4d_attention_mask

* add flashattention2 support in Bark

* add flashattention2 tests on BarkSemanticModel

* make style

* fix flashattention and tests + make style

* fix memory leak and allow Bark to pass flash attention to sub-models

* make style

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* remove unnecessary code from tests + justify overriding

* Update tests/models/bark/test_modeling_bark.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make style

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-08 17:06:35 +00:00
ef71673616 translate big_models.md and performance.md to chinese (#27334)
* translate performance.md

* tranlsate performance.md and big_models.md

* update translation

* update review
2023-11-08 08:48:46 -08:00
bd8f45b167 Fix tiny model script: not using from_pt=True (#27372)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-08 17:15:57 +01:00
7b175cfaa7 [Flax Whisper] large-v3 compatibility (#27360) 2023-11-08 15:11:38 +00:00
845aa832b7 Remove unused param from example script tests (#27354)
Unused param
2023-11-08 09:07:32 -05:00
eb30a49b20 Translate index.md to Turkish (#27093)
* Add index.md for tukish language

* Fix index.md (huggingface/transformers#27088)

* Add 'tr' to additional files

* Update docs/source/tr/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update index.md

---------

Co-authored-by: Mert Yanık <mert.yanik@lcwaikiki.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-08 08:35:20 -05:00
f16ff0f07e MusicGen Update (#27084)
* [MusicGen] Add stereo model

* safe serialization

* Update src/transformers/models/musicgen/modeling_musicgen.py

* split over 2 lines

* fix slow tests on cuda
2023-11-08 13:26:02 +00:00
5ef650b0ae Fix Kosmos-2 device issue (#27346)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-08 14:14:45 +01:00
efa57cb234 Fix example tests from failing (#27353)
* Fix example tests from failing

* CHange thresh
2023-11-08 07:45:21 -05:00
b6dbfee0a2 moving example of benchmarking to legacy dir (#27337)
move example of benchmarking to legacy
2023-11-08 09:27:37 +01:00
be74b2ead6 Add numpy alternative to FE using torchaudio (#26339)
* add audio_utils usage in the FE of SpeechToText

* clean unnecessary parameters of AudioSpectrogramTransformer FE

* add audio_utils usage in AST

* add serialization tests and function to FEs

* make style

* remove use_torchaudio and move to_dict to FE

* test audio_utils usage

* make style and fix import (remove torchaudio dependency import)

* fix torch dependency for jax and tensor tests

* fix typo

* clean tests with suggestions

* add lines to test if is_speech_availble is False
2023-11-08 07:39:37 +00:00
e264745051 translate model_sharing.md and llm_tutorial.md to chinese (#27283)
* translate model_sharing.md

* translate llm_tutorial.md to chiense

* update wrong translation

* update _torctree.yml

* update typos

* update
2023-11-07 15:34:33 -08:00
f213d5dd8c translate the en tokenizer_summary.md to Chinese (#27291)
* translate the en tokenizer_summary.md to Chinese

* revise WordPiece

* add to source/zh/_toctree.yml
2023-11-07 15:31:51 -08:00
7e1eff7600 Allow scheduler parameters (#26480)
* Allow for scheduler kwargs

* Formatting

* Arguments checks, passing the tests

* Black failed somehow

---------

Co-authored-by: Pierre <pierre@avatarin.com>
2023-11-07 21:40:00 +00:00
ac5d4cf6de FIx Bark batching feature (#27271)
* fix bark batching

* make style

* add tests and make style
2023-11-07 18:32:00 +00:00
8f840edd31 [Whisper] Nit converting the tokenizer (#27349)
* `nospeech` instead of `nocaption` for the no speech token

* oups
2023-11-07 18:43:26 +01:00
cc9f27bb1e Remove padding_masks from gpt_bigcode. (#27348)
Update modeling_gpt_bigcode.py
2023-11-07 17:24:43 +00:00
8c91f15ae5 Resolve AttributeError by utilizing device calculation at the start of the forward function (#27347)
This commit addresses the 'NoneType' object AttributeError in the IdeficsModel forward function. Previously, the 'device' attribute was read directly from input_ids, which can be None. The device is now computed at the beginning of the forward function and used consistently throughout, so 'image_hidden_states' and 'image_encoder_embeddings' are created on the correct device (see the sketch below).
2023-11-07 16:26:15 +00:00
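A generic sketch of the pattern that commit describes (not the exact Idefics code): resolve the device once, from whichever input is present, before any new tensors are built.

```python
import torch

def resolve_device(input_ids=None, inputs_embeds=None):
    if input_ids is not None:
        return input_ids.device
    if inputs_embeds is not None:
        return inputs_embeds.device
    raise ValueError("Either input_ids or inputs_embeds must be provided")

device = resolve_device(inputs_embeds=torch.zeros(1, 4, 8))
image_hidden_states = torch.zeros(1, 4, 8, device=device)  # created on the resolved device
```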
Chi 9459d821d1 Remove a redundant variable. (#27288)
* Removed the redundant SiLUActivation class and now use nn.functional.silu directly.

* I apologize for adding torch.functional.silu. I have replaced it with nn.SiLU.

* Remove redundant variable in feature_extraction file
2023-11-07 15:57:48 +00:00
88832c01c8 [Whisper] Add conversion script for the tokenizer (#27338)
* draft

* updates

* full conversion taken from `https://gist.github.com/xenova/a452a6474428de0182b17605a98631ee`

* push

* nits

* updates

* more nits

* Add co author

Co-authored-by: Joshua Lochner <admin@xenova.com>

* fixup

* cleanup

* styling

* add proper path

* update

* nits

* don't  push the exit

* clean

* update whisper doc

* don't error out if tiktoken is not here

* make sure we are BC with conversion

* nit

* Update docs/source/en/model_doc/whisper.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* merge and update

* update markdown

* Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-07 15:07:55 +01:00
0ded281557 [FA2] Add flash attention for GPT-Neo (#26486)
* added flash attention for gpt-neo

* small change

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* readme updated

* .

* changes

* removed padding_mask

* Update src/transformers/models/gpt_neo/modeling_gpt_neo.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-07 13:54:01 +00:00
606d90845f Fix Whisper Conversion Script: Correct decoder_attention_heads and _download function (#26834)
* Fix error in convert_openai_to_hf.py: "_download() missing 1 required positional argument: root"

* Fix error in convert_openai_to_hf.py: "TypeError: byte indices must be integers or slices, not str"

* Fix decoder_attention_heads value in convert_openai_to_hf.py.

Correct the assignment for `decoder_attention_heads` in the conversion script for the Whisper model.

* Black reformat convert_openai_to_hf.py file.

* Fix Whisper model configuration defaults (for Tiny).

- Correct encoder/decoder layers and attention heads count.
- Update model width (`d_model`) to 384.

* Add docstring to the convert_openai_to_hf.py script with a doctest

* Add shebang and +x permission to the convert_openai_to_hf.py

* convert_openai_to_hf.py: reuse the read model_bytes in the _download() function

* Move convert_openai_to_hf.py doctest example to whisper.md

* whisper.md: Add an inference example to the Conversion section.

* whisper.md: remove `model.config.forced_decoder_ids` from examples (deprecated)

* whisper.md: Remove "## Format Conversion" section; not used by users

* whisper.md: Use librispeech_asr_dummy dataset and load_dataset()
2023-11-07 13:39:42 +01:00
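In the spirit of the inference example that PR adds to whisper.md, a short transcription sketch using the usual public tiny checkpoint and dummy dataset:

```python
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]
inputs = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt")

predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```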
90b4adc1f1 Generate: skip tests on unsupported models instead of passing (#27265) 2023-11-07 12:08:28 +00:00
26d8d5f211 Fix autoawq docker image (#27339)
* Update Dockerfile

* Update docker/transformers-all-latest-gpu/Dockerfile
2023-11-07 11:21:04 +01:00
da7ea9a4e3 [Whisper] Block language/task args for English-only (#27322)
* [Whisper] Block language/task args for English-only

* Update src/transformers/models/whisper/modeling_whisper.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-07 10:04:23 +00:00
9beb2737d7 [docs] fixed links with 404 (#27327)
* fixed links with 404

* make style
2023-11-06 19:45:03 +00:00
1b20e2bb42 Fix Kosmos2Processor batch mode (#27323)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-06 19:05:50 +01:00
a6e0d5a219 Fix VideoMAEforPretrained dtype error (#27296)
* Fix dtype error

* Fix mean and std dtype

* make style
2023-11-06 17:20:06 +00:00
e9dbd39263 Update sequence_classification.md (#27281)
I'm adding accelerate as one of the libraries to install because otherwise, when running the Trainer, the model errors out with:

ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.20.1`: Please run `pip install transformers[torch]` or `pip install accelerate -U`

Further context:
1. I've tried this across different environments, so I believe the environment is not the issue.
2. I had the latest transformers library version running.
3. Even after installing accelerate and importing it, the issue typically wasn't resolved until I restarted the notebook and tried again.
2023-11-06 14:21:48 +00:00
147f774671 [PretrainedTokenizer] add some of the most important functions to the doc (#27313) 2023-11-06 15:11:00 +01:00
1ffc4dee5b enable memory tracker metrics for npu (#27280) 2023-11-06 13:44:21 +00:00
d7dcfa8917 Remove an unexpected argument for FlaxResNetBasicLayerCollection (#27272)
Remove unexpected argument for FlaxResNetBasicLayerCollection
2023-11-06 12:16:03 +00:00
eef7ea98c3 Update doctest workflow file (#27306)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-06 11:27:48 +01:00
d788d37d24 Fix daily CI image build (#27307)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-06 11:27:22 +01:00
b026b5ca6d Fix tokenizer export for LLamaTokenizerFast (#27222)
* fix tokenizer

* fix tokenizer
2023-11-06 10:26:18 +01:00
cc3e478185 translate run_scripts.md to chinese (#27246)
* translate run_scripts.md to chinese

* translate run_scripts.md to chinese

* translate run_scripts.md to chinese
2023-11-03 10:19:41 -07:00
bf7cfac20a translate autoclass_tutorial to chinese (#27269)
* translate autoclass_tutorial.md  to chinese

* translate update
2023-11-03 09:16:55 -07:00
1ac2463dfe [FA2] Add flash attention for for DistilBert (#26489)
* flash attention added for DistilBert

* fixes

* removed padding_masks

* Update modeling_distilbert.py

* Update test_modeling_distilbert.py

* style fix
2023-11-03 16:07:54 +00:00
5964f820db [Docs] Model_doc structure/clarity improvements (#26876)
* first batch of structure improvements for model_docs

* second batch of structure improvements for model_docs

* more structure improvements for model_docs

* more structure improvements for model_docs

* structure improvements for cv model_docs

* more structural refactoring

* addressed feedback about image processors
2023-11-03 10:57:03 -04:00
ad8ff96224 [Docs / SAM ] Reflect correct changes to run inference without OOM (#27268)
Update sam.md
2023-11-03 15:23:13 +01:00
f13f544ad9 Fix switch transformer mixed precision issue (#27220)
* Fix mixed precision error for switch transformer

* Fixup
2023-11-03 14:00:33 +00:00
db69bd88fb Update the ConversationalPipeline docstring for chat templates (#27250)
* Update the ConversationalPipeline docstring now that we're using chat templates

* Direct access to conversation.messages

* Explain the string init
2023-11-03 13:17:46 +00:00
011b15c1c7 [docs] Custom model doc update (#27213)
doc update
2023-11-03 08:03:13 -04:00
af8d1dc309 Avoid many failing tests in doctesting (#27262)
* fix

* update

* update

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-03 12:47:07 +01:00
8f1a43cd91 [PEFT / Tests ] Fix peft integration failing tests (#27258)
fix peft integration issues
2023-11-03 12:23:02 +01:00
05ea7b79e6 Refactor: Use Llama RoPE implementation for Falcon (#26933)
* Use Llama RoPE implementation for Falcon

+ Add copy functionalities

* Use standard cache format for Falcon

* Simplify apply_rotary_pos_emb, copy from Llama

* Remove unnecessary cache conversion test

We don't need to convert any caches anymore!

* Resolve copy complaint
2023-11-03 11:05:55 +00:00
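For reference, the Llama-style rotary helpers the refactor adopts look roughly like this (a paraphrased sketch, not code copied from the repo):

```python
import torch

def rotate_half(x):
    # Split the head dimension in two and rotate the halves.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

q = k = torch.randn(1, 4, 8, 64)      # (batch, heads, seq_len, head_dim)
cos = sin = torch.randn(1, 1, 8, 64)  # broadcast over batch and heads
q_rot, k_rot = apply_rotary_pos_emb(q, k, cos, sin)
```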
e9a6c72b5e Fuyu protection (#27248) 2023-11-03 08:45:05 +01:00
552ff24488 Fixed base model class name extraction from PeftModels (#27162)
* Fixed base model class name extraction from PeftModels

* Changes to first unwrap the model then extract the base model name

* Changed base_model to base_model.model to stay consistent with peft model abstractions
2023-11-02 20:08:03 +00:00
Chi 4991216841 Removed the redundant SiLUActivation class. (#27136)
* Removed the redundant SiLUActivation class and now use nn.functional.silu directly.

* I apologize for adding torch.functional.silu. I have replaced it with nn.SiLU.
2023-11-02 18:13:57 +00:00
00d8502b7a translate peft.md to chinese (#27215)
* tranlsate peft.md to chinese

* translate peft.md to chinese

* fix missing link
2023-11-02 10:42:29 -07:00
bc78fd1274 Dev version 2023-11-02 18:15:36 +01:00
0ed6729bb1 Enrich TTS pipeline parameters naming (#26473)
* enrich TTS pipeline docstring for clearer forward_params use

* change token lengths

* update Pipeline parameters

* correct docstring and make style

* fix tests

* make style

* change music prompt

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* raise errors if generate_kwargs with forward-only models

* make style

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-02 17:06:56 +00:00
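A short usage sketch of the `forward_params` argument the docstring now explains (model choice and kwargs are illustrative):

```python
from transformers import pipeline

tts = pipeline("text-to-speech", model="suno/bark-small")
# forward_params is forwarded to the model's generate/forward call.
speech = tts("Hello, world!", forward_params={"do_sample": True})
print(speech["sampling_rate"], speech["audio"].shape)
```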
147e8ce4ae Remove redundant code from T5 encoder mask creation (#27216)
* remove redundant code

* update

* add typecasting

* make `attention_mask` float again
2023-11-02 16:01:41 +00:00
a6c82d4567 Generate: return past_key_values (#25086) 2023-11-02 15:39:21 +00:00
441c3e0dd2 fix-deprecated-exllama-arg (#27243)
fix-exllama
2023-11-02 11:23:31 -04:00
8801861d2d Fixing m4t. (#27240)
* Fixing m4t.

* Trying to remove comparison ? Odd test failure.

* Adding shared. But why on earth does it hang ????

* Putting back the model weights checks; the test is silently failing on cuda.

* Fix style + unremoved comment.
2023-11-02 15:32:17 +01:00
589 changed files with 19210 additions and 3184 deletions

View File

@@ -283,7 +283,7 @@ torch_and_tf_job = CircleCIJob(
"pip install --upgrade --upgrade-strategy eager pip",
"pip install -U --upgrade-strategy eager .[sklearn,tf-cpu,torch,testing,sentencepiece,torch-speech,vision]",
"pip install -U --upgrade-strategy eager tensorflow_probability",
"pip install -U --upgrade-strategy eager git+https://github.com/huggingface/accelerate",
"pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
],
marker="is_pt_tf_cross_test",
pytest_options={"rA": None, "durations": 0},
@@ -297,7 +297,7 @@ torch_and_flax_job = CircleCIJob(
"sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng",
"pip install -U --upgrade-strategy eager --upgrade pip",
"pip install -U --upgrade-strategy eager .[sklearn,flax,torch,testing,sentencepiece,torch-speech,vision]",
"pip install -U --upgrade-strategy eager git+https://github.com/huggingface/accelerate",
"pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
],
marker="is_pt_flax_cross_test",
pytest_options={"rA": None, "durations": 0},
@@ -310,7 +310,7 @@ torch_job = CircleCIJob(
"sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng time",
"pip install --upgrade --upgrade-strategy eager pip",
"pip install -U --upgrade-strategy eager .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]",
"pip install -U --upgrade-strategy eager git+https://github.com/huggingface/accelerate",
"pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
],
parallelism=1,
pytest_num_workers=6,
@@ -397,13 +397,16 @@ custom_tokenizers_job = CircleCIJob(
examples_torch_job = CircleCIJob(
"examples_torch",
additional_env={"OMP_NUM_THREADS": 8},
cache_name="torch_examples",
install_steps=[
"sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng",
"pip install --upgrade --upgrade-strategy eager pip",
"pip install -U --upgrade-strategy eager .[sklearn,torch,sentencepiece,testing,torch-speech]",
"pip install -U --upgrade-strategy eager -r examples/pytorch/_tests_requirements.txt",
"pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
],
pytest_num_workers=1,
)
@@ -510,7 +513,7 @@ doc_test_job = CircleCIJob(
"sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng time ffmpeg",
"pip install --upgrade --upgrade-strategy eager pip",
"pip install -U --upgrade-strategy eager -e .[dev]",
"pip install -U --upgrade-strategy eager git+https://github.com/huggingface/accelerate",
"pip install -U --upgrade-strategy eager -e git+https://github.com/huggingface/accelerate@main#egg=accelerate",
"pip install --upgrade --upgrade-strategy eager pytest pytest-sugar",
"pip install -U --upgrade-strategy eager natten",
"find -name __pycache__ -delete",

View File

@@ -26,6 +26,8 @@ requirements:
- protobuf
- tokenizers >=0.11.1,!=0.11.3,<0.13
- pyyaml >=5.1
- safetensors
- fsspec
run:
- python
- numpy >=1.17
@@ -40,6 +42,8 @@ requirements:
- protobuf
- tokenizers >=0.11.1,!=0.11.3,<0.13
- pyyaml >=5.1
- safetensors
- fsspec
test:
imports:

View File

@@ -14,7 +14,7 @@ on:
jobs:
run_tests_templates_like:
name: "Add new model like template tests"
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v3

View File

@@ -3,7 +3,7 @@ name: Build docker images (scheduled)
on:
push:
branches:
- build_ci_docker_image*
- build_docker_on_kube
repository_dispatch:
workflow_call:
inputs:
@@ -18,199 +18,9 @@ concurrency:
cancel-in-progress: false
jobs:
latest-docker:
name: "Latest PyTorch + TensorFlow [dev]"
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-all-latest-gpu-push-ci
latest-torch-deepspeed-docker:
name: "Latest PyTorch + DeepSpeed"
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu${{ inputs.image_postfix }}
# Can't build 2 images in a single job `latest-torch-deepspeed-docker` (for `nvcr.io/nvidia`)
latest-torch-deepspeed-docker-for-push-ci-daily-build:
name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)"
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
doc-builder:
name: "Doc builder"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-doc-builder
push: true
tags: huggingface/transformers-doc-builder
latest-pytorch:
name: "Latest PyTorch [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on: ubuntu-latest
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-gpu
latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on: [self-hosted, docker-gpu, amd-gpu, single-gpu, mi210]
runs-on: [self-hosted, yih-dar-shieh-debug-doctest]
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
@@ -229,44 +39,3 @@ jobs:
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu-push-ci
latest-tensorflow:
name: "Latest TensorFlow [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v3
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-tensorflow-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-tensorflow-gpu

View File

@@ -13,7 +13,7 @@ concurrency:
jobs:
latest-with-torch-nightly-docker:
name: "Nightly PyTorch + Stable TensorFlow"
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- name: Cleanup disk
run: |
@@ -50,7 +50,7 @@ jobs:
nightly-torch-deepspeed-docker:
name: "Nightly PyTorch + DeepSpeed"
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- name: Cleanup disk
run: |

View File

@ -16,7 +16,7 @@ jobs:
fail-fast: false
matrix:
version: ["1.13", "1.12", "1.11", "1.10"]
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
-
name: Set up Docker Buildx
@ -60,7 +60,7 @@ jobs:
fail-fast: false
matrix:
version: ["2.11", "2.10", "2.9", "2.8", "2.7", "2.6", "2.5"]
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
-
name: Set up Docker Buildx

View File

@ -15,7 +15,7 @@ jobs:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: de en es fr hi it ko pt zh ja te
languages: de en es fr hi it ko pt tr zh ja te
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

View File

@ -14,4 +14,4 @@ jobs:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: transformers
languages: de en es fr hi it ko pt zh ja te
languages: de en es fr hi it ko pt tr zh ja te

View File

@ -18,7 +18,7 @@ env:
jobs:
check_runner_status:
name: Check Runner Status
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
outputs:
offline_runners: ${{ steps.set-offline_runners.outputs.offline_runners }}
steps:
@ -39,7 +39,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
needs: check_runner_status
if: ${{ failure() }}
steps:

View File

@ -14,7 +14,7 @@ env:
jobs:
check_tiny_models:
name: Check tiny models
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- name: Checkout transformers
uses: actions/checkout@v3

View File

@ -20,7 +20,7 @@ env:
jobs:
run_doctests:
runs-on: [single-gpu, nvidia-gpu, t4, doctest-ci]
runs-on: [single-gpu, nvidia-gpu, t4, ci]
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@ -66,7 +66,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: always()
needs: [run_doctests]
steps:

View File

@ -7,7 +7,7 @@ on:
jobs:
run_tests_templates:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@v3

View File

@ -12,7 +12,7 @@ env:
jobs:
build_and_package:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
defaults:
run:
shell: bash -l {0}

View File

@ -19,6 +19,7 @@ env:
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
setup:
@ -245,7 +246,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: always()
needs: [
setup,

View File

@ -30,6 +30,7 @@ env:
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
setup:
@ -288,7 +289,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: always()
needs: [
setup,

View File

@ -19,7 +19,7 @@ env:
jobs:
check_runner_status:
name: Check Runner Status
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
steps:
- name: Checkout transformers
uses: actions/checkout@v3
@ -241,7 +241,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: always()
needs: [
check_runner_status,

View File

@ -14,7 +14,7 @@ on:
jobs:
check-for-setup:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
name: Check if setup was changed
outputs:
changed: ${{ steps.was_changed.outputs.changed }}
@ -46,7 +46,7 @@ jobs:
run_push_ci:
name: Trigger Push CI
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: ${{ always() }}
needs: build-docker-containers
steps:

View File

@ -25,6 +25,7 @@ env:
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
setup:
@ -490,7 +491,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: always()
needs: [
setup,

View File

@ -23,6 +23,7 @@ env:
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
CUDA_VISIBLE_DEVICES: 0,1
jobs:
setup:
@ -399,7 +400,7 @@ jobs:
run_extract_warnings:
name: Extract warnings in CI artifacts
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: always()
needs: [
setup,
@ -447,7 +448,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
if: always()
needs: [
setup,

View File

@ -8,7 +8,7 @@ jobs:
close_stale_issues:
name: Close Stale Issues
if: github.repository == 'huggingface/transformers'
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:

View File

@ -8,7 +8,7 @@ on:
jobs:
build_and_package:
runs-on: ubuntu-latest
runs-on: ubuntu-22.04
defaults:
run:
shell: bash -l {0}

View File

@ -321,6 +321,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker.
1. **[CLVP](https://huggingface.co/docs/transformers/main/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
@ -386,7 +387,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/main/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
@ -437,11 +438,12 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
@ -461,7 +463,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng), released on [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

View File

@ -296,6 +296,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker.
1. **[CLVP](https://huggingface.co/docs/transformers/main/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
@ -361,7 +362,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/main/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
@ -412,11 +413,12 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
@ -436,7 +438,7 @@ Número actual de puntos de control: ![](https://img.shields.io/endpoint?url=htt
1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng), released on [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

View File

@ -270,6 +270,7 @@ conda install -c huggingface transformers
1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (LAION-AI से) Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. द्वाराअनुसंधान पत्र [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) के साथ जारी किया गया
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (OpenAI से) साथ वाला पेपर [लर्निंग ट्रांसफरेबल विजुअल मॉडल फ्रॉम नेचुरल लैंग्वेज सुपरविजन](https://arxiv.org /abs/2103.00020) एलेक रैडफोर्ड, जोंग वूक किम, क्रिस हैलासी, आदित्य रमेश, गेब्रियल गोह, संध्या अग्रवाल, गिरीश शास्त्री, अमांडा एस्केल, पामेला मिश्किन, जैक क्लार्क, ग्रेचेन क्रुएगर, इल्या सुत्स्केवर द्वारा।
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker.
1. **[CLVP](https://huggingface.co/docs/transformers/main/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (सेल्सफोर्स से) साथ में पेपर [प्रोग्राम सिंथेसिस के लिए एक संवादात्मक प्रतिमान](https://arxiv.org/abs/2203.13474) एरिक निजकैंप, बो पैंग, हिरोआकी हयाशी, लिफू तू, हुआन वांग, यिंगबो झोउ, सिल्वियो सावरेस, कैमिंग जिओंग रिलीज।
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (MetaAI से) Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. द्वाराअनुसंधान पत्र [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) के साथ जारी किया गया
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (माइक्रोसॉफ्ट रिसर्च एशिया से) कागज के साथ [फास्ट ट्रेनिंग कन्वर्जेंस के लिए सशर्त डीईटीआर](https://arxiv. org/abs/2108.06152) डेपू मेंग, ज़ियाओकांग चेन, ज़ेजिया फैन, गैंग ज़ेंग, होउकियांग ली, युहुई युआन, लेई सन, जिंगडोंग वांग द्वारा।
@ -335,7 +336,7 @@ conda install -c huggingface transformers
1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (Salesforce से) Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. द्वाराअनुसंधान पत्र [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) के साथ जारी किया गया
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/main/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (माइक्रोसॉफ्ट रिसर्च एशिया से) साथ देने वाला पेपर [लेआउटएलएमवी3: यूनिफाइड टेक्स्ट और इमेज मास्किंग के साथ दस्तावेज़ एआई के लिए पूर्व-प्रशिक्षण](https://arxiv.org/abs/2204.08387) युपन हुआंग, टेंगचाओ लव, लेई कुई, युटोंग लू, फुरु वेई द्वारा पोस्ट किया गया।
@ -386,11 +387,12 @@ conda install -c huggingface transformers
1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (Google AI से) साथ में कागज [विज़न ट्रांसफॉर्मर्स के साथ सिंपल ओपन-वोकैबुलरी ऑब्जेक्ट डिटेक्शन](https:/ /arxiv.org/abs/2205.06230) मैथियास मिंडरर, एलेक्सी ग्रिट्सेंको, ऑस्टिन स्टोन, मैक्सिम न्यूमैन, डिर्क वीसेनबोर्न, एलेक्सी डोसोवित्स्की, अरविंद महेंद्रन, अनुराग अर्नब, मुस्तफा देहघानी, ज़ुओरन शेन, जिओ वांग, ज़ियाओहुआ झाई, थॉमस किफ़, और नील हॉल्सबी द्वारा पोस्ट किया गया।
1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (Google AI से) Matthias Minderer, Alexey Gritsenko, Neil Houlsby. द्वाराअनुसंधान पत्र [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) के साथ जारी किया गया
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (Google AI से) Matthias Minderer, Alexey Gritsenko, Neil Houlsby. द्वाराअनुसंधान पत्र [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) के साथ जारी किया गया
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (Google की ओर से) साथ में दिया गया पेपर [लंबे इनपुट सारांश के लिए ट्रांसफ़ॉर्मरों को बेहतर तरीके से एक्सटेंड करना](https://arxiv .org/abs/2208.04347) जेसन फांग, याओ झाओ, पीटर जे लियू द्वारा।
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (दीपमाइंड से) साथ में पेपर [पर्सीवर आईओ: संरचित इनपुट और आउटपुट के लिए एक सामान्य वास्तुकला] (https://arxiv.org/abs/2107.14795) एंड्रयू जेगल, सेबेस्टियन बोरग्यूड, जीन-बैप्टिस्ट अलायराक, कार्ल डोर्श, कैटलिन इओनेस्कु, डेविड द्वारा डिंग, स्कंद कोप्पुला, डैनियल ज़ोरान, एंड्रयू ब्रॉक, इवान शेलहैमर, ओलिवियर हेनाफ, मैथ्यू एम। बोट्विनिक, एंड्रयू ज़िसरमैन, ओरिओल विनियल्स, जोआओ कैरेरा द्वारा पोस्ट किया गया।
1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (ADEPT से) Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. द्वाराअनुसंधान पत्र [blog post](https://www.adept.ai/blog/persimmon-8b) के साथ जारी किया गया
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (VinAI Research से) कागज के साथ [PhoBERT: वियतनामी के लिए पूर्व-प्रशिक्षित भाषा मॉडल](https://www .aclweb.org/anthology/2020.findings-emnlp.92/) डैट क्वोक गुयेन और अन्ह तुआन गुयेन द्वारा पोस्ट किया गया।
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (Google से) Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. द्वाराअनुसंधान पत्र [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) के साथ जारी किया गया
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (UCLA NLP से) साथ वाला पेपर [प्रोग्राम अंडरस्टैंडिंग एंड जेनरेशन के लिए यूनिफाइड प्री-ट्रेनिंग](https://arxiv .org/abs/2103.06333) वसी उद्दीन अहमद, सैकत चक्रवर्ती, बैशाखी रे, काई-वेई चांग द्वारा।
@ -410,7 +412,7 @@ conda install -c huggingface transformers
1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (झुईई टेक्नोलॉजी से), साथ में पेपर [रोफॉर्मर: रोटरी पोजिशन एंबेडिंग के साथ एन्हांस्ड ट्रांसफॉर्मर] (https://arxiv.org/pdf/2104.09864v1.pdf) जियानलिन सु और यू लू और शेंगफेंग पैन और बो वेन और युनफेंग लियू द्वारा प्रकाशित।
1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (Bo Peng से) Bo Peng. द्वाराअनुसंधान पत्र [this repo](https://github.com/BlinkDL/RWKV-LM) के साथ जारी किया गया
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (Meta AI से) Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. द्वाराअनुसंधान पत्र [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) के साथ जारी किया गया
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (ASAPP से) साथ देने वाला पेपर [भाषण पहचान के लिए अनसुपरवाइज्ड प्री-ट्रेनिंग में परफॉर्मेंस-एफिशिएंसी ट्रेड-ऑफ्स](https ://arxiv.org/abs/2109.06870) फेलिक्स वू, क्वांगयुन किम, जिंग पैन, क्यू हान, किलियन क्यू. वेनबर्गर, योव आर्टज़ी द्वारा।

View File

@ -330,6 +330,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (LAION-AI から) Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov. から公開された研究論文 [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687)
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (OpenAI から) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever から公開された研究論文: [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020)
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (University of Göttingen から) Timo Lüddecke and Alexander Ecker から公開された研究論文: [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003)
1. **[CLVP](https://huggingface.co/docs/transformers/main/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (Salesforce から) Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong から公開された研究論文: [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474)
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (MetaAI から) Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve. から公開された研究論文 [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/)
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (Microsoft Research Asia から) Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang から公開された研究論文: [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152)
@ -395,7 +396,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (Salesforce から) Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi. から公開された研究論文 [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500)
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (OpenAI から) Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever から公開された研究論文: [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf)
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/main/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (Microsoft Research Asia から) Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou から公開された研究論文: [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318)
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (Microsoft Research Asia から) Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou から公開された研究論文: [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740)
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (Microsoft Research Asia から) Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei から公開された研究論文: [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387)
@ -446,11 +447,12 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (Meta AI から) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al から公開された研究論文: [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068)
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (Google AI から) Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby から公開された研究論文: [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230)
1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (Google AI から) Matthias Minderer, Alexey Gritsenko, Neil Houlsby. から公開された研究論文 [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683)
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (Google AI から) Matthias Minderer, Alexey Gritsenko, Neil Houlsby. から公開された研究論文 [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683)
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (Google から) Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu から公開された研究論文: [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (Google から) Jason Phang, Yao Zhao, and Peter J. Liu から公開された研究論文: [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347)
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (Deepmind から) Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira から公開された研究論文: [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795)
1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (ADEPT から) Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani. から公開された研究論文 [blog post](https://www.adept.ai/blog/persimmon-8b)
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (VinAI Research から) Dat Quoc Nguyen and Anh Tuan Nguyen から公開された研究論文: [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/)
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (Google から) Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. から公開された研究論文 [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347)
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (UCLA NLP から) Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang から公開された研究論文: [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333)
@ -470,7 +472,7 @@ Flax、PyTorch、TensorFlowをcondaでインストールする方法は、それ
1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (WeChatAI から) HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou から公開された研究論文: [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf)
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (ZhuiyiTechnology から), Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu から公開された研究論文: [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864)
1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (Bo Peng から) Bo Peng. から公開された研究論文 [this repo](https://github.com/BlinkDL/RWKV-LM)
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (NVIDIA から) Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo から公開された研究論文: [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203)
1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (Meta AI から) Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick. から公開された研究論文 [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf)
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (ASAPP から) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi から公開された研究論文: [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870)

View File

@ -245,6 +245,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (LAION-AI 에서 제공)은 Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.의 [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687)논문과 함께 발표했습니다.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (OpenAI 에서) Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever 의 [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) 논문과 함께 발표했습니다.
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (University of Göttingen 에서) Timo Lüddecke and Alexander Ecker 의 [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) 논문과 함께 발표했습니다.
1. **[CLVP](https://huggingface.co/docs/transformers/main/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (Salesforce 에서) Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong 의 [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) 논문과 함께 발표했습니다.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (MetaAI 에서 제공)은 Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.의 [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/)논문과 함께 발표했습니다.
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (Microsoft Research Asia 에서) Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang 의 [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) 논문과 함께 발표했습니다.
@ -310,7 +311,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (Salesforce 에서 제공)은 Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.의 [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500)논문과 함께 발표했습니다.
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (OpenAI 에서) Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever 의 [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) 논문과 함께 발표했습니다.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/main/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (Microsoft Research Asia 에서) Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou 의 [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) 논문과 함께 발표했습니다.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (Microsoft Research Asia 에서) Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou 의 [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) 논문과 함께 발표했습니다.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (Microsoft Research Asia 에서) Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei 의 [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) 논문과 함께 발표했습니다.
@ -361,11 +362,12 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (Meta AI 에서) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al 의 [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) 논문과 함께 발표했습니다.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (Google AI 에서) Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby 의 [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) 논문과 함께 발표했습니다.
1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (Google AI 에서 제공)은 Matthias Minderer, Alexey Gritsenko, Neil Houlsby.의 [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683)논문과 함께 발표했습니다.
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (Google AI 에서 제공)은 Matthias Minderer, Alexey Gritsenko, Neil Houlsby.의 [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683)논문과 함께 발표했습니다.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (Google 에서) Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 의 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 논문과 함께 발표했습니다.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (Google 에서) Jason Phang, Yao Zhao, Peter J. Liu 의 [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) 논문과 함께 발표했습니다.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (Deepmind 에서) Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 의 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 논문과 함께 발표했습니다.
1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (ADEPT 에서 제공)은 Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.의 [blog post](https://www.adept.ai/blog/persimmon-8b)논문과 함께 발표했습니다.
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (VinAI Research 에서) Dat Quoc Nguyen and Anh Tuan Nguyen 의 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 논문과 함께 발표했습니다.
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (Google 에서 제공)은 Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.의 [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347)논문과 함께 발표했습니다.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (UCLA NLP 에서) Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang 의 [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) 논문과 함께 발표했습니다.
@ -385,7 +387,7 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (WeChatAI 에서) HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou 의 [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) 논문과 함께 발표했습니다.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (ZhuiyiTechnology 에서) Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu 의 [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) 논문과 함께 발표했습니다.
1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (Bo Peng 에서 제공)은 Bo Peng.의 [this repo](https://github.com/BlinkDL/RWKV-LM)논문과 함께 발표했습니다.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (NVIDIA 에서) Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo 의 [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) 논문과 함께 발표했습니다.
1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (Meta AI 에서 제공)은 Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.의 [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf)논문과 함께 발표했습니다.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (ASAPP 에서) Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi 의 [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) 논문과 함께 발표했습니다.

View File

@ -436,6 +436,7 @@ conda install -c huggingface transformers
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[Persimmon](https://huggingface.co/docs/transformers/main/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft Research) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.

View File

@ -269,6 +269,7 @@ conda install -c huggingface transformers
1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (来自 LAION-AI) 伴随论文 [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) 由 Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov 发布。
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (来自 OpenAI) 伴随论文 [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) 由 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever 发布。
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (来自 University of Göttingen) 伴随论文 [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) 由 Timo Lüddecke and Alexander Ecker 发布。
1. **[CLVP](https://huggingface.co/docs/transformers/main/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (来自 Salesforce) 伴随论文 [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) 由 Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong 发布。
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (来自 MetaAI) 伴随论文 [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) 由 Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve 发布。
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (来自 Microsoft Research Asia) 伴随论文 [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) 由 Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang 发布。
@ -334,7 +335,7 @@ conda install -c huggingface transformers
1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (来自 Salesforce) 伴随论文 [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) 由 Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi 发布。
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/main/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) 由 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou 发布。
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) 由 Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou 发布。
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) 由 Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei 发布。
@ -385,11 +386,12 @@ conda install -c huggingface transformers
1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (来自 [s-JoL](https://huggingface.co/s-JoL)) 由 GitHub (现已删除).
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (来自 Meta AI) 伴随论文 [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) 由 Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al 发布。
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (来自 Google AI) 伴随论文 [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) 由 Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby 发布。
1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (来自 Google AI) 伴随论文 [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) 由 Matthias Minderer, Alexey Gritsenko, Neil Houlsby 发布。
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (来自 Google AI) 伴随论文 [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) 由 Matthias Minderer, Alexey Gritsenko, Neil Houlsby 发布。
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (来自 Google) 伴随论文 [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) 由 Jason Phang, Yao Zhao, Peter J. Liu 发布。
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (来自 ADEPT) 伴随论文 [blog post](https://www.adept.ai/blog/persimmon-8b) 由 Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani 发布。
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (来自 Google) 伴随论文 [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) 由 Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova 发布。
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (来自 UCLA NLP) 伴随论文 [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) 由 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang 发布。
@ -409,7 +411,7 @@ conda install -c huggingface transformers
1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (来自 WeChatAI), 伴随论文 [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) 由 HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou 发布。
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (来自 ZhuiyiTechnology), 伴随论文 [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) 由 Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu 发布。
1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (来自 Bo Peng) 伴随论文 [this repo](https://github.com/BlinkDL/RWKV-LM) 由 Bo Peng 发布。
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (来自 NVIDIA) 伴随论文 [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) 由 Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo 发布。
1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (来自 Meta AI) 伴随论文 [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) 由 Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick 发布。
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (来自 ASAPP) 伴随论文 [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) 由 Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi 发布。

View File

@ -281,6 +281,7 @@ conda install -c huggingface transformers
1. **[CLAP](https://huggingface.co/docs/transformers/model_doc/clap)** (from LAION-AI) released with the paper [Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation](https://arxiv.org/abs/2211.06687) by Yusong Wu, Ke Chen, Tianyu Zhang, Yuchen Hui, Taylor Berg-Kirkpatrick, Shlomo Dubnov.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CLIPSeg](https://huggingface.co/docs/transformers/model_doc/clipseg)** (from University of Göttingen) released with the paper [Image Segmentation Using Text and Image Prompts](https://arxiv.org/abs/2112.10003) by Timo Lüddecke and Alexander Ecker.
1. **[CLVP](https://huggingface.co/docs/transformers/main/model_doc/clvp)** released with the paper [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[CodeLlama](https://huggingface.co/docs/transformers/model_doc/llama_code)** (from MetaAI) released with the paper [Code Llama: Open Foundation Models for Code](https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/) by Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve.
1. **[Conditional DETR](https://huggingface.co/docs/transformers/model_doc/conditional_detr)** (from Microsoft Research Asia) released with the paper [Conditional DETR for Fast Training Convergence](https://arxiv.org/abs/2108.06152) by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.
@ -346,7 +347,7 @@ conda install -c huggingface transformers
1. **[Informer](https://huggingface.co/docs/transformers/model_doc/informer)** (from Beihang University, UC Berkeley, Rutgers University, SEDD Company) released with the paper [Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting](https://arxiv.org/abs/2012.07436) by Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang.
1. **[InstructBLIP](https://huggingface.co/docs/transformers/model_doc/instructblip)** (from Salesforce) released with the paper [InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning](https://arxiv.org/abs/2305.06500) by Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi.
1. **[Jukebox](https://huggingface.co/docs/transformers/model_doc/jukebox)** (from OpenAI) released with the paper [Jukebox: A Generative Model for Music](https://arxiv.org/pdf/2005.00341.pdf) by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/main/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[KOSMOS-2](https://huggingface.co/docs/transformers/model_doc/kosmos-2)** (from Microsoft Research Asia) released with the paper [Kosmos-2: Grounding Multimodal Large Language Models to the World](https://arxiv.org/abs/2306.14824) by Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
@ -397,11 +398,12 @@ conda install -c huggingface transformers
1. **[OpenLlama](https://huggingface.co/docs/transformers/model_doc/open-llama)** (from [s-JoL](https://huggingface.co/s-JoL)) released on GitHub (now removed).
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[OWLv2](https://huggingface.co/docs/transformers/main/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
1. **[OWLv2](https://huggingface.co/docs/transformers/model_doc/owlv2)** (from Google AI) released with the paper [Scaling Open-Vocabulary Object Detection](https://arxiv.org/abs/2306.09683) by Matthias Minderer, Alexey Gritsenko, Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[Persimmon](https://huggingface.co/docs/transformers/model_doc/persimmon)** (from ADEPT) released in a [blog post](https://www.adept.ai/blog/persimmon-8b) by Erich Elsen, Augustus Odena, Maxwell Nye, Sağnak Taşırlar, Tri Dao, Curtis Hawthorne, Deepak Moparthi, Arushi Somani.
1. **[Phi](https://huggingface.co/docs/transformers/main/model_doc/phi)** (from Microsoft) released with the papers - [Textbooks Are All You Need](https://arxiv.org/abs/2306.11644) by Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee and Yuanzhi Li, [Textbooks Are All You Need II: phi-1.5 technical report](https://arxiv.org/abs/2309.05463) by Yuanzhi Li, Sébastien Bubeck, Ronen Eldan, Allie Del Giorno, Suriya Gunasekar and Yin Tat Lee.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[Pix2Struct](https://huggingface.co/docs/transformers/model_doc/pix2struct)** (from Google) released with the paper [Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding](https://arxiv.org/abs/2210.03347) by Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, Fangyu Liu, Julian Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
@ -421,7 +423,7 @@ conda install -c huggingface transformers
1. **[RoCBert](https://huggingface.co/docs/transformers/model_doc/roc_bert)** (from WeChatAI) released with the paper [RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining](https://aclanthology.org/2022.acl-long.65.pdf) by HuiSu, WeiweiShi, XiaoyuShen, XiaoZhou, TuoJi, JiaruiFang, JieZhou.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[RWKV](https://huggingface.co/docs/transformers/model_doc/rwkv)** (from Bo Peng) released with the paper [this repo](https://github.com/BlinkDL/RWKV-LM) by Bo Peng.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/main/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SeamlessM4T](https://huggingface.co/docs/transformers/model_doc/seamless_m4t)** (from Meta AI) released with the paper [SeamlessM4T — Massively Multilingual & Multimodal Machine Translation](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf) by the Seamless Communication team.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[Segment Anything](https://huggingface.co/docs/transformers/model_doc/sam)** (from Meta AI) released with the paper [Segment Anything](https://arxiv.org/pdf/2304.02643v1.pdf) by Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alex Berg, Wan-Yen Lo, Piotr Dollar, Ross Girshick.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

View File

@ -11,7 +11,7 @@ SHELL ["sh", "-lc"]
ARG PYTORCH='2.1.0'
# (not always a valid torch version)
ARG INTEL_TORCH_EXT='1.11.0'
ARG INTEL_TORCH_EXT='2.1.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu118'
@ -37,7 +37,7 @@ RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir intel_extension_for_pytorch==$INTEL_TORCH_EXT+cpu -f https://developer.intel.com/ipex-whl-stable-cpu
RUN python3 -m pip install --no-cache-dir intel_extension_for_pytorch==$INTEL_TORCH_EXT -f https://developer.intel.com/ipex-whl-stable-cpu
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"
@ -56,7 +56,7 @@ RUN python3 -m pip install --no-cache-dir auto-gptq --extra-index-url https://hu
RUN python3 -m pip install --no-cache-dir einops
# Add autoawq for quantization testing
RUN python3 -m pip install --no-cache-dir autoawq
RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.1.6/autoawq-0.1.6+cu118-cp38-cp38-linux_x86_64.whl
# For bettertransformer + gptq
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum
@ -67,6 +67,9 @@ RUN python3 -m pip install --no-cache-dir decord av==9.2.0
# For `dinat` model
RUN python3 -m pip install --no-cache-dir natten -f https://shi-labs.com/natten/wheels/$CUDA/
# For `nougat` tokenizer
RUN python3 -m pip install --no-cache-dir python-Levenshtein
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@ -28,4 +28,4 @@ RUN python3 -m pip uninstall -y tensorflow flax
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
RUN cd transformers && python3 setup.py develop

View File

@ -424,6 +424,8 @@
title: PEGASUS-X
- local: model_doc/persimmon
title: Persimmon
- local: model_doc/phi
title: Phi
- local: model_doc/phobert
title: PhoBERT
- local: model_doc/plbart
@ -663,6 +665,8 @@
title: CLIP
- local: model_doc/clipseg
title: CLIPSeg
- local: model_doc/clvp
title: CLVP
- local: model_doc/data2vec
title: Data2Vec
- local: model_doc/deplot

View File

@ -20,25 +20,11 @@ rendered properly in your Markdown viewer.
An increasingly common use case for LLMs is **chat**. In a chat context, rather than continuing a single string
of text (as is the case with a standard language model), the model instead continues a conversation that consists
of one or more **messages**, each of which includes a **role** as well as message text.
of one or more **messages**, each of which includes a **role**, like "user" or "assistant", as well as message text.
Most commonly, these roles are "user" for messages sent by the user, and "assistant" for messages sent by the model.
Some models also support a "system" role. System messages are usually sent at the beginning of the conversation
and include directives about how the model should behave in the subsequent chat.
All language models, including models fine-tuned for chat, operate on linear sequences of tokens and do not intrinsically
have special handling for roles. This means that role information is usually injected by adding control tokens
between messages, to indicate both the message boundary and the relevant roles.
Unfortunately, there isn't (yet!) a standard for which tokens to use, and so different models have been trained
with wildly different formatting and control tokens for chat. This can be a real problem for users - if you use the
wrong format, then the model will be confused by your input, and your performance will be a lot worse than it should be.
This is the problem that **chat templates** aim to resolve.
Chat conversations are typically represented as a list of dictionaries, where each dictionary contains `role`
and `content` keys, and represents a single chat message. Chat templates are strings containing a Jinja template that
specifies how to format a conversation for a given model into a single tokenizable sequence. By storing this information
with the tokenizer, we can ensure that models get input data in the format they expect.
Much like tokenization, different models expect very different input formats for chat. This is the reason we added
**chat templates** as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations,
represented as lists of messages, into a single tokenizable string in the format that the model expects.
Let's make this concrete with a quick example using the `BlenderBot` model. BlenderBot has an extremely simple default
template, which mostly just adds whitespace between rounds of dialogue:
@ -48,9 +34,9 @@ template, which mostly just adds whitespace between rounds of dialogue:
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
>>> chat = [
... {"role": "user", "content": "Hello, how are you?"},
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
... {"role": "user", "content": "Hello, how are you?"},
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
... ]
>>> tokenizer.apply_chat_template(chat, tokenize=False)
@ -59,28 +45,196 @@ template, which mostly just adds whitespace between rounds of dialogue:
Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting,
that string will also be tokenized for us. To see a more complex template in action, though, let's use the
`meta-llama/Llama-2-7b-chat-hf` model. Note that this model has gated access, so you will have to
[request access on the repo](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) if you want to run this code yourself:
`mistralai/Mistral-7B-Instruct-v0.1` model.
```python
>> from transformers import AutoTokenizer
>> tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
>> chat = [
>>> chat = [
... {"role": "user", "content": "Hello, how are you?"},
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
... ]
>> tokenizer.use_default_system_prompt = False
>> tokenizer.apply_chat_template(chat, tokenize=False)
"<s>[INST] Hello, how are you? [/INST] I'm doing great. How can I help you today? </s><s>[INST] I'd like to show off how chat templating works! [/INST]"
>>> tokenizer.apply_chat_template(chat, tokenize=False)
"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
```
Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of
user messages (but not assistant messages!)
user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.
## How do chat templates work?
## How do I use chat templates?
As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role`
and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_template`] method. Once you do that,
you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea
to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts).
Here's an example of preparing input for `model.generate()`, using the `Zephyr` assistant model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint) # You may want to use bfloat16 and/or move to GPU here
messages = [
{
"role": "system",
"content": "You are a friendly chatbot who always responds in the style of a pirate",
},
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))
```
This will yield a string in the input format that Zephyr expects.
```text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
```
Now that our input is formatted correctly for Zephyr, we can use the model to generate a response to the user's question:
```python
outputs = model.generate(tokenized_chat, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
This will yield:
```text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
<|user|>
How many helicopters can a human eat in one sitting?</s>
<|assistant|>
Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.
```
Arr, 'twas easy after all!
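If you only want the model's reply rather than the full transcript, you can slice off the prompt tokens before decoding. This is a small convenience sketch of our own, reusing the `tokenized_chat` and `outputs` variables from the example above:
```python
# Decode only the newly generated tokens, dropping the prompt that was fed in
response = tokenizer.decode(outputs[0][tokenized_chat.shape[1]:], skip_special_tokens=True)
print(response)
```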
## Is there an automated pipeline for chat?
Yes, there is: [`ConversationalPipeline`]. This pipeline is designed to make it easy to use chat models. Let's try
the `Zephyr` example again, but this time using the pipeline:
```python
from transformers import pipeline
pipe = pipeline("conversational", "HuggingFaceH4/zephyr-7b-beta")
messages = [
{
"role": "system",
"content": "You are a friendly chatbot who always responds in the style of a pirate",
},
{"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
print(pipe(messages))
```
```text
Conversation id: 76d886a0-74bd-454e-9804-0467041a63dc
system: You are a friendly chatbot who always responds in the style of a pirate
user: How many helicopters can a human eat in one sitting?
assistant: Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all.
```
[`ConversationalPipeline`] will take care of all the details of tokenization and calling `apply_chat_template` for you -
once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages!
## What are "generation prompts"?
You may have noticed that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells
the template to add tokens that indicate the start of a bot response. For example, consider the following chat:
```python
messages = [
{"role": "user", "content": "Hi there!"},
{"role": "assistant", "content": "Nice to meet you!"},
{"role": "user", "content": "Can I ask a question?"}
]
```
Here's what this will look like without a generation prompt, using the ChatML template we saw in the Zephyr example:
```python
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
"""
```
And here's what it looks like **with** a generation prompt:
```python
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
```
Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model
generates text it will write a bot response instead of doing something unexpected, like continuing the user's
message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a
special kind of text to them! You need to guide them with the appropriate control tokens so they know what they're
supposed to be doing.
Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any
special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
effect that `add_generation_prompt` has will depend on the template being used.
## Can I use chat templates in training?
Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you
can simply continue like any other language model training task. When training, you should usually set
`add_generation_prompt=False`, because the added tokens to prompt an assistant response will not be helpful during
training. Let's see an example:
```python
from transformers import AutoTokenizer
from datasets import Dataset
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
chat1 = [
{"role": "user", "content": "Which is bigger, the moon or the sun?"},
{"role": "assistant", "content": "The sun."}
]
chat2 = [
{"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
{"role": "assistant", "content": "A bacterium."}
]
dataset = Dataset.from_dict({"chat": [chat1, chat2]})
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
print(dataset['formatted_chat'][0])
```
And we get:
```text
<|user|>
Which is bigger, the moon or the sun?</s>
<|assistant|>
The sun.</s>
```
From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column.
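As a purely illustrative sketch of that last step, the formatted column can be tokenized and passed to the standard [`Trainer`]; the checkpoint, hyperparameters and output directory below are assumptions, not part of the original example:
```python
from transformers import (
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Tokenize the chats we formatted above (reuses `tokenizer` and `dataset` from the previous block)
tokenized_dataset = dataset.map(
    lambda x: tokenizer(x["formatted_chat"], truncation=True),
    remove_columns=dataset.column_names,
)

model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="zephyr-chat-finetune", per_device_train_batch_size=1),
    train_dataset=tokenized_dataset,
    # Causal LM objective: mlm=False makes the collator build labels from the input ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```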
## Advanced: How do chat templates work?
The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the
default template for that model class is used instead. Let's take a look at the template for `BlenderBot`:
@ -154,7 +308,9 @@ Hopefully if you stare at this for a little bit you can see what this template i
on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly
distinguishable to the model because of the tokens they're wrapped in.
## How do I create a chat template?
## Advanced: Adding and editing chat templates
### How do I create a chat template?
Simple, just write a Jinja template and set `tokenizer.chat_template`. You may find it easier to start with an
existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template
@ -187,7 +343,7 @@ tokenizer.push_to_hub("model_name") # Upload your new template to the Hub!
The method [`~PreTrainedTokenizer.apply_chat_template`] which uses your chat template is called by the [`ConversationalPipeline`] class, so
once you set the correct chat template, your model will automatically become compatible with [`ConversationalPipeline`].
## What are "default" templates?
### What are "default" templates?
Before the introduction of chat templates, chat handling was hardcoded at the model class level. For backwards
compatibility, we have retained this class-specific handling as default templates, also set at the class level. If a
@ -200,7 +356,7 @@ the class template is appropriate for your model, we strongly recommend overridi
setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured
for chat, and to future-proof in case the default templates are ever altered or deprecated.
## What template should I use?
### What template should I use?
When setting the template for a model that's already been trained for chat, you should ensure that the template
exactly matches the message formatting that the model saw during training, or else you will probably experience
@ -229,7 +385,7 @@ tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set a
This template wraps each message in `<|im_start|>` and `<|im_end|>` tokens, and simply writes the role as a string, which
allows for flexibility in the roles you train with. The output looks like this:
```
```text
<|im_start|>system
You are a helpful chatbot that will do its best not to say anything so stupid that people tweet about it.<|im_end|>
<|im_start|>user
@ -242,62 +398,12 @@ The "user", "system" and "assistant" roles are the standard for chat, and we rec
particularly if you want your model to operate well with [`ConversationalPipeline`]. However, you are not limited
to these roles - templating is extremely flexible, and any string can be a role.
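As a quick illustration of that flexibility (our own example, assuming the ChatML-style template shown above has been set on the tokenizer), a non-standard role renders just like the standard ones:
```python
chat = [
    {"role": "narrator", "content": "The story begins on a rainy night."},
    {"role": "user", "content": "Continue the story."},
]
# Each message is wrapped in <|im_start|>/<|im_end|>, with the role written verbatim
print(tokenizer.apply_chat_template(chat, tokenize=False))
```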
## What are "generation prompts"?
You may notice that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells
the template to add tokens that indicate the start of a bot response. For example, consider the following chat:
```python
messages = [
{"role": "user", "content": "Hi there!"},
{"role": "assistant", "content": "Nice to meet you!"},
{"role": "user", "content": "Can I ask a question?"}
]
```
Here's what this will look like without a generation prompt, using the ChatML template we described above:
```python
>> tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
"""
```
And here's what it looks like **with** a generation prompt:
```python
>> tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
```
Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model
generates text it will write a bot response instead of doing something unexpected, like continuing the user's
message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a
special kind of text to them! You need to guide them with the appropriate control tokens so they know what they're
supposed to be doing.
Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any
special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
effect that `add_generation_prompt` has will depend on the template being used.
## I want to use chat templates! How should I get started?
### I want to add some chat templates! How should I get started?
If you have any chat models, you should set their `tokenizer.chat_template` attribute and test it using
[`~PreTrainedTokenizer.apply_chat_template`]. This applies even if you're not the model owner - if you're using a model
with an empty chat template, or one that's still using the default class template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to
the model repository so that this attribute can be set properly!
[`~PreTrainedTokenizer.apply_chat_template`], then push the updated tokenizer to the Hub. This applies even if you're
not the model owner - if you're using a model with an empty chat template, or one that's still using the default class
template, please open a [pull request](https://huggingface.co/docs/hub/repositories-pull-requests-discussions) to the model repository so that this attribute can be set properly!
Once the attribute is set, that's it, you're done! `tokenizer.apply_chat_template` will now work correctly for that
model, which means it is also automatically supported in places like `ConversationalPipeline`!
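For reference, a minimal sketch of that workflow might look like the following; the checkpoint name and the template itself are placeholders of our own, not a recommendation:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("my-org/my-chat-model")  # hypothetical checkpoint
tokenizer.chat_template = (
    "{% for message in messages %}{{ message['role'] }}: {{ message['content'] }}\n{% endfor %}"
)

# Sanity-check the template before sharing it
print(tokenizer.apply_chat_template([{"role": "user", "content": "Hi!"}], tokenize=False))

tokenizer.push_to_hub("my-org/my-chat-model")  # or open a PR on the original repo
```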
@ -306,7 +412,7 @@ By ensuring that models have this attribute, we can make sure that the whole com
open-source models. Formatting mismatches have been haunting the field and silently harming performance for too long -
it's time to put an end to them!
## Template writing tips
## Advanced: Template writing tips
If you're unfamiliar with Jinja, we generally find that the easiest way to write a chat template is to first
write a short Python script that formats messages the way you want, and then convert that script into a template.
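For instance, here is a rough sketch of that approach (the message format is invented for illustration, not taken from any real model): a plain Python formatter first, then the equivalent Jinja string you could assign to `tokenizer.chat_template`:
```python
# Step 1: prototype the format in plain Python
def format_chat(messages):
    text = ""
    for message in messages:
        text += f"<|{message['role']}|>\n{message['content']}\n"
    return text

# Step 2: translate the same loop into a Jinja template
chat_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}\n"
    "{% endfor %}"
)
```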

View File

@ -110,7 +110,7 @@ You can also save your configuration file as a dictionary or even just the diffe
## Model
The next step is to create a [model](main_classes/models). The model - also loosely referred to as the architecture - defines what each layer is doing and what operations are happening. Attributes like `num_hidden_layers` from the configuration are used to define the architecture. Every model shares the base class [`PreTrainedModel`] and a few common methods like resizing input embeddings and pruning self-attention heads. In addition, all models are also either a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) or [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/flax.linen.html#module) subclass. This means models are compatible with each of their respective framework's usage.
The next step is to create a [model](main_classes/models). The model - also loosely referred to as the architecture - defines what each layer is doing and what operations are happening. Attributes like `num_hidden_layers` from the configuration are used to define the architecture. Every model shares the base class [`PreTrainedModel`] and a few common methods like resizing input embeddings and pruning self-attention heads. In addition, all models are also either a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) or [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html) subclass. This means models are compatible with each of their respective framework's usage.
<frameworkcontent>
<pt>

View File

@ -272,6 +272,22 @@ Note that there is no need to specify an auto class for the configuration (there
[`AutoConfig`]) but it's different for models. Your custom model could be suitable for many different tasks, so you
have to specify which one of the auto classes is the correct one for your model.
<Tip>
Use `register_for_auto_class()` if you want the code files to be copied. If you instead prefer to use code on the Hub from another repo,
you don't need to call it. In cases where there's more than one auto class, you can modify the `config.json` directly using the
following structure:
```
"auto_map": {
"AutoConfig": "<your-repo-name>--<config-name>",
"AutoModel": "<your-repo-name>--<config-name>",
"AutoModelFor<Task>": "<your-repo-name>--<config-name>",
},
```
</Tip>
Next, let's create the config and models as we did before:
```py

View File

@ -99,7 +99,7 @@ Define a `model_init` function and pass it to the [`Trainer`], as an example:
... config=config,
... cache_dir=model_args.cache_dir,
... revision=model_args.model_revision,
... token=True if model_args.use_auth_token else None,
... )
```

View File

@ -92,6 +92,7 @@ Flax), PyTorch, and/or TensorFlow.
| [CLAP](model_doc/clap) | ✅ | ❌ | ❌ |
| [CLIP](model_doc/clip) | ✅ | ✅ | ✅ |
| [CLIPSeg](model_doc/clipseg) | ✅ | ❌ | ❌ |
| [CLVP](model_doc/clvp) | ✅ | ❌ | ❌ |
| [CodeGen](model_doc/codegen) | ✅ | ❌ | ❌ |
| [CodeLlama](model_doc/code_llama) | ✅ | ❌ | ❌ |
| [Conditional DETR](model_doc/conditional_detr) | ✅ | ❌ | ❌ |
@ -216,6 +217,7 @@ Flax), PyTorch, and/or TensorFlow.
| [PEGASUS-X](model_doc/pegasus_x) | ✅ | ❌ | ❌ |
| [Perceiver](model_doc/perceiver) | ✅ | ❌ | ❌ |
| [Persimmon](model_doc/persimmon) | ✅ | ❌ | ❌ |
| [Phi](model_doc/phi) | ✅ | ❌ | ❌ |
| [PhoBERT](model_doc/phobert) | ✅ | ✅ | ✅ |
| [Pix2Struct](model_doc/pix2struct) | ✅ | ❌ | ❌ |
| [PLBart](model_doc/plbart) | ✅ | ❌ | ❌ |

View File

@ -44,6 +44,7 @@ By default, `TrainingArguments.report_to` is set to `"all"`, so a [`Trainer`] wi
- [`~integrations.ClearMLCallback`] if [clearml](https://github.com/allegroai/clearml) is installed.
- [`~integrations.DagsHubCallback`] if [dagshub](https://dagshub.com/) is installed.
- [`~integrations.FlyteCallback`] if [flyte](https://flyte.org/) is installed.
- [`~integrations.DVCLiveCallback`] if [dvclive](https://dvc.org/doc/dvclive) is installed.
If a package is installed but you don't wish to use the accompanying integration, you can change `TrainingArguments.report_to` to a list of just those integrations you want to use (e.g. `["azure_ml", "wandb"]`).
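For example, a minimal sketch that restricts reporting to a single integration (the output directory name is arbitrary):

```python
from transformers import TrainingArguments

# report only to Weights & Biases, even if other logging integrations are installed
training_args = TrainingArguments(output_dir="my-run", report_to=["wandb"])
```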
@ -88,6 +89,9 @@ Here is the list of the available [`TrainerCallback`] in the library:
[[autodoc]] integrations.FlyteCallback
[[autodoc]] integrations.DVCLiveCallback
- setup
## TrainerCallback
[[autodoc]] TrainerCallback

View File

@ -225,7 +225,7 @@ For users, a rule of thumb is:
- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are using throughput (you want to run your model on a bunch of static data), on GPU, then:

View File

@ -86,7 +86,7 @@ This library hosts the processor to load the XNLI data:
Please note that since the gold labels are available on the test set, evaluation is performed on the test set.
An example using these processors is given in the [run_xnli.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_xnli.py) script.
## SQuAD

View File

@ -95,7 +95,7 @@ The benchmark was run on a NVIDIA-A100 instance and the model used was [`TheBlok
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/quantization/forward_latency_plot.png">
</div>
You can find the full results together with packages versions in [this link](https://github.com/huggingface/optimum-benchmark/tree/main/examples/running-mistrals).
From the results, it appears that AWQ is the fastest quantization method for inference and text generation, with among the lowest peak memory usage for text generation. However, AWQ seems to have the largest forward latency per batch size.

View File

@ -55,6 +55,8 @@ to a given token).
[[autodoc]] PreTrainedTokenizer
- __call__
- add_tokens
- add_special_tokens
- apply_chat_template
- batch_decode
- decode
@ -69,6 +71,8 @@ loaded very simply into 🤗 transformers. Take a look at the [Using tokenizers
[[autodoc]] PreTrainedTokenizerFast
- __call__
- add_tokens
- add_special_tokens
- apply_chat_template
- batch_decode
- decode

View File

@ -45,7 +45,10 @@ self-supervised loss that focuses on modeling inter-sentence coherence, and show
with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and
SQuAD benchmarks while having fewer parameters compared to BERT-large.*
This model was contributed by [lysandre](https://huggingface.co/lysandre). The JAX version of this model was contributed by
[kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/google-research/ALBERT).

## Usage tips
- ALBERT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather
than the left.
@ -56,11 +59,7 @@ Tips:
- Layers are split in groups that share parameters (to save memory).
Next sentence prediction is replaced by a sentence ordering prediction: in the inputs, we have two sentences A and B (that are consecutive) and we either feed A followed by B or B followed by A. The model must predict if they have been swapped or not.
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
@ -90,6 +89,9 @@ This model was contributed by [lysandre](https://huggingface.co/lysandre). This
[[autodoc]] models.albert.modeling_tf_albert.TFAlbertForPreTrainingOutput
<frameworkcontent>
<pt>
## AlbertModel
[[autodoc]] AlbertModel
@ -124,6 +126,10 @@ This model was contributed by [lysandre](https://huggingface.co/lysandre). This
[[autodoc]] AlbertForQuestionAnswering
- forward
</pt>
<tf>
## TFAlbertModel
[[autodoc]] TFAlbertModel
@ -159,6 +165,9 @@ This model was contributed by [lysandre](https://huggingface.co/lysandre). This
[[autodoc]] TFAlbertForQuestionAnswering
- call
</tf>
<jax>
## FlaxAlbertModel
[[autodoc]] FlaxAlbertModel
@ -193,3 +202,8 @@ This model was contributed by [lysandre](https://huggingface.co/lysandre). This
[[autodoc]] FlaxAlbertForQuestionAnswering
- __call__
</jax>
</frameworkcontent>

View File

@ -24,7 +24,10 @@ The abstract from the paper is the following:
*Pre-trained representations are becoming crucial for many NLP and perception tasks. While representation learning in NLP has transitioned to training on raw text without human annotations, visual and vision-language representations still rely heavily on curated training datasets that are expensive or require expert knowledge. For vision applications, representations are mostly learned using datasets with explicit class labels such as ImageNet or OpenImages. For vision-language, popular datasets like Conceptual Captions, MSCOCO, or CLIP all involve a non-trivial data collection (and cleaning) process. This costly curation process limits the size of datasets and hence hinders the scaling of trained models. In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps in the Conceptual Captions dataset. A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss. We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme. Our visual representation achieves strong performance when transferred to classification tasks such as ImageNet and VTAB. The aligned visual and language representations enables zero-shot image classification and also set new state-of-the-art results on Flickr30K and MSCOCO image-text retrieval benchmarks, even when compared with more sophisticated cross-attention models. The representations also enable cross-modality search with complex text and text + image queries.*
This model was contributed by [Alara Dirik](https://huggingface.co/adirik).
The original code is not released; this implementation is based on the Kakao Brain implementation of the original paper.

## Usage example
ALIGN uses EfficientNet to get visual features and BERT to get the text features. Both the text and visual features are then projected to a latent space with identical dimension. The dot product between the projected image and text features is then used as a similarity score.
@ -56,9 +59,6 @@ probs = logits_per_image.softmax(dim=1)
print(probs)
```
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with ALIGN.
@ -69,7 +69,6 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we will review it. The resource should ideally demonstrate something new instead of duplicating an existing resource.
## AlignConfig
[[autodoc]] AlignConfig

View File

@ -31,7 +31,9 @@ teacher learning and contrastive learning. We validate our method through evalua
performances on a bunch of tasks including ImageNet-CN, Flicker30k- CN, and COCO-CN. Further, we obtain very close performances with
CLIP on almost all tasks, suggesting that one can simply alter the text encoder in CLIP for extended capabilities such as multilingual understanding.*
This model was contributed by [jongjyh](https://huggingface.co/jongjyh).

## Usage tips and example
The usage of AltCLIP is very similar to CLIP's; the difference from CLIP is the text encoder. Note that we use bidirectional attention instead of causal attention
and we take the [CLS] token in XLM-R to represent the text embedding.
@ -50,7 +52,6 @@ The [`AltCLIPProcessor`] wraps a [`CLIPImageProcessor`] and a [`XLMRobertaTokeni
encode the text and prepare the images. The following example shows how to get the image-text similarity scores using
[`AltCLIPProcessor`] and [`AltCLIPModel`].
```python
>>> from PIL import Image
>>> import requests
@ -70,11 +71,11 @@ encode the text and prepare the images. The following example shows how to get t
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
```
<Tip>

This model is based on `CLIPModel`, use it like you would use the original [CLIP](clip).

</Tip>
## AltCLIPConfig

View File

@ -26,7 +26,15 @@ The abstract from the paper is the following:
*In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels. To better capture long-range global context, a recent trend is to add a self-attention mechanism on top of the CNN, forming a CNN-attention hybrid model. However, it is unclear whether the reliance on a CNN is necessary, and if neural networks purely based on attention are sufficient to obtain good performance in audio classification. In this paper, we answer the question by introducing the Audio Spectrogram Transformer (AST), the first convolution-free, purely attention-based model for audio classification. We evaluate AST on various audio classification benchmarks, where it achieves new state-of-the-art results of 0.485 mAP on AudioSet, 95.6% accuracy on ESC-50, and 98.1% accuracy on Speech Commands V2.*
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/audio_spectogram_transformer_architecture.png"
alt="drawing" width="600"/>
<small> Audio Spectrogram Transformer architecture. Taken from the <a href="https://arxiv.org/abs/2104.01778">original paper</a>.</small>
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/YuanGongND/ast).
## Usage tips
- When fine-tuning the Audio Spectrogram Transformer (AST) on your own dataset, it's recommended to take care of the input normalization (to make
sure the input has mean of 0 and std of 0.5). [`ASTFeatureExtractor`] takes care of this. Note that it uses the AudioSet
@ -35,14 +43,6 @@ the authors compute the stats for a downstream dataset.
- Note that the AST needs a low learning rate (the authors use a 10 times smaller learning rate compared to their CNN model proposed in the
[PSLA paper](https://arxiv.org/abs/2102.01243)) and converges quickly, so please search for a suitable learning rate and learning rate scheduler for your task.
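A short sketch of that preprocessing step, assuming a 16 kHz mono waveform (a silent array stands in for real audio here):

```python
import numpy as np
from transformers import ASTFeatureExtractor

feature_extractor = ASTFeatureExtractor.from_pretrained("MIT/ast-finetuned-audioset-10-10-0.4593")

waveform = np.zeros(16000, dtype=np.float32)  # one second of "audio" at 16 kHz
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
print(inputs["input_values"].shape)  # normalized log-mel spectrogram ready for the AST model
```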
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with the Audio Spectrogram Transformer.
@ -72,4 +72,4 @@ If you're interested in submitting a resource to be included here, please feel f
## ASTForAudioClassification
[[autodoc]] ASTForAudioClassification
- forward

View File

@ -39,13 +39,11 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] AutoformerConfig
## AutoformerModel
[[autodoc]] AutoformerModel
- forward
## AutoformerForPrediction
[[autodoc]] AutoformerForPrediction

View File

@ -14,8 +14,7 @@ specific language governing permissions and limitations under the License.
## Overview
Bark is a transformer-based text-to-speech model proposed by Suno AI in [suno-ai/bark](https://github.com/suno-ai/bark).
Bark is made of 4 main models:
@ -26,6 +25,9 @@ Bark is made of 4 main models:
It should be noted that each of the first three modules can support conditional speaker embeddings to condition the output sound according to a specific predefined voice.
This model was contributed by [Yoach Lacombe (ylacombe)](https://huggingface.co/ylacombe) and [Sanchit Gandhi (sanchit-gandhi)](https://github.com/sanchit-gandhi).
The original code can be found [here](https://github.com/suno-ai/bark).
### Optimizing Bark
Bark can be optimized with just a few extra lines of code, which **significantly reduces its memory footprint** and **accelerates inference**.
@ -42,7 +44,19 @@ device = "cuda" if torch.cuda.is_available() else "cpu"
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16).to(device)
```
#### Using CPU offload
As mentioned above, Bark is made up of 4 sub-models, which are called up sequentially during audio generation. In other words, while one sub-model is in use, the other sub-models are idle.
If you're using a CUDA device, a simple solution to benefit from an 80% reduction in memory footprint is to offload the submodels from GPU to CPU when they're idle. This operation is called *CPU offloading*. You can use it with one line of code as follows:
```python
model.enable_cpu_offload()
```
Note that 🤗 Accelerate must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/accelerate/basic_tutorials/install)
#### Using Better Transformer
Better Transformer is an 🤗 Optimum feature that performs kernel fusion under the hood. You can gain 20% to 30% in speed with zero performance degradation. It only requires one line of code to export the model to 🤗 Better Transformer:
@ -52,21 +66,46 @@ model = model.to_bettertransformer()
Note that 🤗 Optimum must be installed before using this feature. [Here's how to install it.](https://huggingface.co/docs/optimum/installation)
#### Using Flash Attention 2
Flash Attention 2 is an even faster, optimized version of the previous optimization.
##### Installation
First, check whether your hardware is compatible with Flash Attention 2. The latest list of compatible hardware can be found in the [official documentation](https://github.com/Dao-AILab/flash-attention#installation-and-features). If your hardware is not compatible with Flash Attention 2, you can still benefit from attention kernel optimisations through Better Transformer support covered [above](https://huggingface.co/docs/transformers/main/en/model_doc/bark#using-better-transformer).
Next, [install](https://github.com/Dao-AILab/flash-attention#installation-and-features) the latest version of Flash Attention 2:
```bash
pip install -U flash-attn --no-build-isolation
```
##### Usage
To load a model using Flash Attention 2, we can pass the `use_flash_attention_2` flag to [`.from_pretrained`](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained). We'll also load the model in half-precision (e.g. `torch.float16`), since it results in almost no degradation to audio quality but significantly lower memory usage and faster inference:
```python
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16, use_flash_attention_2=True).to(device)
```
##### Performance comparison
The following diagram shows the latency for the native attention implementation (no optimisation) against Better Transformer and Flash Attention 2. In all cases, we generate 400 semantic tokens on a 40GB A100 GPU with PyTorch 2.1. Flash Attention 2 is also consistently faster than Better Transformer, and its performance improves even more as batch sizes increase:
<div style="text-align: center">
<img src="https://huggingface.co/datasets/ylacombe/benchmark-comparison/resolve/main/Bark%20Optimization%20Benchmark.png">
</div>
To put this into perspective, on an NVIDIA A100 and when generating 400 semantic tokens with a batch size of 16, you can get 17 times the [throughput](https://huggingface.co/blog/optimizing-bark#throughput) and still be 2 seconds faster than generating sentences one by one with the native model implementation. In other words, all the samples will be generated 17 times faster.
At batch size 8, on an NVIDIA A100, Flash Attention 2 is also 10% faster than Better Transformer, and at batch size 16, 25%.
#### Combining optimization techniques
You can combine optimization techniques, and use CPU offload, half-precision and Flash Attention 2 (or 🤗 Better Transformer) all at once.
```python
from transformers import BarkModel
@ -74,11 +113,8 @@ import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
# load in fp16 and use Flash Attention 2
model = BarkModel.from_pretrained("suno/bark-small", torch_dtype=torch.float16, use_flash_attention_2=True).to(device)
# enable CPU offload
model.enable_cpu_offload()
@ -86,7 +122,7 @@ model.enable_cpu_offload()
Find out more on inference optimization techniques [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one).
### Usage tips
Suno offers a library of voice presets in a number of languages [here](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c).
These presets are also uploaded in the hub [here](https://huggingface.co/suno/bark-small/tree/main/speaker_embeddings) or [here](https://huggingface.co/suno/bark/tree/main/speaker_embeddings).
@ -142,11 +178,6 @@ To save the audio, simply take the sample rate from the model config and some sc
>>> write_wav("bark_generation.wav", sample_rate, audio_array)
```
## BarkConfig
[[autodoc]] BarkConfig

View File

@ -25,9 +25,6 @@ rendered properly in your Markdown viewer.
</a>
</div>
**DISCLAIMER:** If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title) and assign
@patrickvonplaten
## Overview
The Bart model was proposed in [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation,
@ -45,7 +42,9 @@ According to the abstract,
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
of up to 6 ROUGE.
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).

## Usage tips
- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
@ -57,18 +56,6 @@ Tips:
* permute sentences
* rotate the document to make it start at a specific token
## Implementation Notes
- Bart doesn't use `token_type_ids` for sequence classification. Use [`BartTokenizer`] or
@ -112,6 +99,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- [`BartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb).
- [`TFBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb).
- [`FlaxBartForConditionalGeneration`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/summarization).
- An example of how to train [`BartForConditionalGeneration`] with a Hugging Face `datasets` object can be found in this [forum discussion](https://discuss.huggingface.co/t/train-bart-for-conditional-generation-e-g-summarization/1904)
- [Summarization](https://huggingface.co/course/chapter7/5?fw=pt#summarization) chapter of the 🤗 Hugging Face course.
- [Summarization task guide](../tasks/summarization)
@ -134,6 +122,7 @@ See also:
- [Text classification task guide](../tasks/sequence_classification)
- [Question answering task guide](../tasks/question_answering)
- [Causal language modeling task guide](../tasks/language_modeling)
- [Distilled checkpoints](https://huggingface.co/models?search=distilbart) are described in this [paper](https://arxiv.org/abs/2010.13002).
## BartConfig
@ -150,6 +139,10 @@ See also:
[[autodoc]] BartTokenizerFast
- all
<frameworkcontent>
<pt>
## BartModel
[[autodoc]] BartModel
@ -175,6 +168,9 @@ See also:
[[autodoc]] BartForCausalLM
- forward
</pt>
<tf>
## TFBartModel
[[autodoc]] TFBartModel
@ -190,6 +186,9 @@ See also:
[[autodoc]] TFBartForSequenceClassification
- call
</tf>
<jax>
## FlaxBartModel
[[autodoc]] FlaxBartModel
@ -222,3 +221,8 @@ See also:
[[autodoc]] FlaxBartForCausalLM
- __call__
</jax>
</frameworkcontent>

View File

@ -38,8 +38,14 @@ provides a significant boost over vanilla BARThez, and is on par with or outperf
This model was contributed by [moussakam](https://huggingface.co/moussakam). The Authors' code can be found [here](https://github.com/moussaKam/BARThez).
<Tip>
### Examples
BARThez implementation is the same as BART, except for tokenization. Refer to [BART documentation](bart) for information on
configuration classes and their parameters. BARThez-specific tokenizers are documented below.
</Tip>
## Resources
- BARThez can be fine-tuned on sequence-to-sequence tasks in a similar way as BART, check:
[examples/pytorch/summarization/](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization/README.md).

View File

@ -29,7 +29,9 @@ on a downstream task of Vietnamese text summarization show that in both automati
outperforms the strong baseline mBART and improves the state-of-the-art. We release BARTpho to facilitate future
research and applications of generative Vietnamese NLP tasks.*
This model was contributed by [dqnguyen](https://huggingface.co/dqnguyen). The original code can be found [here](https://github.com/VinAIResearch/BARTpho).

## Usage example
```python
>>> import torch
@ -54,7 +56,7 @@ Example of use:
>>> features = bartpho(**input_ids)
```
## Usage tips
- Following mBART, BARTpho uses the "large" architecture of BART with an additional layer-normalization layer on top of
both the encoder and decoder. Thus, usage examples in the [documentation of BART](bart), when adapting to use
@ -79,8 +81,6 @@ Tips:
Other languages, if employing this pre-trained multilingual SentencePiece model "vocab_file" for subword
segmentation, can reuse BartphoTokenizer with their own language-specialized "monolingual_vocab_file".
## BartphoTokenizer
[[autodoc]] BartphoTokenizer

View File

@ -39,7 +39,10 @@ with previous pre-training methods. For example, base-size BEiT achieves 83.2% t
significantly outperforming from-scratch DeiT training (81.8%) with the same setup. Moreover, large-size BEiT obtains
86.3% only using ImageNet-1K, even outperforming ViT-L with supervised pre-training on ImageNet-22K (85.2%).*
This model was contributed by [nielsr](https://huggingface.co/nielsr). The JAX/FLAX version of this model was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/beit).

## Usage tips
- BEiT models are regular Vision Transformers, but pre-trained in a self-supervised way rather than supervised. They
outperform both the [original model (ViT)](vit) as well as [Data-efficient Image Transformers (DeiT)](deit) when fine-tuned on ImageNet-1K and CIFAR-100. You can check out demo notebooks regarding inference as well as
@ -68,9 +71,6 @@ alt="drawing" width="600"/>
<small> BEiT pre-training. Taken from the <a href="https://arxiv.org/abs/2106.08254">original paper.</a> </small>
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BEiT.
@ -107,6 +107,9 @@ If you're interested in submitting a resource to be included here, please feel f
- preprocess
- post_process_semantic_segmentation
<frameworkcontent>
<pt>
## BeitModel
[[autodoc]] BeitModel
@ -127,6 +130,9 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] BeitForSemanticSegmentation
- forward
</pt>
<jax>
## FlaxBeitModel
[[autodoc]] FlaxBeitModel
@ -141,3 +147,6 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] FlaxBeitForImageClassification
- __call__
</jax>
</frameworkcontent>

View File

@ -33,10 +33,13 @@ GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the
encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation,
Text Summarization, Sentence Splitting, and Sentence Fusion.*
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be
found [here](https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder).

## Usage examples and tips

The model can be used in combination with the [`EncoderDecoderModel`] to leverage two pretrained BERT checkpoints for
subsequent fine-tuning:
```python
>>> # leverage checkpoints for Bert2Bert model...
@ -61,8 +64,7 @@ Usage:
>>> loss.backward()
```
Pretrained [`EncoderDecoderModel`] checkpoints are also directly available in the model hub, e.g.:
```python
>>> # instantiate sentence fusion model
@ -85,9 +87,6 @@ Tips:
- For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
Therefore, no EOS token should be added to the end of the input.
## BertGenerationConfig
[[autodoc]] BertGenerationConfig

View File

@ -67,12 +67,16 @@ Example of using a model with Character tokenization:
>>> outputs = bertjapanese(**inputs)
```
This model was contributed by [cl-tohoku](https://huggingface.co/cl-tohoku).

<Tip>

This implementation is the same as BERT, except for tokenization method. Refer to [BERT documentation](bert) for
API reference information.

</Tip>
## BertJapaneseTokenizer
[[autodoc]] BertJapaneseTokenizer

View File

@ -45,7 +45,9 @@ language processing tasks, including pushing the GLUE score to 80.5% (7.7% point
accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute
improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).*
This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The original code can be found [here](https://github.com/google-research/bert).

## Usage tips
- BERT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
@ -59,10 +61,6 @@ Tips:
- The model must predict the original sentence, but has a second objective: inputs are two sentences A and B (with a separation token in between). With probability 50%, the sentences are consecutive in the corpus, in the remaining 50% they are not related. The model has to predict if the sentences are consecutive or not.
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BERT. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
@ -137,14 +135,23 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- create_token_type_ids_from_sequences
- save_vocabulary
<frameworkcontent>
<pt>
## BertTokenizerFast
[[autodoc]] BertTokenizerFast
</pt>
<tf>
## TFBertTokenizer
[[autodoc]] TFBertTokenizer
</tf>
</frameworkcontent>
## Bert specific outputs
[[autodoc]] models.bert.modeling_bert.BertForPreTrainingOutput
@ -153,6 +160,10 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput
<frameworkcontent>
<pt>
## BertModel
[[autodoc]] BertModel
@ -198,6 +209,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] BertForQuestionAnswering
- forward
</pt>
<tf>
## TFBertModel
[[autodoc]] TFBertModel
@ -243,6 +257,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] TFBertForQuestionAnswering
- call
</tf>
<jax>
## FlaxBertModel
[[autodoc]] FlaxBertModel
@ -287,3 +304,8 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] FlaxBertForQuestionAnswering
- __call__
</jax>
</frameworkcontent>

View File

@ -28,7 +28,9 @@ al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-
2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks:
Part-of-speech tagging, Named-entity recognition and text classification.*
This model was contributed by [dqnguyen](https://huggingface.co/dqnguyen). The original code can be found [here](https://github.com/VinAIResearch/BERTweet).

## Usage example
```python
>>> import torch
@ -55,7 +57,12 @@ Example of use:
>>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
```
<Tip>
This implementation is the same as BERT, except for tokenization method. Refer to [BERT documentation](bert) for
API reference information.
</Tip>
## BertweetTokenizer

View File

@ -41,7 +41,10 @@ sequence as part of the sparse attention mechanism. The proposed sparse attentio
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*
This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta). The original code can be found
[here](https://github.com/google-research/bigbird).

## Usage tips
- For an in-detail explanation on how BigBird's attention works, see [this blog post](https://huggingface.co/blog/big-bird).
- BigBird comes with 2 implementations: **original_full** & **block_sparse**. For the sequence length < 1024, using
@ -53,10 +56,8 @@ Tips:
- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
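As a sketch of switching between the two implementations mentioned in the tips above (the keyword arguments mirror `BigBirdConfig` fields and the exact values are illustrative):

```python
from transformers import BigBirdModel

# full quadratic attention, recommended for sequences shorter than 1024 tokens
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="original_full")

# sparse block attention for longer sequences
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",
    block_size=64,
    num_random_blocks=3,
)
```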
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
@ -85,6 +86,9 @@ This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta
[[autodoc]] models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput
<frameworkcontent>
<pt>
## BigBirdModel
[[autodoc]] BigBirdModel
@ -125,6 +129,9 @@ This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta
[[autodoc]] BigBirdForQuestionAnswering
- forward
</pt>
<jax>
## FlaxBigBirdModel
[[autodoc]] FlaxBigBirdModel
@ -164,3 +171,8 @@ This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta
[[autodoc]] FlaxBigBirdForQuestionAnswering
- __call__
</jax>
</frameworkcontent>

View File

@ -41,7 +41,9 @@ sequence as part of the sparse attention mechanism. The proposed sparse attentio
BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also
propose novel applications to genomics data.*
The original code can be found [here](https://github.com/google-research/bigbird).

## Usage tips
- For an in-detail explanation on how BigBird's attention works, see [this blog post](https://huggingface.co/blog/big-bird).
- BigBird comes with 2 implementations: **original_full** & **block_sparse**. For the sequence length < 1024, using
@ -54,9 +56,7 @@ Tips:
- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Question answering task guide](../tasks/question_answering)

View File

@ -25,15 +25,15 @@ The abstract from the paper is the following:
*Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.*
This model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/BioGPT).

## Usage tips

- BioGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than the left.
- BioGPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next token in a sequence. Leveraging this feature allows BioGPT to generate syntactically coherent text as it can be observed in the run_generation.py example script.
- The model can take the `past_key_values` (for PyTorch) as input, which is the previously computed key/value attention pairs. Using this (past_key_values or past) value prevents the model from re-computing pre-computed values in the context of text generation. For PyTorch, see past_key_values argument of the BioGptForCausalLM.forward() method for more information on its usage.
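As a rough sketch of the caching behaviour described above:

```python
import torch
from transformers import BioGptForCausalLM, BioGptTokenizer

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

inputs = tokenizer("COVID-19 is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, use_cache=True)

past = outputs.past_key_values  # cached key/value pairs from the first forward pass
next_token = outputs.logits[:, -1:].argmax(dim=-1)

# on the next step, only the new token is fed together with the cache
with torch.no_grad():
    outputs = model(input_ids=next_token, past_key_values=past, use_cache=True)
```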
## Resources
- [Causal language modeling task guide](../tasks/language_modeling)

View File

@ -25,15 +25,15 @@ The abstract from the paper is the following:
*Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.*
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/google-research/big_transfer).

## Usage tips
- BiT models are equivalent to ResNetv2 in terms of architecture, except that: 1) all batch normalization layers are replaced by [group normalization](https://arxiv.org/abs/1803.08494),
2) [weight standardization](https://arxiv.org/abs/1903.10520) is used for convolutional layers. The authors show that the combination of both is useful for training with large batch sizes, and has a significant
impact on transfer learning.
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BiT.
@ -62,5 +62,4 @@ If you're interested in submitting a resource to be included here, please feel f
## BitForImageClassification
[[autodoc]] BitForImageClassification
- forward

View File

@ -40,15 +40,16 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be
found [here](https://github.com/facebookresearch/ParlAI).
## Usage tips
Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.
## Resources
- [Causal language modeling task guide](../tasks/language_modeling)
- [Translation task guide](../tasks/translation)
@ -70,6 +71,9 @@ found [here](https://github.com/facebookresearch/ParlAI).
[[autodoc]] BlenderbotSmallTokenizerFast
<frameworkcontent>
<pt>
## BlenderbotSmallModel
[[autodoc]] BlenderbotSmallModel
@ -85,6 +89,9 @@ found [here](https://github.com/facebookresearch/ParlAI).
[[autodoc]] BlenderbotSmallForCausalLM
- forward
</pt>
<tf>
## TFBlenderbotSmallModel
[[autodoc]] TFBlenderbotSmallModel
@ -95,6 +102,9 @@ found [here](https://github.com/facebookresearch/ParlAI).
[[autodoc]] TFBlenderbotSmallForConditionalGeneration
- call
</tf>
<jax>
## FlaxBlenderbotSmallModel
[[autodoc]] FlaxBlenderbotSmallModel
@ -108,3 +118,6 @@ found [here](https://github.com/facebookresearch/ParlAI).
- __call__
- encode
- decode
</jax>
</frameworkcontent>

View File

@ -16,8 +16,6 @@ rendered properly in your Markdown viewer.
# Blenderbot
**DISCLAIMER:** If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title) .
## Overview
The Blender chatbot model was proposed in [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf) Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu,
@ -36,26 +34,14 @@ and code publicly available. Human evaluations show our best models are superior
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
failure cases of our models.*
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI) .
## Usage tips and example
Blenderbot is a model with absolute position embeddings so it's usually advised to pad the inputs on the right
rather than the left.
Here is an example of model usage:
```python
>>> from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
@ -70,7 +56,16 @@ Here is an example of model usage:
["<s> That's unfortunate. Are they trying to lose weight or are they just trying to be healthier?</s>"]
```
## Implementation Notes
- Blenderbot uses a standard [seq2seq model transformer](https://arxiv.org/pdf/1706.03762.pdf) based architecture.
- Available checkpoints can be found in the [model hub](https://huggingface.co/models?search=blenderbot).
- This is the *default* Blenderbot model class. However, some smaller checkpoints, such as
`facebook/blenderbot_small_90M`, have a different architecture and consequently should be used with
[BlenderbotSmall](blenderbot-small).
## Resources
- [Causal language modeling task guide](../tasks/language_modeling)
- [Translation task guide](../tasks/translation)
@ -90,9 +85,13 @@ Here is an example of model usage:
[[autodoc]] BlenderbotTokenizerFast
- build_inputs_with_special_tokens
<frameworkcontent>
<pt>
## BlenderbotModel
See [`~transformers.BartModel`] for arguments to *forward* and *generate*
[[autodoc]] BlenderbotModel
- forward
@ -109,6 +108,9 @@ See [`~transformers.BartForConditionalGeneration`] for arguments to *forward* an
[[autodoc]] BlenderbotForCausalLM
- forward
</pt>
<tf>
## TFBlenderbotModel
[[autodoc]] TFBlenderbotModel
@ -119,6 +121,9 @@ See [`~transformers.BartForConditionalGeneration`] for arguments to *forward* an
[[autodoc]] TFBlenderbotForConditionalGeneration
- call
</tf>
<jax>
## FlaxBlenderbotModel
[[autodoc]] FlaxBlenderbotModel
@ -132,3 +137,8 @@ See [`~transformers.BartForConditionalGeneration`] for arguments to *forward* an
- __call__
- encode
- decode
</jax>
</frameworkcontent>

View File

@ -27,11 +27,6 @@ The abstract from the paper is the following:
*The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer, which is pre-trained in two stages. The first stage bootstraps vision-language representation learning from a frozen image encoder. The second stage bootstraps vision-to-language generative learning from a frozen language model. BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods. For example, our model outperforms Flamingo80B by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters. We also demonstrate the model's emerging capabilities of zero-shot image-to-text generation that can follow natural language instructions.*
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/blip2_architecture.jpg"
alt="drawing" width="600"/>
@ -40,6 +35,11 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/salesforce/LAVIS/tree/5ee63d688ba4cebff63acee04adaef2dee9af207).
## Usage tips
- BLIP-2 can be used for conditional text generation given an image and an optional text prompt. At inference time, it's recommended to use the [`generate`] method.
- One can use [`Blip2Processor`] to prepare images for the model, and decode the predicted tokens ID's back to text.
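A minimal sketch of that flow, assuming the OPT-2.7B checkpoint (other BLIP-2 checkpoints work the same way):

```python
import requests
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text="Question: how many cats are there? Answer:", return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```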
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BLIP-2.

View File

@ -20,7 +20,7 @@ rendered properly in your Markdown viewer.
The BLIP model was proposed in [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://arxiv.org/abs/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
BLIP is a model that is able to perform various multi-modal tasks including:
- Visual Question Answering
- Image-Text retrieval (Image-text matching)
- Image Captioning
@ -39,7 +39,6 @@ The original code can be found [here](https://github.com/salesforce/BLIP).
- [Jupyter notebook](https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_blip.ipynb) on how to fine-tune BLIP for image captioning on a custom dataset
## BlipConfig
[[autodoc]] BlipConfig
@ -57,12 +56,14 @@ The original code can be found [here](https://github.com/salesforce/BLIP).
[[autodoc]] BlipProcessor
## BlipImageProcessor
[[autodoc]] BlipImageProcessor
- preprocess
<frameworkcontent>
<pt>
## BlipModel
[[autodoc]] BlipModel
@ -75,30 +76,29 @@ The original code can be found [here](https://github.com/salesforce/BLIP).
[[autodoc]] BlipTextModel
- forward
## BlipVisionModel
[[autodoc]] BlipVisionModel
- forward
## BlipForConditionalGeneration
[[autodoc]] BlipForConditionalGeneration
- forward
## BlipForImageTextRetrieval
[[autodoc]] BlipForImageTextRetrieval
- forward
## BlipForQuestionAnswering
[[autodoc]] BlipForQuestionAnswering
- forward
</pt>
<tf>
## TFBlipModel
[[autodoc]] TFBlipModel
@ -111,26 +111,24 @@ The original code can be found [here](https://github.com/salesforce/BLIP).
[[autodoc]] TFBlipTextModel
- call
## TFBlipVisionModel
[[autodoc]] TFBlipVisionModel
- call
## TFBlipForConditionalGeneration
[[autodoc]] TFBlipForConditionalGeneration
- call
## TFBlipForImageTextRetrieval
[[autodoc]] TFBlipForImageTextRetrieval
- call
## TFBlipForQuestionAnswering
[[autodoc]] TFBlipForQuestionAnswering
- call
</tf>
</frameworkcontent>

View File

@ -56,16 +56,20 @@ See also:
[[autodoc]] BloomConfig
- all
## BloomTokenizerFast
[[autodoc]] BloomTokenizerFast
- all
<frameworkcontent>
<pt>
## BloomModel
[[autodoc]] BloomModel
- forward
## BloomForCausalLM
[[autodoc]] BloomForCausalLM
@ -86,6 +90,9 @@ See also:
[[autodoc]] BloomForQuestionAnswering
- forward
</pt>
<jax>
## FlaxBloomModel
[[autodoc]] FlaxBloomModel
@ -95,3 +102,8 @@ See also:
[[autodoc]] FlaxBloomForCausalLM
- __call__
</jax>
</frameworkcontent>

View File

@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.
<Tip warning={true}>
This model is in maintenance mode only, we do not accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported this model: v4.30.0.
You can do so by running the following command: `pip install -U transformers==4.30.0`.
@ -43,13 +43,15 @@ hardware. It is also 7.9x faster on a CPU, as well as being better performing th
architecture, and some of the non-compressed variants: it obtains performance improvements of between 0.3% and 31%,
absolute, with respect to BERT-large, on multiple public natural language understanding (NLU) benchmarks.*
This model was contributed by [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/alexa/bort/).
## Usage tips
- BORT's model architecture is based on BERT, refer to [BERT's documentation page](bert) for the
model's API reference as well as usage examples.
- BORT uses the RoBERTa tokenizer instead of the BERT tokenizer, refer to [RoBERTa's documentation page](roberta) for the tokenizer's API reference as well as usage examples.
- BORT requires a specific fine-tuning algorithm, called [Agora](https://adewynter.github.io/notes/bort_algorithms_and_applications.html#fine-tuning-with-algebraic-topology),
  which is sadly not open-sourced yet. It would be very useful for the community if someone implemented the
  algorithm to make BORT fine-tuning work.

View File

@ -37,7 +37,9 @@ alt="drawing" width="600"/>
<small> BridgeTower architecture. Taken from the <a href="https://arxiv.org/abs/2206.08657">original paper.</a> </small>
This model was contributed by [Anahita Bhiwandiwalla](https://huggingface.co/anahita-b), [Tiep Le](https://huggingface.co/Tile) and [Shaoyen Tseng](https://huggingface.co/shaoyent). The original code can be found [here](https://github.com/microsoft/BridgeTower).

## Usage tips and examples
BridgeTower consists of a visual encoder, a textual encoder and cross-modal encoder with multiple lightweight bridge layers.
The goal of this approach was to build a bridge between each uni-modal encoder and the cross-modal encoder to enable comprehensive and detailed interaction at each layer of the cross-modal encoder.
@ -116,9 +118,6 @@ The following example shows how to run masked language modeling using [`BridgeTo
.a cat looking out of the window.
```
This model was contributed by [Anahita Bhiwandiwalla](https://huggingface.co/anahita-b), [Tiep Le](https://huggingface.co/Tile) and [Shaoyen Tseng](https://huggingface.co/shaoyent). The original code can be found [here](https://github.com/microsoft/BridgeTower).
Tips:
- This implementation of BridgeTower uses [`RobertaTokenizer`] to generate text embeddings and OpenAI's CLIP/ViT model to compute visual embeddings.

View File

@ -31,12 +31,13 @@ AMLM is a 2D version of TMLM. It randomly masks text tokens and predicts with th
BROS achieves comparable or better result on Key Information Extraction (KIE) benchmarks such as FUNSD, SROIE, CORD and SciTSR, without relying on explicit visual features.
The abstract from the paper is the following:
*Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models focusing on combining visual features from document images with texts and their layout. On the other hand, this paper tackles the problem by going back to the basic: effective combination of text and layout. Specifically, we propose a pre-trained language model, named BROS (BERT Relying On Spatiality), that encodes relative positions of texts in 2D space and learns from unlabeled documents with area-masking strategy. With this optimized training scheme for understanding texts in 2D space, BROS shows comparable or better performance compared to previous methods on four KIE benchmarks (FUNSD, SROIE*, CORD, and SciTSR) without relying on visual features. This paper also reveals two real-world challenges in KIE tasks-(1) minimizing the error from incorrect text ordering and (2) efficient learning from fewer downstream examples-and demonstrates the superiority of BROS over previous methods.*
Tips:
This model was contributed by [jinho8345](https://huggingface.co/jinho8345). The original code can be found [here](https://github.com/clovaai/bros).
## Usage tips and examples
- [`~transformers.BrosModel.forward`] requires `input_ids` and `bbox` (bounding box). Each bounding box should be in (x0, y0, x1, y1) format (top-left corner, bottom-right corner). Obtaining bounding boxes depends on an external OCR system. The `x` coordinates should be normalized by the document image width, and the `y` coordinates by the document image height, as in the sketch below.
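A minimal normalization sketch, assuming pixel-space `(x0, y0, x1, y1)` boxes coming from an OCR engine (the helper name is made up for illustration and is not part of the library):

```python
def normalize_bboxes(bboxes, image_width, image_height):
    """Scale pixel-space (x0, y0, x1, y1) boxes into the 0-1 range expected by the model."""
    normalized = []
    for x0, y0, x1, y1 in bboxes:
        normalized.append([x0 / image_width, y0 / image_height, x1 / image_width, y1 / image_height])
    return normalized


# e.g. one word box on a 1000x800 pixel page
print(normalize_bboxes([(100, 40, 250, 80)], image_width=1000, image_height=800))
```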
@ -78,9 +79,9 @@ def make_box_first_token_mask(bboxes, words, tokenizer, max_seq_length=512):
```
- Demo scripts can be found [here](https://github.com/clovaai/bros).
## Resources
This model was contributed by [jinho8345](https://huggingface.co/jinho8345). The original code can be found [here](https://github.com/clovaai/bros).
- Demo scripts can be found [here](https://github.com/clovaai/bros).
## BrosConfig
@ -102,13 +103,11 @@ This model was contributed by [jinho8345](https://huggingface.co/jinho8345). The
[[autodoc]] BrosForTokenClassification
- forward
## BrosSpadeEEForTokenClassification
[[autodoc]] BrosSpadeEEForTokenClassification
- forward
## BrosSpadeELForTokenClassification
[[autodoc]] BrosSpadeELForTokenClassification

View File

@ -40,14 +40,18 @@ experiments.*
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be
found [here](https://github.com/google-research/byt5).
ByT5's architecture is based on the T5v1.1 model, so one can refer to [T5v1.1's documentation page](t5v1.1). They
<Tip>
ByT5's architecture is based on the T5v1.1 model, refer to [T5v1.1's documentation page](t5v1.1) for the API reference. They
only differ in how inputs should be prepared for the model, see the code examples below.
</Tip>
Since ByT5 was pre-trained in an unsupervised fashion, there's no real advantage to using a task prefix during single-task
fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
### Example
## Usage example
ByT5 works on raw UTF-8 bytes, so it can be used without a tokenizer:
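A minimal sketch of what this byte-level input preparation can look like (the `google/byt5-small` checkpoint and the offset of 3 reserved for the pad/eos/unk special tokens are assumptions, not shown in this diff):

```python
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

# Feed raw UTF-8 bytes, shifted by 3 so the lowest ids stay reserved for the
# pad, eos and unk special tokens.
num_special_tokens = 3
input_ids = torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + num_special_tokens
labels = torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + num_special_tokens

loss = model(input_ids, labels=labels).loss
print(loss.item())
```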

View File

@ -34,14 +34,16 @@ dependency parsing, named-entity recognition, and natural language inference. Ca
for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and
downstream applications for French NLP.*
Tips:
- This implementation is the same as RoBERTa. Refer to the [documentation of RoBERTa](roberta) for usage examples
as well as the information relative to the inputs and outputs.
This model was contributed by [camembert](https://huggingface.co/camembert). The original code can be found [here](https://camembert-model.fr/).
## Documentation resources
<Tip>
This implementation is the same as RoBERTa. Refer to the [documentation of RoBERTa](roberta) for usage examples as well
as the information relative to the inputs and outputs.
</Tip>
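For instance, a fill-mask call works exactly as it would with RoBERTa; this hedged sketch assumes the `camembert-base` checkpoint:

```python
from transformers import pipeline

# CamemBERT uses <mask> as its mask token, just like RoBERTa
camembert_fill_mask = pipeline("fill-mask", model="camembert-base")
results = camembert_fill_mask("Le camembert est <mask> :)")
print(results[0]["token_str"], results[0]["score"])
```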
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
@ -66,6 +68,9 @@ This model was contributed by [camembert](https://huggingface.co/camembert). The
[[autodoc]] CamembertTokenizerFast
<frameworkcontent>
<pt>
## CamembertModel
[[autodoc]] CamembertModel
@ -94,6 +99,9 @@ This model was contributed by [camembert](https://huggingface.co/camembert). The
[[autodoc]] CamembertForQuestionAnswering
</pt>
<tf>
## TFCamembertModel
[[autodoc]] TFCamembertModel
@ -121,3 +129,7 @@ This model was contributed by [camembert](https://huggingface.co/camembert). The
## TFCamembertForQuestionAnswering
[[autodoc]] TFCamembertForQuestionAnswering
</tf>
</frameworkcontent>

View File

@ -37,7 +37,9 @@ To use its finer-grained input effectively and efficiently, CANINE combines down
sequence length, with a deep transformer stack, which encodes context. CANINE outperforms a comparable mBERT model by
2.8 F1 on TyDi QA, a challenging multilingual benchmark, despite having 28% fewer model parameters.*
Tips:
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/google-research/language/tree/master/language/canine).
## Usage tips
- CANINE uses no less than 3 Transformer encoders internally: 2 "shallow" encoders (which only consist of a single
layer) and 1 "deep" encoder (which is a regular BERT encoder). First, a "shallow" encoder is used to contextualize
@ -50,19 +52,18 @@ Tips:
(which has a predefined Unicode code point). For token classification tasks however, the downsampled sequence of
tokens needs to be upsampled again to match the length of the original character sequence (which is 2048). The
details for this can be found in the paper.
- Models:
Model checkpoints:
- [google/canine-c](https://huggingface.co/google/canine-c): Pre-trained with autoregressive character loss,
12-layer, 768-hidden, 12-heads, 121M parameters (size ~500 MB).
- [google/canine-s](https://huggingface.co/google/canine-s): Pre-trained with subword loss, 12-layer,
768-hidden, 12-heads, 121M parameters (size ~500 MB).
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/google-research/language/tree/master/language/canine).
## Usage example
### Example
CANINE works on raw characters, so it can be used without a tokenizer:
CANINE works on raw characters, so it can be used **without a tokenizer**:
```python
>>> from transformers import CanineModel
@ -96,17 +97,13 @@ sequences to the same length):
>>> sequence_output = outputs.last_hidden_state
```
## Documentation resources
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Multiple choice task guide](../tasks/multiple_choice)
## CANINE specific outputs
[[autodoc]] models.canine.modeling_canine.CanineModelOutputWithPooling
## CanineConfig
[[autodoc]] CanineConfig
@ -118,6 +115,10 @@ sequences to the same length):
- get_special_tokens_mask
- create_token_type_ids_from_sequences
## CANINE specific outputs
[[autodoc]] models.canine.modeling_canine.CanineModelOutputWithPooling
## CanineModel
[[autodoc]] CanineModel

View File

@ -25,7 +25,9 @@ The abstract from the paper is the following:
*The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining. In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on the new dataset. We develop 5 Chinese CLIP models of multiple sizes, spanning from 77 to 958 million parameters. Furthermore, we propose a two-stage pretraining method, where the model is first trained with the image encoder frozen and then trained with all parameters being optimized, to achieve enhanced model performance. Our comprehensive experiments demonstrate that Chinese CLIP can achieve the state-of-the-art performance on MUGE, Flickr30K-CN, and COCO-CN in the setups of zero-shot learning and finetuning, and it is able to achieve competitive performance in zero-shot image classification based on the evaluation on the ELEVATER benchmark (Li et al., 2022). Our codes, pretrained models, and demos have been released.*
## Usage
The Chinese-CLIP model was contributed by [OFA-Sys](https://huggingface.co/OFA-Sys).
## Usage example
The code snippet below shows how to compute image & text features and similarities:
@ -59,15 +61,13 @@ The code snippet below shows how to compute image & text features and similariti
>>> probs = logits_per_image.softmax(dim=1) # probs: [[1.2686e-03, 5.4499e-02, 6.7968e-04, 9.4355e-01]]
```
Currently, we release the following scales of pretrained Chinese-CLIP models at HF Model Hub:
Currently, the following scales of pretrained Chinese-CLIP models are available on the 🤗 Hub:
- [OFA-Sys/chinese-clip-vit-base-patch16](https://huggingface.co/OFA-Sys/chinese-clip-vit-base-patch16)
- [OFA-Sys/chinese-clip-vit-large-patch14](https://huggingface.co/OFA-Sys/chinese-clip-vit-large-patch14)
- [OFA-Sys/chinese-clip-vit-large-patch14-336px](https://huggingface.co/OFA-Sys/chinese-clip-vit-large-patch14-336px)
- [OFA-Sys/chinese-clip-vit-huge-patch14](https://huggingface.co/OFA-Sys/chinese-clip-vit-huge-patch14)
The Chinese-CLIP model was contributed by [OFA-Sys](https://huggingface.co/OFA-Sys).
## ChineseCLIPConfig
[[autodoc]] ChineseCLIPConfig

View File

@ -27,10 +27,9 @@ The abstract from the paper is the following:
*Contrastive learning has shown remarkable success in the field of multimodal representation learning. In this paper, we propose a pipeline of contrastive language-audio pretraining to develop an audio representation by combining audio data with natural language descriptions. To accomplish this target, we first release LAION-Audio-630K, a large collection of 633,526 audio-text pairs from different data sources. Second, we construct a contrastive language-audio pretraining model by considering different audio encoders and text encoders. We incorporate the feature fusion mechanism and keyword-to-caption augmentation into the model design to further enable the model to process audio inputs of variable lengths and enhance the performance. Third, we perform comprehensive experiments to evaluate our model across three tasks: text-to-audio retrieval, zero-shot audio classification, and supervised audio classification. The results demonstrate that our model achieves superior performance in text-to-audio retrieval task. In audio classification tasks, the model achieves state-of-the-art performance in the zeroshot setting and is able to obtain performance comparable to models' results in the non-zero-shot setting. LAION-Audio-6*
This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada) and [Arthur Zucker](https://huggingface.co/ArtZucker) .
This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada) and [Arthur Zucker](https://huggingface.co/ArthurZ) .
The original code can be found [here](https://github.com/LAION-AI/Clap).
## ClapConfig
[[autodoc]] ClapConfig
@ -78,4 +77,3 @@ The original code can be found [here](https://github.com/LAION-AI/Clap).
[[autodoc]] ClapAudioModelWithProjection
- forward

View File

@ -40,7 +40,9 @@ for any dataset specific training. For instance, we match the accuracy of the or
without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained
model weights at this https URL.*
## Usage
This model was contributed by [valhalla](https://huggingface.co/valhalla). The original code can be found [here](https://github.com/openai/CLIP).
## Usage tips and example
CLIP is a multi-modal vision and language model. It can be used for image-text similarity and for zero-shot image
classification. CLIP uses a ViT like transformer to get visual features and a causal language model to get the text
@ -77,8 +79,6 @@ encode the text and prepare the images. The following example shows how to get t
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
```
This model was contributed by [valhalla](https://huggingface.co/valhalla). The original code can be found [here](https://github.com/openai/CLIP).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIP.
@ -142,6 +142,9 @@ The resource should ideally demonstrate something new instead of duplicating an
[[autodoc]] CLIPProcessor
<frameworkcontent>
<pt>
## CLIPModel
[[autodoc]] CLIPModel
@ -164,12 +167,14 @@ The resource should ideally demonstrate something new instead of duplicating an
[[autodoc]] CLIPVisionModelWithProjection
- forward
## CLIPVisionModel
[[autodoc]] CLIPVisionModel
- forward
</pt>
<tf>
## TFCLIPModel
[[autodoc]] TFCLIPModel
@ -187,6 +192,9 @@ The resource should ideally demonstrate something new instead of duplicating an
[[autodoc]] TFCLIPVisionModel
- call
</tf>
<jax>
## FlaxCLIPModel
[[autodoc]] FlaxCLIPModel
@ -208,3 +216,6 @@ The resource should ideally demonstrate something new instead of duplicating an
[[autodoc]] FlaxCLIPVisionModel
- __call__
</jax>
</frameworkcontent>

View File

@ -41,13 +41,6 @@ to any binary segmentation task where a text or image query
can be formulated. Finally, we find our system to adapt well
to generalized queries involving affordances or properties*
Tips:
- [`CLIPSegForImageSegmentation`] adds a decoder on top of [`CLIPSegModel`]. The latter is identical to [`CLIPModel`].
- [`CLIPSegForImageSegmentation`] can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text
(provided to the model as `input_ids`) or an image (provided to the model as `conditional_pixel_values`). One can also provide custom
conditional embeddings (provided to the model as `conditional_embeddings`).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/clipseg_architecture.png"
alt="drawing" width="600"/>
@ -56,6 +49,13 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/timojl/clipseg).
## Usage tips
- [`CLIPSegForImageSegmentation`] adds a decoder on top of [`CLIPSegModel`]. The latter is identical to [`CLIPModel`].
- [`CLIPSegForImageSegmentation`] can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text
(provided to the model as `input_ids`) or an image (provided to the model as `conditional_pixel_values`). One can also provide custom
conditional embeddings (provided to the model as `conditional_embeddings`).
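A hedged sketch of the text-prompted path described above (the `CIDAS/clipseg-rd64-refined` checkpoint and the example image URL are assumptions):

```python
import requests
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompts = ["a cat", "a remote control"]

# One copy of the image per text prompt; the prompts are passed as input_ids
inputs = processor(text=prompts, images=[image] * len(prompts), padding=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One low-resolution logit map per prompt
print(outputs.logits.shape)
```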
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CLIPSeg. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

View File

@ -0,0 +1,126 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# CLVP
## Overview
The CLVP (Contrastive Language-Voice Pretrained Transformer) model was proposed in [Better speech synthesis through scaling](https://arxiv.org/abs/2305.07243) by James Betker.
The abstract from the paper is the following:
*In recent years, the field of image generation has been revolutionized by the application of autoregressive transformers and DDPMs. These approaches model the process of image generation as a step-wise probabilistic processes and leverage large amounts of compute and data to learn the image distribution. This methodology of improving performance need not be confined to images. This paper describes a way to apply advances in the image generative domain to speech synthesis. The result is TorToise - an expressive, multi-voice text-to-speech system.*
This model was contributed by [Susnato Dhar](https://huggingface.co/susnato).
The original code can be found [here](https://github.com/neonbjb/tortoise-tts).
## Usage tips
1. CLVP is an integral part of the Tortoise TTS model.
2. CLVP can be used to compare different generated speech candidates with the provided text, and the best speech tokens are forwarded to the diffusion model.
3. The use of the [`ClvpModelForConditionalGeneration.generate()`] method is strongly recommended for Tortoise usage.
4. Note that the CLVP model expects the audio to be sampled at 22.05 kHz, contrary to other audio models which expect 16 kHz.
## Brief Explanation
- The [`ClvpTokenizer`] tokenizes the text input, and the [`ClvpFeatureExtractor`] extracts the log mel-spectrogram from the desired audio.
- [`ClvpConditioningEncoder`] takes those text tokens and audio representations and converts them into embeddings conditioned on the text and audio.
- The [`ClvpForCausalLM`] uses those embeddings to generate multiple speech candidates.
- Each speech candidate is passed through the speech encoder ([`ClvpEncoder`]) which converts them into a vector representation, and the text encoder ([`ClvpEncoder`]) converts the text tokens into the same latent space.
- At the end, we compare each speech vector with the text vector to see which speech vector is most similar to the text vector.
- [`ClvpModelForConditionalGeneration.generate()`] compresses all of the logic described above into a single method.
Example:
```python
>>> import datasets
>>> from transformers import ClvpProcessor, ClvpModelForConditionalGeneration
>>> # Define the text and load the audio (here we take an example from the Hugging Face Hub using the `datasets` library).
>>> text = "This is an example text."
>>> ds = datasets.load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> ds = ds.cast_column("audio", datasets.Audio(sampling_rate=22050))
>>> sample = ds[0]["audio"]
>>> # Define processor and model.
>>> processor = ClvpProcessor.from_pretrained("susnato/clvp_dev")
>>> model = ClvpModelForConditionalGeneration.from_pretrained("susnato/clvp_dev")
>>> # Generate processor output and model output.
>>> processor_output = processor(raw_speech=sample["array"], sampling_rate=sample["sampling_rate"], text=text, return_tensors="pt")
>>> generated_output = model.generate(**processor_output)
```
## ClvpConfig
[[autodoc]] ClvpConfig
- from_sub_model_configs
## ClvpEncoderConfig
[[autodoc]] ClvpEncoderConfig
## ClvpDecoderConfig
[[autodoc]] ClvpDecoderConfig
## ClvpTokenizer
[[autodoc]] ClvpTokenizer
- save_vocabulary
## ClvpFeatureExtractor
[[autodoc]] ClvpFeatureExtractor
- __call__
## ClvpProcessor
[[autodoc]] ClvpProcessor
- __call__
- decode
- batch_decode
## ClvpModelForConditionalGeneration
[[autodoc]] ClvpModelForConditionalGeneration
- forward
- generate
- get_text_features
- get_speech_features
## ClvpForCausalLM
[[autodoc]] ClvpForCausalLM
## ClvpModel
[[autodoc]] ClvpModel
## ClvpEncoder
[[autodoc]] ClvpEncoder
## ClvpDecoder
[[autodoc]] ClvpDecoder

View File

@ -24,7 +24,11 @@ The abstract from the paper is the following:
*We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 53% and 55% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use.*
Check out all Code Llama models [here](https://huggingface.co/models?search=code_llama) and the officially released ones in the [codellama org](https://huggingface.co/codellama).
Check out all Code Llama model checkpoints [here](https://huggingface.co/models?search=code_llama) and the officially released ones in the [codellama org](https://huggingface.co/codellama).
This model was contributed by [ArthurZucker](https://huggingface.co/ArthurZ). The original code of the authors can be found [here](https://github.com/facebookresearch/llama).
## Usage tips and examples
<Tip warning={true}>
@ -38,21 +42,22 @@ As mentioned above, the `dtype` of the storage weights is mostly irrelevant unle
</Tip>
Tips:
- These models have the same architecture as the `Llama2` models
Tips:
- The infilling task is supported out of the box. You should be using the `tokenizer.fill_token` where you want your input to be filled.
- The model conversion script is the same as for the `Llama2` family:
Here is a sample usage
Here is a sample usage:
```bash
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
--input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
```
Note that executing the script requires enough CPU RAM to host the whole model in float16 precision (even though the biggest versions
come in several checkpoints, each checkpoint contains only a part of every weight of the model, so all of them need to be loaded in RAM).
- After conversion, the model and tokenizer can be loaded via:
After conversion, the model and tokenizer can be loaded via:
```python
>>> from transformers import LlamaForCausalLM, CodeLlamaTokenizer
@ -95,9 +100,13 @@ If you only want the infilled part:
Under the hood, the tokenizer [automatically splits by `<FILL_ME>`](https://huggingface.co/docs/transformers/main/model_doc/code_llama#transformers.CodeLlamaTokenizer.fill_token) to create a formatted input string that follows [the original training pattern](https://github.com/facebookresearch/codellama/blob/cb51c14ec761370ba2e2bc351374a79265d0465e/llama/generation.py#L402). This is more robust than preparing the pattern yourself: it avoids pitfalls, such as token glueing, that are very hard to debug. To see how much CPU and GPU memory you need for this model or others, try [this calculator](https://huggingface.co/spaces/hf-accelerate/model-memory-usage) which can help determine that value.
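As a hedged sketch of that `<FILL_ME>` handling (the `codellama/CodeLlama-7b-hf` checkpoint and the generation settings are assumptions, not taken from this diff):

```python
from transformers import CodeLlamaTokenizer, LlamaForCausalLM

tokenizer = CodeLlamaTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
model = LlamaForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# The tokenizer splits the prompt around <FILL_ME> into the prefix/suffix
# format the model was trained on.
PROMPT = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
input_ids = tokenizer(PROMPT, return_tensors="pt")["input_ids"]
generated_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated tokens, i.e. the infilled part
filling = tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(PROMPT.replace("<FILL_ME>", filling))
```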
- The LLaMA tokenizer is a BPE model based on [sentencepiece](https://github.com/google/sentencepiece). One quirk of sentencepiece is that when decoding a sequence, if the first token is the start of the word (e.g. "Banana"), the tokenizer does not prepend the prefix space to the string.
The LLaMA tokenizer is a BPE model based on [sentencepiece](https://github.com/google/sentencepiece). One quirk of sentencepiece is that when decoding a sequence, if the first token is the start of the word (e.g. "Banana"), the tokenizer does not prepend the prefix space to the string.
This model was contributed by [ArthurZucker](https://huggingface.co/ArthurZ). The original code of the authors can be found [here](https://github.com/facebookresearch/llama).
<Tip>
Code Llama has the same architecture as the `Llama2` models, refer to [Llama2's documentation page](llama2) for the API reference.
Find Code Llama tokenizer reference below.
</Tip>
## CodeLlamaTokenizer

View File

@ -40,7 +40,7 @@ The original code can be found [here](https://github.com/salesforce/codegen).
* `mono`: Initialized with `multi`, then further pre-trained on Python data
* For example, `Salesforce/codegen-350M-mono` offers a 350 million-parameter checkpoint pre-trained sequentially on the Pile, multiple programming languages, and Python.
## How to use
## Usage example
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
@ -60,7 +60,7 @@ def hello_world():
hello_world()
```
## Documentation resources
## Resources
- [Causal language modeling task guide](../tasks/language_modeling)

View File

@ -31,7 +31,7 @@ alt="drawing" width="600"/>
This model was contributed by [DepuMeng](https://huggingface.co/DepuMeng). The original code can be found [here](https://github.com/Atten4Vis/ConditionalDETR).
## Documentation resources
## Resources
- [Object detection task guide](../tasks/object_detection)

View File

@ -44,12 +44,14 @@ ConvBERT significantly outperforms BERT and its variants in various downstream t
fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
using less than 1/4 training cost. Code and pre-trained models will be released.*
ConvBERT training tips are similar to those of BERT.
This model was contributed by [abhishek](https://huggingface.co/abhishek). The original implementation can be found
here: https://github.com/yitu-opensource/ConvBert
## Documentation resources
## Usage tips
ConvBERT training tips are similar to those of BERT. For usage tips refer to [BERT documentation](bert).
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
@ -73,6 +75,9 @@ here: https://github.com/yitu-opensource/ConvBert
[[autodoc]] ConvBertTokenizerFast
<frameworkcontent>
<pt>
## ConvBertModel
[[autodoc]] ConvBertModel
@ -103,6 +108,9 @@ here: https://github.com/yitu-opensource/ConvBert
[[autodoc]] ConvBertForQuestionAnswering
- forward
</pt>
<tf>
## TFConvBertModel
[[autodoc]] TFConvBertModel
@ -132,3 +140,6 @@ here: https://github.com/yitu-opensource/ConvBert
[[autodoc]] TFConvBertForQuestionAnswering
- call
</tf>
</frameworkcontent>

View File

@ -32,10 +32,6 @@ of a vision Transformer, and discover several key components that contribute to
dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy
and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.*
Tips:
- See the code examples below each model regarding usage.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/convnext_architecture.jpg"
alt="drawing" width="600"/>
@ -68,6 +64,9 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] ConvNextImageProcessor
- preprocess
<frameworkcontent>
<pt>
## ConvNextModel
[[autodoc]] ConvNextModel
@ -78,14 +77,18 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] ConvNextForImageClassification
- forward
</pt>
<tf>
## TFConvNextModel
[[autodoc]] TFConvNextModel
- call
## TFConvNextForImageClassification
[[autodoc]] TFConvNextForImageClassification
- call
</tf>
</frameworkcontent>

View File

@ -25,10 +25,6 @@ The abstract from the paper is the following:
*Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.*
Tips:
- See the code examples below each model regarding usage.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/convnextv2_architecture.png"
alt="drawing" width="600"/>

View File

@ -37,7 +37,14 @@ NLP tasks in the settings of few-shot (even zero-shot) learning.*
This model was contributed by [canwenxu](https://huggingface.co/canwenxu). The original implementation can be found
here: https://github.com/TsinghuaAI/CPM-Generate
Note: We only have a tokenizer here, since the model architecture is the same as GPT-2.
<Tip>
CPM's architecture is the same as GPT-2, except for the tokenization method. Refer to the [GPT-2 documentation](gpt2) for
API reference information.
</Tip>
## CpmTokenizer

View File

@ -20,11 +20,10 @@ rendered properly in your Markdown viewer.
CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. It is also the first milestone of the live training process of CPM-Live. The training process is cost-effective and environment-friendly. CPM-Ant also achieves promising results with delta tuning on the CUGE benchmark. Besides the full model, we also provide various compressed versions to meet the requirements of different hardware configurations. [See more](https://github.com/OpenBMB/CPM-Live/tree/cpm-ant/cpm-live)
Tips:
This model was contributed by [OpenBMB](https://huggingface.co/openbmb). The original code can be found [here](https://github.com/OpenBMB/CPM-Live/tree/cpm-ant/cpm-live).
⚙️ Training & Inference
## Resources
- A tutorial on [CPM-Live](https://github.com/OpenBMB/CPM-Live/tree/cpm-ant/cpm-live).
## CpmAntConfig

View File

@ -41,7 +41,10 @@ providing more explicit control over text generation. These codes also allow CTR
training data are most likely given a sequence. This provides a potential method for analyzing large amounts of data
via model-based source attribution.*
Tips:
This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitishr). The original code can be found
[here](https://github.com/salesforce/ctrl).
## Usage tips
- CTRL makes use of control codes to generate text: it requires generations to be started by certain words, sentences
or links to generate coherent text. Refer to the [original implementation](https://github.com/salesforce/ctrl) for
@ -56,10 +59,8 @@ Tips:
pre-computed values in the context of text generation. See the [`forward`](model_doc/ctrl#transformers.CTRLModel.forward)
method for more information on the usage of this argument.
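A hedged sketch of generation started from a control code (the `Salesforce/ctrl` checkpoint name and the `Links` control code are assumptions based on the original paper, not this diff):

```python
from transformers import CTRLLMHeadModel, CTRLTokenizer

tokenizer = CTRLTokenizer.from_pretrained("Salesforce/ctrl")
model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

# Generations should start with a control code; "Links" steers the model
# towards web-style text.
inputs = tokenizer("Links Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0]))
```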
This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitishr). The original code can be found
[here](https://github.com/salesforce/ctrl).
## Documentation resources
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Causal language modeling task guide](../tasks/language_modeling)
@ -73,6 +74,9 @@ This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitis
[[autodoc]] CTRLTokenizer
- save_vocabulary
<frameworkcontent>
<pt>
## CTRLModel
[[autodoc]] CTRLModel
@ -88,6 +92,9 @@ This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitis
[[autodoc]] CTRLForSequenceClassification
- forward
</pt>
<tf>
## TFCTRLModel
[[autodoc]] TFCTRLModel
@ -102,3 +109,6 @@ This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitis
[[autodoc]] TFCTRLForSequenceClassification
- call
</tf>
</frameworkcontent>

View File

@ -33,15 +33,15 @@ performance gains are maintained when pretrained on larger datasets (\eg ImageNe
ImageNet-22k, our CvT-W24 obtains a top-1 accuracy of 87.7\% on the ImageNet-1k val set. Finally, our results show that the positional encoding,
a crucial component in existing Vision Transformers, can be safely removed in our model, simplifying the design for higher resolution vision tasks.*
Tips:
This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/microsoft/CvT).
## Usage tips
- CvT models are regular Vision Transformers, but trained with convolutions. They outperform the [original model (ViT)](vit) when fine-tuned on ImageNet-1K and CIFAR-100.
- You can check out demo notebooks regarding inference as well as fine-tuning on custom data [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer) (you can just replace [`ViTFeatureExtractor`] by [`AutoImageProcessor`] and [`ViTForImageClassification`] by [`CvtForImageClassification`]).
- The available checkpoints are either (1) pre-trained on [ImageNet-22k](http://www.image-net.org/) (a collection of 14 million images and 22k classes) only, (2) also fine-tuned on ImageNet-22k or (3) also fine-tuned on [ImageNet-1k](http://www.image-net.org/challenges/LSVRC/2012/) (also referred to as ILSVRC 2012, a collection of 1.3 million
images and 1,000 classes).
This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/microsoft/CvT).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with CvT.
@ -57,6 +57,9 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] CvtConfig
<frameworkcontent>
<pt>
## CvtModel
[[autodoc]] CvtModel
@ -67,6 +70,9 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] CvtForImageClassification
- forward
</pt>
<tf>
## TFCvtModel
[[autodoc]] TFCvtModel
@ -77,3 +83,5 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] TFCvtForImageClassification
- call
</tf>
</frameworkcontent>

View File

@ -35,19 +35,18 @@ the entire input. Experiments on the major benchmarks of speech recognition, ima
natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.
Models and code are available at www.github.com/pytorch/fairseq/tree/master/examples/data2vec.*
Tips:
- Data2VecAudio, Data2VecText, and Data2VecVision have all been trained using the same self-supervised learning method.
- For Data2VecAudio, preprocessing is identical to [`Wav2Vec2Model`], including feature extraction
- For Data2VecText, preprocessing is identical to [`RobertaModel`], including tokenization.
- For Data2VecVision, preprocessing is identical to [`BeitModel`], including feature extraction.
This model was contributed by [edugp](https://huggingface.co/edugp) and [patrickvonplaten](https://huggingface.co/patrickvonplaten).
[sayakpaul](https://github.com/sayakpaul) and [Rocketknight1](https://github.com/Rocketknight1) contributed Data2Vec for vision in TensorFlow.
The original code (for NLP and Speech) can be found [here](https://github.com/pytorch/fairseq/tree/main/examples/data2vec).
The original code for vision can be found [here](https://github.com/facebookresearch/data2vec_vision/tree/main/beit).
## Usage tips
- Data2VecAudio, Data2VecText, and Data2VecVision have all been trained using the same self-supervised learning method.
- For Data2VecAudio, preprocessing is identical to [`Wav2Vec2Model`], including feature extraction
- For Data2VecText, preprocessing is identical to [`RobertaModel`], including tokenization.
- For Data2VecVision, preprocessing is identical to [`BeitModel`], including feature extraction.
## Resources
@ -88,6 +87,8 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] Data2VecVisionConfig
<frameworkcontent>
<pt>
## Data2VecAudioModel
@ -164,6 +165,9 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] Data2VecVisionForSemanticSegmentation
- forward
</pt>
<tf>
## TFData2VecVisionModel
[[autodoc]] TFData2VecVisionModel
@ -178,3 +182,6 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] TFData2VecVisionForSemanticSegmentation
- call
</tf>
</frameworkcontent>

View File

@ -62,7 +62,7 @@ New in v2:
This model was contributed by [DeBERTa](https://huggingface.co/DeBERTa). This model TF 2.0 implementation was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).
## Documentation resources
## Resources
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
@ -88,6 +88,9 @@ contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code
- build_inputs_with_special_tokens
- create_token_type_ids_from_sequences
<frameworkcontent>
<pt>
## DebertaV2Model
[[autodoc]] DebertaV2Model
@ -123,6 +126,9 @@ contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code
[[autodoc]] DebertaV2ForMultipleChoice
- forward
</pt>
<tf>
## TFDebertaV2Model
[[autodoc]] TFDebertaV2Model
@ -157,3 +163,6 @@ contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code
[[autodoc]] TFDebertaV2ForMultipleChoice
- call
</tf>
</frameworkcontent>

View File

@ -94,6 +94,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- build_inputs_with_special_tokens
- create_token_type_ids_from_sequences
<frameworkcontent>
<pt>
## DebertaModel
[[autodoc]] DebertaModel
@ -123,6 +126,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] DebertaForQuestionAnswering
- forward
</pt>
<tf>
## TFDebertaModel
[[autodoc]] TFDebertaModel
@ -152,3 +158,7 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] TFDebertaForQuestionAnswering
- call
</tf>
</frameworkcontent>

View File

@ -33,9 +33,7 @@ This allows us to draw upon the simplicity and scalability of the Transformer ar
Decision Transformer matches or exceeds the performance of state-of-the-art model-free offline RL baselines on
Atari, OpenAI Gym, and Key-to-Door tasks.*
Tips:
This version of the model is for tasks where the state is a vector, image-based states will come soon.
This version of the model is for tasks where the state is a vector.
This model was contributed by [edbeeching](https://huggingface.co/edbeeching). The original code can be found [here](https://github.com/kzl/decision-transformer).

View File

@ -25,11 +25,6 @@ The abstract from the paper is the following:
*DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, due to the limitation of Transformer attention modules in processing image feature maps. To mitigate these issues, we proposed Deformable DETR, whose attention modules only attend to a small set of key sampling points around a reference. Deformable DETR can achieve better performance than DETR (especially on small objects) with 10 times less training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach.*
Tips:
- One can use [`DeformableDetrImageProcessor`] to prepare images (and optional targets) for the model.
- Training Deformable DETR is equivalent to training the original [DETR](detr) model. See the [resources](#resources) section below for demo notebooks.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/deformable_detr_architecture.png"
alt="drawing" width="600"/>
@ -37,6 +32,10 @@ alt="drawing" width="600"/>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/fundamentalvision/Deformable-DETR).
## Usage tips
- Training Deformable DETR is equivalent to training the original [DETR](detr) model. See the [resources](#resources) section below for demo notebooks.
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Deformable DETR.

View File

@ -16,13 +16,6 @@ rendered properly in your Markdown viewer.
# DeiT
<Tip>
This is a recently introduced model so the API hasn't been tested extensively. There may be some bugs or slight
breaking changes to fix it in the future. If you see something strange, file a [Github Issue](https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title).
</Tip>
## Overview
The DeiT model was proposed in [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre
@ -45,7 +38,9 @@ distillation, especially when using a convnet as a teacher. This leads us to rep
for both Imagenet (where we obtain up to 85.2% accuracy) and when transferring to other tasks. We share our code and
models.*
Tips:
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts).
## Usage tips
- Compared to ViT, DeiT models use a so-called distillation token to effectively learn from a teacher (which, in the
DeiT paper, is a ResNet like-model). The distillation token is learned through backpropagation, by interacting with
@ -73,8 +68,6 @@ Tips:
*facebook/deit-base-patch16-384*. Note that one should use [`DeiTImageProcessor`] in order to
prepare images for the model.
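A hedged inference sketch tying these pieces together (the distilled checkpoint name and the example image URL are assumptions):

```python
import requests
import torch
from PIL import Image
from transformers import DeiTImageProcessor, DeiTForImageClassificationWithTeacher

processor = DeiTImageProcessor.from_pretrained("facebook/deit-base-distilled-patch16-224")
model = DeiTForImageClassificationWithTeacher.from_pretrained("facebook/deit-base-distilled-patch16-224")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# DeiTImageProcessor resizes and normalizes the image for the model
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```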
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts).
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeiT.
@ -104,6 +97,9 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] DeiTImageProcessor
- preprocess
<frameworkcontent>
<pt>
## DeiTModel
[[autodoc]] DeiTModel
@ -124,6 +120,9 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] DeiTForImageClassificationWithTeacher
- forward
</pt>
<tf>
## TFDeiTModel
[[autodoc]] TFDeiTModel
@ -143,3 +142,6 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] TFDeiTForImageClassificationWithTeacher
- call
</tf>
</frameworkcontent>

View File

@ -24,12 +24,10 @@ The abstract of the paper states the following:
*Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples and their reasoning capabilities are still much limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key in this method is a modality conversion module, named as DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than >28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.*
## Model description
DePlot is a model that is trained using `Pix2Struct` architecture. You can find more information about `Pix2Struct` in the [Pix2Struct documentation](https://huggingface.co/docs/transformers/main/en/model_doc/pix2struct).
DePlot is a Visual Question Answering subset of the `Pix2Struct` architecture. It renders the input question on the image and predicts the answer.
## Usage
## Usage example
Currently one checkpoint is available for DePlot:
@ -59,4 +57,10 @@ from transformers.optimization import Adafactor, get_cosine_schedule_with_warmup
optimizer = Adafactor(self.parameters(), scale_parameter=False, relative_step=False, lr=0.01, weight_decay=1e-05)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=40000)
```
<Tip>
DePlot is a model trained using `Pix2Struct` architecture. For API reference, see [`Pix2Struct` documentation](pix2struct).
</Tip>

View File

@ -26,10 +26,6 @@ The abstract from the paper is the following:
*Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum supervision (NMS). Surprisingly, we observe one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector that trains Deformable-DETR with traditional IoU-based label assignment achieved 50.2 COCO mAP within 12 epochs (1x schedule) with ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture.*
Tips:
- One can use [`DetaImageProcessor`] to prepare images and optional targets for the model.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/deta_architecture.jpg"
alt="drawing" width="600"/>
@ -51,20 +47,17 @@ If you're interested in submitting a resource to be included here, please feel f
[[autodoc]] DetaConfig
## DetaImageProcessor
[[autodoc]] DetaImageProcessor
- preprocess
- post_process_object_detection
## DetaModel
[[autodoc]] DetaModel
- forward
## DetaForObjectDetection
[[autodoc]] DetaForObjectDetection

View File

@ -41,6 +41,8 @@ baselines.*
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/facebookresearch/detr).
## How DETR works
Here's a TLDR explaining how [`~transformers.DetrForObjectDetection`] works:
First, an image is sent through a pre-trained convolutional backbone (in the paper, the authors use
@ -79,7 +81,7 @@ where one first trains a [`~transformers.DetrForObjectDetection`] model to detec
the mask head for 25 epochs. Experimentally, these two approaches give similar results. Note that predicting boxes is
required for the training to be possible, since the Hungarian matching is computed using distances between boxes.
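To make the walkthrough concrete, here is a hedged inference sketch (the checkpoint name, score threshold and example image URL are assumptions):

```python
import requests
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Keep detections above a 0.9 score and convert boxes to (x0, y0, x1, y1) pixels
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=torch.tensor([image.size[::-1]])
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), [round(c, 1) for c in box.tolist()])
```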
Tips:
## Usage tips
- DETR uses so-called **object queries** to detect objects in an image. The number of queries determines the maximum
number of objects that can be detected in a single image, and is set to 100 by default (see parameter
@ -165,14 +167,6 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
## DETR specific outputs
[[autodoc]] models.detr.modeling_detr.DetrModelOutput
[[autodoc]] models.detr.modeling_detr.DetrObjectDetectionOutput
[[autodoc]] models.detr.modeling_detr.DetrSegmentationOutput
## DetrConfig
[[autodoc]] DetrConfig
@ -195,6 +189,14 @@ If you're interested in submitting a resource to be included here, please feel f
- post_process_instance_segmentation
- post_process_panoptic_segmentation
## DETR specific outputs
[[autodoc]] models.detr.modeling_detr.DetrModelOutput
[[autodoc]] models.detr.modeling_detr.DetrObjectDetectionOutput
[[autodoc]] models.detr.modeling_detr.DetrSegmentationOutput
## DetrModel
[[autodoc]] DetrModel

View File

@ -32,7 +32,9 @@ that leverage DialoGPT generate more relevant, contentful and context-consistent
systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response
generation and the development of more intelligent open-domain dialogue systems.*
Tips:
The original code can be found [here](https://github.com/microsoft/DialoGPT).
## Usage tips
- DialoGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather
than the left.
@ -47,7 +49,8 @@ follow the OpenAI GPT-2 to model a multiturn dialogue session as a long text and
modeling. We first concatenate all dialog turns within a dialogue session into a long text x_1,..., x_N (N is the
sequence length), ended by the end-of-text token.* For more information please refer to the original paper.
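In practice, the concatenate-turns-and-append-eos scheme looks roughly like the hedged sketch below (checkpoint name and generation settings are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

chat_history_ids = None
for user_input in ["Does money buy happiness?", "What is the best way to buy happiness?"]:
    # Encode the new user turn, terminated by the end-of-text token
    new_ids = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors="pt")
    # Append it to the running dialogue history
    bot_input_ids = new_ids if chat_history_ids is None else torch.cat([chat_history_ids, new_ids], dim=-1)
    # Generate a response and keep the full history for the next turn
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))
```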
<Tip>
DialoGPT's architecture is based on the GPT2 model, so one can refer to [GPT2's documentation page](gpt2).
DialoGPT's architecture is based on the GPT2 model, refer to [GPT2's documentation page](gpt2) for API reference and examples.
The original code can be found [here](https://github.com/microsoft/DialoGPT).
</Tip>

View File

@ -44,17 +44,6 @@ and ADE20K (48.5 PQ), and instance segmentation model on Cityscapes (44.5 AP) an
It also matches the state of the art specialized semantic segmentation models on ADE20K (58.2 mIoU),
and ranks second on Cityscapes (84.5 mIoU) (no extra data). *
Tips:
- One can use the [`AutoImageProcessor`] API to prepare images for the model.
- DiNAT can be used as a *backbone*. When `output_hidden_states = True`,
it will output both `hidden_states` and `reshaped_hidden_states`. The `reshaped_hidden_states` have a shape of `(batch, num_channels, height, width)` rather than `(batch_size, height, width, num_channels)`.
Notes:
- DiNAT depends on [NATTEN](https://github.com/SHI-Labs/NATTEN/)'s implementation of Neighborhood Attention and Dilated Neighborhood Attention.
You can install it with pre-built wheels for Linux by referring to [shi-labs.com/natten](https://shi-labs.com/natten), or build on your system by running `pip install natten`.
Note that the latter will likely take time to compile. NATTEN does not support Windows devices yet.
- Patch size of 4 is only supported at the moment.
<img
src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/dilated-neighborhood-attention-pattern.jpg"
alt="drawing" width="600"/>
@ -65,6 +54,17 @@ Taken from the <a href="https://arxiv.org/abs/2209.15001">original paper</a>.</s
This model was contributed by [Ali Hassani](https://huggingface.co/alihassanijr).
The original code can be found [here](https://github.com/SHI-Labs/Neighborhood-Attention-Transformer).
## Usage tips
DiNAT can be used as a *backbone*. When `output_hidden_states = True`,
it will output both `hidden_states` and `reshaped_hidden_states`. The `reshaped_hidden_states` have a shape of `(batch, num_channels, height, width)` rather than `(batch_size, height, width, num_channels)`.
Notes:
- DiNAT depends on [NATTEN](https://github.com/SHI-Labs/NATTEN/)'s implementation of Neighborhood Attention and Dilated Neighborhood Attention.
You can install it with pre-built wheels for Linux by referring to [shi-labs.com/natten](https://shi-labs.com/natten), or build on your system by running `pip install natten`.
Note that the latter will likely take time to compile. NATTEN does not support Windows devices yet.
- Patch size of 4 is only supported at the moment.
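A hedged sketch of the backbone-style usage described above (requires `natten`; the `shi-labs/dinat-mini-in1k-224` checkpoint is an assumption):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, DinatModel

processor = AutoImageProcessor.from_pretrained("shi-labs/dinat-mini-in1k-224")
model = DinatModel.from_pretrained("shi-labs/dinat-mini-in1k-224")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# (batch, height, width, num_channels) vs. (batch, num_channels, height, width)
print(outputs.hidden_states[0].shape)
print(outputs.reshaped_hidden_states[0].shape)
```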
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DiNAT.

View File

@ -22,14 +22,9 @@ The abstract from the paper is the following:
*The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels.*
Tips:
- One can use [`AutoImageProcessor`] class to prepare images for the model.
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/facebookresearch/dinov2).
## Dinov2Config
[[autodoc]] Dinov2Config

View File

@ -51,7 +51,10 @@ distillation and cosine-distance losses. Our smaller, faster and lighter model i
demonstrate its capabilities for on-device computations in a proof-of-concept experiment and a comparative on-device
study.*
Tips:
This model was contributed by [victorsanh](https://huggingface.co/victorsanh). This model jax version was
contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation).
## Usage tips
- DistilBERT doesn't have `token_type_ids`, so you don't need to indicate which token belongs to which segment. Just
separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`), as in the sketch below.
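A hedged sketch of what that looks like (the `distilbert-base-uncased` checkpoint is assumed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Passing two segments lets the tokenizer insert [SEP] between them;
# note that no token_type_ids are produced.
encoded = tokenizer("HuggingFace is based in NYC", "Where is HuggingFace based?")
print(tokenizer.decode(encoded["input_ids"]))
print("token_type_ids" in encoded)  # False for DistilBERT tokenizers
```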
@ -63,8 +66,6 @@ Tips:
* predicting the masked tokens correctly (but no next-sentence objective)
* a cosine similarity between the hidden states of the student and the teacher model
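As a rough illustration of that last term, here is a minimal sketch of aligning student and teacher hidden states with a cosine loss; the shapes and the exact formulation are assumptions, not the project's actual training code:

```python
>>> import torch
>>> import torch.nn.functional as F

>>> # assumed shapes: (batch_size, seq_len, hidden_dim) for both models
>>> student_hidden = torch.randn(2, 16, 768)
>>> teacher_hidden = torch.randn(2, 16, 768)

>>> # a target of 1 asks the loss to pull each pair of vectors together
>>> target = torch.ones(2 * 16)
>>> loss = F.cosine_embedding_loss(
...     student_hidden.view(-1, 768), teacher_hidden.view(-1, 768), target
... )
```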
## Resources
@ -132,6 +133,37 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- A blog post on how to [deploy DistilBERT with Amazon SageMaker](https://huggingface.co/blog/deploy-hugging-face-models-easily-with-amazon-sagemaker).
- A blog post on how to [Deploy BERT with Hugging Face Transformers, Amazon SageMaker and Terraform module](https://www.philschmid.de/terraform-huggingface-amazon-sagemaker).
## Combining DistilBERT and Flash Attention 2
First, make sure to install the latest version of Flash Attention 2.
```bash
pip install -U flash-attn --no-build-isolation
```
Also make sure that your hardware is compatible with Flash Attention 2; you can read more about this in the official documentation of the [flash-attn](https://github.com/Dao-AILab/flash-attention) repository. Finally, make sure to load your model in half-precision (e.g. `torch.float16`).
To load and run a model using Flash Attention 2, refer to the snippet below:
```python
>>> import torch
>>> from transformers import AutoTokenizer, AutoModel
>>> device = "cuda" # the device to load the model onto
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = AutoModel.from_pretrained("distilbert-base-uncased", torch_dtype=torch.float16, use_flash_attention_2=True)
>>> text = "Replace me by any text you'd like."
>>> encoded_input = tokenizer(text, return_tensors="pt").to(device)
>>> model.to(device)
>>> output = model(**encoded_input)
```
## DistilBertConfig
[[autodoc]] DistilBertConfig
@ -144,6 +176,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] DistilBertTokenizerFast
<frameworkcontent>
<pt>
## DistilBertModel
[[autodoc]] DistilBertModel
@ -174,6 +209,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] DistilBertForQuestionAnswering
- forward
</pt>
<tf>
## TFDistilBertModel
[[autodoc]] TFDistilBertModel
@ -204,6 +242,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] TFDistilBertForQuestionAnswering
- call
</tf>
<jax>
## FlaxDistilBertModel
[[autodoc]] FlaxDistilBertModel
@ -233,3 +274,10 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] FlaxDistilBertForQuestionAnswering
- __call__
</jax>
</frameworkcontent>


@ -37,6 +37,10 @@ alt="drawing" width="600"/>
<small> Summary of the approach. Taken from the [original paper](https://arxiv.org/abs/2203.02378). </small>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/dit).
## Usage tips
One can directly use the weights of DiT with the AutoModel API:
```python
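# a minimal sketch of the idea, assuming the microsoft/dit-base checkpoint
from transformers import AutoModel

model = AutoModel.from_pretrained("microsoft/dit-base")
```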
@ -66,10 +70,6 @@ model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-fine
This particular checkpoint was fine-tuned on [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/), an important benchmark for document image classification.
A notebook that illustrates inference for document image classification can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DiT/Inference_with_DiT_(Document_Image_Transformer)_for_document_image_classification.ipynb).
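For reference, a minimal sketch of inference with that model, assuming the `microsoft/dit-base-finetuned-rvlcdip` checkpoint name and a local scan of a document:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

image = Image.open("document.png").convert("RGB")  # any scanned document page

processor = AutoImageProcessor.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")
model = AutoModelForImageClassification.from_pretrained("microsoft/dit-base-finetuned-rvlcdip")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# map the highest-scoring class index back to its RVL-CDIP label
predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```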
## Resources
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DiT.
@ -78,4 +78,9 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
- [`BeitForImageClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
<Tip>
As DiT's architecture is equivalent to that of BEiT, one can refer to [BEiT's documentation page](beit) for all tips, code examples and notebooks.
</Tip>

Some files were not shown because too many files have changed in this diff.