mirror of
https://github.com/huggingface/transformers.git
synced 2025-11-02 10:34:36 +08:00
Compare commits
862 Commits
v4.9.2-rel
...
v4.14.1
| Author | SHA1 | Date | |
|---|---|---|---|
| 19e5ed7366 | |||
| d6698ecfc3 | |||
| 960d8cb41d | |||
| aece7badc1 | |||
| 50bc57cef8 | |||
| 48d4827697 | |||
| 72c6e8b8bf | |||
| a94105f95f | |||
| 7e61d56a45 | |||
| fdf3ce2827 | |||
| 851a78978a | |||
| e7ed7ffdcb | |||
| 2a606f9974 | |||
| 546a91abe9 | |||
| 322d416916 | |||
| 7533d30acd | |||
| e926ea2bdd | |||
| 971e36667a | |||
| 15a9d01519 | |||
| 6ac0fac85a | |||
| e4666bff06 | |||
| 64e92ed224 | |||
| c3cd88a9ba | |||
| 12d9b95723 | |||
| ca0b82bbd7 | |||
| fc74c84537 | |||
| 8362d07d63 | |||
| 6e05bb1c96 | |||
| c17e7cde32 | |||
| 3d66146afc | |||
| 4c99e553c1 | |||
| 32eb29fef9 | |||
| 48bf7e47a0 | |||
| 91f3dfbfdd | |||
| 86dd23bb8b | |||
| 6a025487a6 | |||
| 027074f4d0 | |||
| 7cb1fdd4d1 | |||
| 39fbb068be | |||
| 5eca742f6c | |||
| 63c284c2d4 | |||
| f46668282b | |||
| 3b2d1652e4 | |||
| 1b75d7238c | |||
| ae82ee6a48 | |||
| 5b00400198 | |||
| 59d684fa92 | |||
| 8395f14de6 | |||
| bab1556456 | |||
| 3bc7d70e9c | |||
| 4701a1a182 | |||
| ab31b3e41b | |||
| 4da3a696e4 | |||
| 60be4bf8ac | |||
| da7aabf2ca | |||
| 7375758bee | |||
| 68e53e6fcd | |||
| e9800122a6 | |||
| ee6674d450 | |||
| e6219320b9 | |||
| 13186d7152 | |||
| d104dd46d9 | |||
| 1228661285 | |||
| 16870d114b | |||
| 01b8cd5932 | |||
| f6b87c5f30 | |||
| fe06f8dcac | |||
| 4ea19de80c | |||
| cf36f4d7a8 | |||
| 0f4e39c559 | |||
| 0c70f145d1 | |||
| ee4fa2e465 | |||
| 2294071a0c | |||
| fab3b518ef | |||
| 65b20b739b | |||
| 961732c276 | |||
| 2e12d90b9e | |||
| fae0b9faef | |||
| 39f1dff5a0 | |||
| 2171695cc2 | |||
| b66c5ab20c | |||
| 30646a0a3c | |||
| 4cdb67caba | |||
| 381b05a3f5 | |||
| 75ae287aec | |||
| 03fda7b743 | |||
| e513c16e82 | |||
| e9688875bf | |||
| cbe6026536 | |||
| df085d8ea8 | |||
| 0f3f045ebd | |||
| 1ccc033c56 | |||
| 6cdc3a7844 | |||
| c824d7ed48 | |||
| bc8a9f415b | |||
| c5bd732ac6 | |||
| 803a8cd18f | |||
| 3977b58437 | |||
| 73ec4340ec | |||
| 71b1bf7ea8 | |||
| aada989ad5 | |||
| ec47baeba2 | |||
| e62091d5a7 | |||
| 66ea739168 | |||
| 6645eb61fa | |||
| 96cc02b51b | |||
| 43f953cc2e | |||
| e4c67d60ec | |||
| 50d909be28 | |||
| 77d87e732e | |||
| 275402bf2b | |||
| 4f68de625c | |||
| fbe278c76c | |||
| 934e2799da | |||
| 4df7d05a87 | |||
| 14cc50d081 | |||
| 4c0dd199c8 | |||
| 70996a5420 | |||
| fc1d97f29d | |||
| 6ed9882ddb | |||
| 4176bc161c | |||
| faacd74729 | |||
| 254fef67cf | |||
| c468a87a69 | |||
| 6fc38adff2 | |||
| 8332327dca | |||
| 2bd950ca47 | |||
| cea17acd8c | |||
| 25156eb296 | |||
| 4ee0b755bd | |||
| ebbe8cc3fe | |||
| 69511cdcae | |||
| 2318bf77eb | |||
| c15f4f203f | |||
| 1bbd6fcdeb | |||
| 04683c0659 | |||
| d1fd64e7aa | |||
| 3772af49ce | |||
| f2e90bcb8f | |||
| 6c4d688ffa | |||
| 956a483173 | |||
| 69e16abf98 | |||
| 0b7d053c13 | |||
| 204d251310 | |||
| 11f65d4158 | |||
| e0e2da1194 | |||
| a4553e6c64 | |||
| 1a92bc5788 | |||
| c9d2cf855a | |||
| a59e7c1ed4 | |||
| 81fe8afaac | |||
| f25a9332e8 | |||
| 0490b98877 | |||
| 331c3d2aa0 | |||
| efea0f868b | |||
| 72a6bf33c0 | |||
| 83ef8bcac2 | |||
| da36c557f7 | |||
| d83b0e0c07 | |||
| 08816de16a | |||
| 01f8e639d3 | |||
| 1991da07f7 | |||
| 754202de4f | |||
| 7544efc92e | |||
| c6c075544d | |||
| a2864a50e7 | |||
| 700a748fe6 | |||
| b567510cff | |||
| 888fb21159 | |||
| a33168aa78 | |||
| 3e8d17e66d | |||
| 040fd47162 | |||
| 1cc453d33c | |||
| 76d0d41e51 | |||
| 9fd937ead1 | |||
| a67d47b40c | |||
| 74e6111ba7 | |||
| 4ce74edf51 | |||
| 267867e851 | |||
| 29dfb2dbb1 | |||
| 790cdc2e55 | |||
| 2e60276b38 | |||
| 3165930402 | |||
| 280a811ecb | |||
| 55f49c5f4b | |||
| 21546e59a6 | |||
| ed5d15518b | |||
| 77262ef750 | |||
| 3d607df8f4 | |||
| 7f20bf0d43 | |||
| 4c35c8d89c | |||
| b1dbdf22ef | |||
| e92190c0f8 | |||
| 1c76a51615 | |||
| 9e37c5cdf8 | |||
| bec02ff209 | |||
| 2b0d9389f8 | |||
| ea163d0948 | |||
| 5c153079e2 | |||
| 321eb56222 | |||
| 46d0cdae40 | |||
| e81d8d7fa9 | |||
| e4d8f517b9 | |||
| 85a4bda4f4 | |||
| babd0b9a5e | |||
| 4f24058c58 | |||
| be4a6c64dc | |||
| 6326aa4bf0 | |||
| 95b3ec3bc9 | |||
| a503012275 | |||
| d0e96c6de6 | |||
| dfb00bf644 | |||
| de635af3f1 | |||
| a3ded170e2 | |||
| 9b78b070ef | |||
| df1f94eb4a | |||
| e30078b544 | |||
| b48faae364 | |||
| c016dbdbda | |||
| 34307bb358 | |||
| 24b30d4d2f | |||
| 843c326ee1 | |||
| 08a5f57567 | |||
| 4be78c22c9 | |||
| a14d62b0b1 | |||
| b90a48f654 | |||
| fd8136fa75 | |||
| d29baf69bb | |||
| 68427c9beb | |||
| 1a674ce679 | |||
| f0d6e952c0 | |||
| a1c15ea855 | |||
| 1149243184 | |||
| 2c8957feea | |||
| dec759e7e8 | |||
| 27b1516d32 | |||
| 671569ddf7 | |||
| 89766b3d44 | |||
| bd21ed4099 | |||
| 5f789a687a | |||
| 558f8543ba | |||
| 519a677e87 | |||
| bbaa3effbd | |||
| ad3e560bc7 | |||
| ce01122a3b | |||
| 4a394cf53f | |||
| a767276fdd | |||
| e20faa6f03 | |||
| 8b32578119 | |||
| ce91bf9a34 | |||
| 70d5711848 | |||
| 33fb98338e | |||
| 999540dfe0 | |||
| 323f28dce2 | |||
| 7396095af7 | |||
| 9450bfcc6c | |||
| 9fc1951711 | |||
| 513fa30a63 | |||
| 63d91f449c | |||
| e823d8198a | |||
| b338596346 | |||
| c28bc80bbb | |||
| 4ab6a4a086 | |||
| dc540dd316 | |||
| d37f1fb8ba | |||
| 5b45422b58 | |||
| be236361f1 | |||
| 4469010c1b | |||
| ba71f1b57f | |||
| b8fad022a0 | |||
| 62bf536631 | |||
| 5f3bf65111 | |||
| ac12a5ae47 | |||
| 1251072f46 | |||
| 78b6a2ecbd | |||
| 1dc96a760d | |||
| 123cce6ffc | |||
| 88cd82e801 | |||
| e118db15d6 | |||
| 01b1466983 | |||
| 232822f36d | |||
| e5b8ffb848 | |||
| 25ceb81871 | |||
| 6200fd7bbc | |||
| e1dc5afd28 | |||
| 1e53faeb2e | |||
| 8ddbfe9752 | |||
| ebd48c6de5 | |||
| 42bfb83d74 | |||
| 41dad89f70 | |||
| 27c888db6c | |||
| 3f23634a17 | |||
| 9f3aa46f45 | |||
| 9799f4e150 | |||
| bfd8176636 | |||
| 919a964b8f | |||
| f5ed19f57d | |||
| 840fc8dbca | |||
| e248e9b042 | |||
| 1f60df81b2 | |||
| 0c3174c758 | |||
| 8560b55b5e | |||
| c99a2832ed | |||
| 1a9381c60d | |||
| 3e8761ab80 | |||
| 84b9579da7 | |||
| 1967c43eb9 | |||
| 3e04a41a9b | |||
| 2ac65551ea | |||
| 1b871e091b | |||
| 6b83090e80 | |||
| 95bab53868 | |||
| 62ccbe0960 | |||
| 05a2afc252 | |||
| c7ccb2e779 | |||
| 16d7b70b80 | |||
| fa4abdb3ea | |||
| 9f53f049c6 | |||
| f5a49bfa4d | |||
| 70f186f61e | |||
| ca2ef7dfcd | |||
| 7888914edd | |||
| d432a654f6 | |||
| 7af55d3a1c | |||
| f00bceab8d | |||
| 234cfefbb0 | |||
| e03544a138 | |||
| 0f502682fb | |||
| f9c16b02e3 | |||
| 3187228206 | |||
| 9e4ea25175 | |||
| e9d2a639f4 | |||
| 49155d2431 | |||
| 0270d44f57 | |||
| f875fb0e5f | |||
| 31560f6397 | |||
| 0106826a65 | |||
| a43d9352a9 | |||
| 53dc39d821 | |||
| 0bc2e54f00 | |||
| dbaf49203e | |||
| 3fefa292c1 | |||
| 3892d09f4f | |||
| 122c2f81b7 | |||
| fde4867f97 | |||
| 9eda0d156d | |||
| 7a3147e9b8 | |||
| d5ff69fce9 | |||
| 2024faf171 | |||
| 2c60ff2fe2 | |||
| 3d587c5343 | |||
| 7c6cd0ac28 | |||
| 82b62fa607 | |||
| bdf31d6e0a | |||
| 4334095c32 | |||
| 37c5759cbe | |||
| cde0c750af | |||
| 968ae57c60 | |||
| 84ad6af49a | |||
| f5af873617 | |||
| 47489a6974 | |||
| cd3166a8ed | |||
| d5b82bb70c | |||
| 5b317f7ea4 | |||
| b65c389769 | |||
| 7fb2a8b3d9 | |||
| 7604557e44 | |||
| f2002fea11 | |||
| 0ef61d392c | |||
| a5be95413f | |||
| cc36064960 | |||
| 5b6bd4e788 | |||
| 51ee20fc26 | |||
| 408b2d2bd0 | |||
| 61f6426269 | |||
| 8b240a0661 | |||
| 26b6ef79d6 | |||
| 58bf882579 | |||
| 11c043d27d | |||
| 85d69a7dd1 | |||
| 990de2c17c | |||
| d45fc7da3d | |||
| 3499728dc4 | |||
| 64743d0abe | |||
| 32634bce33 | |||
| 708ffff665 | |||
| e1bb2ebd92 | |||
| 6e4c8f683c | |||
| dca6796876 | |||
| 4a18337bae | |||
| 3c0c699ffd | |||
| 91758e399f | |||
| 239bd61b99 | |||
| 46dfe99e44 | |||
| 3e218523e8 | |||
| 9e15b511c3 | |||
| cb911e5bc1 | |||
| c8b07612a1 | |||
| 5a1b5e4b1d | |||
| 23ee06ed55 | |||
| de344815ed | |||
| d70919e6d5 | |||
| 026866df92 | |||
| be71ac3bcb | |||
| 61cf2ea9c0 | |||
| 5f34163b88 | |||
| 0f5488f79f | |||
| 57420b103e | |||
| 319beb64eb | |||
| 279ce5b705 | |||
| 0d309ce39a | |||
| 5be59a3649 | |||
| 5d390e9ee5 | |||
| 8f2c07d3cf | |||
| 77770ec798 | |||
| aea7c5b0c8 | |||
| dac7798144 | |||
| 013bdc6d65 | |||
| 9f58becc8d | |||
| 155b23008e | |||
| e7b16f33ae | |||
| 7d83655da9 | |||
| 36fc401621 | |||
| 7af7d7ce05 | |||
| f099249cf1 | |||
| 0ddadbf0a8 | |||
| 1b74af76b7 | |||
| d4e4efce68 | |||
| 46efc58024 | |||
| 3a9c0f23b4 | |||
| a6ea244f99 | |||
| 7079a99e76 | |||
| 7051b89267 | |||
| 12b4d66a80 | |||
| cc0a415e2f | |||
| 6c08840628 | |||
| 3a8de58c51 | |||
| 955fd4fea9 | |||
| de948350c2 | |||
| bcc3f7b656 | |||
| 707f7eb181 | |||
| 4213728067 | |||
| c4113721f8 | |||
| 90f980ed35 | |||
| 8bbb53e20b | |||
| 41436d3dfb | |||
| 44eb8bdeea | |||
| 9a9805fccf | |||
| e1d1c7c087 | |||
| 5f25855b3e | |||
| 269c3d1400 | |||
| 7db2a79b38 | |||
| b90096fe14 | |||
| bf6118e70c | |||
| 55695df0f7 | |||
| cf4aa3597f | |||
| 2a51b15518 | |||
| 63cc5bda60 | |||
| 7d84c3a488 | |||
| a1ea3adb28 | |||
| 3a8a8013ad | |||
| aa018a795d | |||
| a21ee1f990 | |||
| 83d3dc0f6f | |||
| 5e3b4a70d3 | |||
| 11c69b8045 | |||
| dc193c906d | |||
| 1c96500088 | |||
| 4e0410e927 | |||
| 367c2ef53b | |||
| e00bc7cd2f | |||
| 3ffd18a617 | |||
| 3ccc27019a | |||
| e0d31a8982 | |||
| 400c5a158b | |||
| 91df45516c | |||
| 044eff5bf0 | |||
| 067413fb73 | |||
| a8ec002926 | |||
| 469b80d4e7 | |||
| 493643fff8 | |||
| 38580455de | |||
| 66b01ce864 | |||
| e579f855fa | |||
| 0eabe49204 | |||
| a2ef9c5446 | |||
| 95f888fd6a | |||
| 678bb248d0 | |||
| 4a320f6c9a | |||
| 41c186d2a4 | |||
| f888e5c372 | |||
| 1988849bbf | |||
| 8632a60d33 | |||
| 6a3a197fcd | |||
| 62832c962f | |||
| ca257a06cc | |||
| 5b57075449 | |||
| 9e0fd78051 | |||
| 6dc41d9f8e | |||
| 7c7d2ec952 | |||
| 0ecdf6de03 | |||
| 50c746eeb7 | |||
| 93624bfee9 | |||
| a0c08aa36c | |||
| 27d4639779 | |||
| 75f6641eaf | |||
| 8e908c8c74 | |||
| 2608944dc2 | |||
| 8565d38f30 | |||
| d16bec9530 | |||
| ddd4d02f30 | |||
| b7d264be0d | |||
| a722c301bf | |||
| 1417978cd4 | |||
| 8d533e6ad6 | |||
| 78807d86eb | |||
| a2dec768a2 | |||
| 48fa42e5d5 | |||
| ea92136597 | |||
| 28d5700aae | |||
| 002a078aff | |||
| aeb2dac04d | |||
| 0af901e83f | |||
| 936b3fdeaa | |||
| 04976a32dc | |||
| fe379f856b | |||
| d8049331dc | |||
| 87d5057d86 | |||
| b518aaf193 | |||
| 300ee0c7b2 | |||
| afb07a79ab | |||
| 19b7acdd61 | |||
| ce32c69c0b | |||
| 0eb02871dd | |||
| da8beaaf76 | |||
| 1f9dcfc1ef | |||
| eae7a96b7d | |||
| e02ed0ee7e | |||
| af5c6ae5ed | |||
| bec2e3f55c | |||
| 4d5b4c7863 | |||
| e59041684e | |||
| 88dbbfb2d6 | |||
| cec1c63642 | |||
| 5c5937182a | |||
| 421929b556 | |||
| b5bab710f7 | |||
| 89da1bfeac | |||
| 95f933ea85 | |||
| c783e14887 | |||
| e86c02ea90 | |||
| 3fbb55c757 | |||
| 7bd16b8776 | |||
| 054b6013c2 | |||
| 9f318be3d3 | |||
| 801ec115cf | |||
| c1e47bf4fe | |||
| fc3551a6d7 | |||
| 3081d3868e | |||
| 51e5eca612 | |||
| 3ab0185b06 | |||
| 5c14fceac0 | |||
| 149c833b75 | |||
| f1c22dae7d | |||
| d2904264ab | |||
| 65ee1a43e5 | |||
| 9d60eebeb5 | |||
| a2045067c5 | |||
| 68b0baeedc | |||
| 07c2607d4d | |||
| 010965dcde | |||
| a57d784df5 | |||
| 72ec2f3eb5 | |||
| 26d9212e3c | |||
| d7b3b709d0 | |||
| c63fcabfe9 | |||
| 09549aa18c | |||
| aacd2123ee | |||
| db514a75d0 | |||
| e59d4d0147 | |||
| 3dd538c4d3 | |||
| b8385d8a11 | |||
| 008c2d0b7a | |||
| 1c191efc3a | |||
| c37573806a | |||
| e1f6e4903a | |||
| 41cd52a768 | |||
| 330d83fdbd | |||
| 2a15e8ccfb | |||
| 707105290b | |||
| 361b6df36a | |||
| 99029ab6b0 | |||
| 18447c206d | |||
| cd66539662 | |||
| c164c651dc | |||
| f667d5b260 | |||
| f5d3bb1dd2 | |||
| 4be082ce39 | |||
| 2146833767 | |||
| 63b90a51aa | |||
| 5c7789d416 | |||
| 79815090ea | |||
| 92d4ef9ab0 | |||
| 75858ca156 | |||
| f8363e49f9 | |||
| 5642a555ae | |||
| 2dd975b235 | |||
| c8be8a9adb | |||
| cf4eb8b3f9 | |||
| 607611f240 | |||
| 6b29bff852 | |||
| 26700a9516 | |||
| 73ad258806 | |||
| 76c4d8bf26 | |||
| 9bd5d97cdd | |||
| efa4f5f0ea | |||
| 596bb85f2f | |||
| b91e65afe0 | |||
| c9184a2e03 | |||
| e92140c567 | |||
| 4114c9a75b | |||
| 872e6be03d | |||
| 0a22335e66 | |||
| c1c2d68d37 | |||
| a105c9b776 | |||
| 4475f1dc2a | |||
| ecd5397106 | |||
| 33b7c9a8aa | |||
| 2406892a2e | |||
| 6b3532643f | |||
| 4b7988eb49 | |||
| c4d78f01de | |||
| c1b20e42f5 | |||
| 85cb447766 | |||
| 4766e009b0 | |||
| ba1b3db709 | |||
| 7a26307e31 | |||
| 0b8c84e110 | |||
| 9396b40433 | |||
| 53ee995ac9 | |||
| 5adf5cab2f | |||
| 5d1a3d135c | |||
| 58e999b7e6 | |||
| d07c771dd9 | |||
| 680733a7c4 | |||
| 73a0381282 | |||
| b9c6a97694 | |||
| 02039352b2 | |||
| d160782a53 | |||
| 8e20887886 | |||
| c02cd95c56 | |||
| e53af030c0 | |||
| 20677b22fe | |||
| 5ee67a4412 | |||
| d12bbe4942 | |||
| 642e1936e3 | |||
| c76de1053e | |||
| 702f4a49cd | |||
| aa08a34669 | |||
| 854260ca44 | |||
| 74b3344fbc | |||
| ef8d6f2b4a | |||
| 180c6de6a6 | |||
| 066fd047cc | |||
| 4d10474fa5 | |||
| 3efcfeab67 | |||
| 286ccefb48 | |||
| 41c559415a | |||
| 11fbc32e3e | |||
| 062300ba7f | |||
| 8b2de0e483 | |||
| 42f359d015 | |||
| 35236b870e | |||
| 4ebe798ff2 | |||
| c4ecd234f2 | |||
| ffecfea949 | |||
| 98e409abb3 | |||
| ee5b24573b | |||
| 0305673098 | |||
| ce6add8ecc | |||
| 139e830158 | |||
| 6f3c99acca | |||
| f4f4e6b2d3 | |||
| d50649531f | |||
| 774760e6f3 | |||
| 01977466f4 | |||
| ef83dc4f0c | |||
| 7828194ebe | |||
| b6ddb08a66 | |||
| 439e7abd2d | |||
| 8be921f9de | |||
| a75db353c4 | |||
| 4362ee298a | |||
| 4046e66e40 | |||
| b6f332ecaf | |||
| 2bef3433e5 | |||
| 8aa67fc192 | |||
| b89a964d3f | |||
| cc27ac1a87 | |||
| a3f96f366a | |||
| 319d840b46 | |||
| 45a8eb66bb | |||
| a6e36558ef | |||
| 0759f2510c | |||
| 14e52783f6 | |||
| 662b143b71 | |||
| 59c378d069 | |||
| 0ebda5382b | |||
| 879fe8fa75 | |||
| 55fb88d369 | |||
| 4fa1cd995c | |||
| 6b586ed18c | |||
| 401377e679 | |||
| 40d60e1536 | |||
| 83bfdbdd75 | |||
| 72eefb34a9 | |||
| 5af8df5afb | |||
| 68b6907290 | |||
| 3bbe68f837 | |||
| 3bb4466260 | |||
| 225de5ccbb | |||
| 46554fc12f | |||
| 0e4f727069 | |||
| b1198a8440 | |||
| 0245cee469 | |||
| 0512bfe79e | |||
| cf57447648 | |||
| 5c6eca71a9 | |||
| 39db2f3c19 | |||
| 2772d3e79d | |||
| f1bb6f0839 | |||
| 0b54046ff8 | |||
| 2e20c0f34a | |||
| 7223844df9 | |||
| b13c6c18d0 | |||
| f689743e74 | |||
| 8679bd7144 | |||
| 588e6caa15 | |||
| 143738214c | |||
| 91ff480e26 | |||
| 1fec32adc6 | |||
| ecfa7eb260 | |||
| 439a43b6b4 | |||
| 6626d8a62f | |||
| 14e9d2954c | |||
| e2f07c01e9 | |||
| 73caccde3f | |||
| c066598c23 | |||
| 62ba3b6b43 | |||
| 3c6d73bc5c | |||
| 7d2feb3a3b | |||
| a13c8145bc | |||
| 86a154722f | |||
| d58926ab1d | |||
| a04d4bf2d7 | |||
| d8fb278a2c | |||
| b0a917c48a | |||
| bda1cb0236 | |||
| e46ad22cd6 | |||
| b9962b8656 | |||
| f5cd27694a | |||
| 9a498c37a2 | |||
| 6900dded49 | |||
| 773d386041 | |||
| f176fbf588 | |||
| be323d5152 | |||
| ea8ffe36d3 | |||
| d329b63369 | |||
| c4e1586db8 | |||
| 53b38d6269 | |||
| 3f52c685c1 | |||
| c89180a9de | |||
| c71f73f438 | |||
| 83424ade1a | |||
| bfc885091b | |||
| 29dada00c4 | |||
| 95e2e14f9d | |||
| 477480ce2a | |||
| 0dad5d825d | |||
| 4dd857244c | |||
| bd5593b6c4 | |||
| 9e9b8f1d99 | |||
| 2e0d767ab2 | |||
| 0454e4bd8b | |||
| 3157fa3c53 | |||
| ab7551cd7f | |||
| 76cadb7943 | |||
| a8bf2fa76e | |||
| 5008e08885 | |||
| 6f5ab9daf1 | |||
| 13a9c9a354 | |||
| 3ff2cde5ca | |||
| 24cbf6bc5a | |||
| 7390d9de63 | |||
| 7fcee113c1 | |||
| 1bf38611a4 | |||
| dc420b0eb1 | |||
| ee11224611 | |||
| 9870093f7b | |||
| 2e4082364e | |||
| 60e448c87e | |||
| 33929448a1 | |||
| a6d62aaba0 | |||
| 8aa01d2a6d | |||
| 83e5a10603 | |||
| 0dd1152c18 | |||
| f82653874b | |||
| fbf468b057 | |||
| a317e6c3be | |||
| da9754a3a0 | |||
| 07df5578d9 | |||
| 3f44a66cb6 | |||
| d4c834d2e0 | |||
| a28da4c490 | |||
| f064e0a43d | |||
| b7439675b8 | |||
| 790f1c9545 | |||
| 75b8990d90 | |||
| c1a65385a1 | |||
| b5995badc9 | |||
| a4340d3b85 | |||
| 3d4b3bc3fd | |||
| 23d6761f30 | |||
| 8ff619d95e | |||
| fe6ff4a920 | |||
| f84226b7a1 | |||
| 5c673efad7 | |||
| fd0255b41d | |||
| e2d22eef14 | |||
| 640421c0ec | |||
| 9160d81c98 | |||
| 0d00c08da0 | |||
| c3287ebd31 | |||
| df55c2b9b1 | |||
| c164064eef | |||
| 1da782cb28 | |||
| bf78f523aa | |||
| 63f2b9ab33 | |||
| 3ec851dc5e | |||
| fd85734e0e | |||
| 1486fb8108 | |||
| f3d0866ed9 | |||
| 68a441fa4c | |||
| d3c3e722d6 | |||
| 12e02e339f | |||
| ba15fe7995 | |||
| b3f95dceca | |||
| a492aec82d | |||
| a3bd763732 | |||
| 569f61a760 | |||
| 4f19881f88 | |||
| 303989de0e | |||
| 5f43623843 | |||
| 7c300d6d42 | |||
| 0c1c42c120 | |||
| 9ff672fc4d | |||
| 434022adac | |||
| f6e254474c | |||
| 98364ea74f | |||
| e218249b02 | |||
| 795c1444e9 | |||
| 40de2d5a4f |
@ -65,7 +65,7 @@ jobs:
|
||||
run_tests_torch_and_tf:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.6
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
RUN_PT_TF_CROSS_TESTS: yes
|
||||
@ -80,8 +80,10 @@ jobs:
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece,speech,vision]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
|
||||
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece,torch-speech,vision]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install tensorflow_probability
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
@ -97,6 +99,39 @@ jobs:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_torch_and_tf_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
RUN_PT_TF_CROSS_TESTS: yes
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-torch_and_tf-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece,torch-speech,vision]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install tensorflow_probability
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_torch_and_tf tests -m is_pt_tf_cross_test --durations=0 | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_torch_and_flax:
|
||||
working_directory: ~/transformers
|
||||
@ -116,8 +151,9 @@ jobs:
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,flax,torch,testing,sentencepiece,speech,vision]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
|
||||
- run: pip install .[sklearn,flax,torch,testing,sentencepiece,torch-speech,vision]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
@ -133,6 +169,38 @@ jobs:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_torch_and_flax_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.6
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
RUN_PT_FLAX_CROSS_TESTS: yes
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-torch_and_flax-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,flax,torch,testing,sentencepiece,torch-speech,vision]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_torch_and_flax tests -m is_pt_flax_cross_test --durations=0 | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_torch:
|
||||
working_directory: ~/transformers
|
||||
@ -151,8 +219,9 @@ jobs:
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,torch,testing,sentencepiece,speech,vision,timm]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
|
||||
- run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-torch-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
@ -168,6 +237,37 @@ jobs:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_torch_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-torch-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-torch-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -n 3 --dist=loadfile -s --make-reports=tests_torch tests | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_tf:
|
||||
working_directory: ~/transformers
|
||||
@ -184,8 +284,11 @@ jobs:
|
||||
keys:
|
||||
- v0.4-tf-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,tf-cpu,testing,sentencepiece]
|
||||
- run: pip install .[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]
|
||||
- run: pip install tensorflow_probability
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-tf-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
@ -201,6 +304,37 @@ jobs:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_tf_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-tf-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]
|
||||
- run: pip install tensorflow_probability
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-tf-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_tf tests | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_flax:
|
||||
working_directory: ~/transformers
|
||||
@ -217,8 +351,10 @@ jobs:
|
||||
keys:
|
||||
- v0.4-flax-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: sudo pip install .[flax,testing,sentencepiece]
|
||||
- run: pip install .[flax,testing,sentencepiece,flax-speech,vision]
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-flax-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
@ -234,6 +370,36 @@ jobs:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_flax_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-flax-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[flax,testing,sentencepiece,vision,flax-speech]
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-flax-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_flax tests | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_pipelines_torch:
|
||||
working_directory: ~/transformers
|
||||
@ -253,8 +419,9 @@ jobs:
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,torch,testing,sentencepiece,speech,vision]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cpu.html
|
||||
- run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-torch-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
@ -270,6 +437,38 @@ jobs:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_pipelines_torch_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
RUN_PIPELINE_TESTS: yes
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-torch-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
|
||||
- run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
|
||||
- run: pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- save_cache:
|
||||
key: v0.4-torch-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_torch -m is_pipeline_test tests | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_pipelines_tf:
|
||||
working_directory: ~/transformers
|
||||
@ -289,6 +488,7 @@ jobs:
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,tf-cpu,testing,sentencepiece]
|
||||
- run: pip install tensorflow_probability
|
||||
- save_cache:
|
||||
key: v0.4-tf-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
@ -305,6 +505,36 @@ jobs:
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_pipelines_tf_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
RUN_PIPELINE_TESTS: yes
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-tf-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,tf-cpu,testing,sentencepiece]
|
||||
- run: pip install tensorflow_probability
|
||||
- save_cache:
|
||||
key: v0.4-tf-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -n 8 --dist=loadfile -rA -s --make-reports=tests_pipelines_tf tests -m is_pipeline_test | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_custom_tokenizers:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
@ -349,25 +579,119 @@ jobs:
|
||||
keys:
|
||||
- v0.4-torch_examples-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,torch,sentencepiece,testing]
|
||||
- run: pip install .[sklearn,torch,sentencepiece,testing,torch-speech]
|
||||
- run: pip install -r examples/pytorch/_tests_requirements.txt
|
||||
- save_cache:
|
||||
key: v0.4-torch_examples-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: python utils/tests_fetcher.py | tee test_preparation.txt
|
||||
- run: python utils/tests_fetcher.py --filters examples tests | tee test_preparation.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/test_preparation.txt
|
||||
- run: |
|
||||
if [ -f test_list.txt ]; then
|
||||
TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/pytorch/ | tee examples_output.txt
|
||||
python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/pytorch/ | tee tests_output.txt
|
||||
fi
|
||||
- store_artifacts:
|
||||
path: ~/transformers/examples_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_examples_torch_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.6
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-torch_examples-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[sklearn,torch,sentencepiece,testing,torch-speech]
|
||||
- run: pip install -r examples/pytorch/_tests_requirements.txt
|
||||
- save_cache:
|
||||
key: v0.4-torch_examples-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_torch ./examples/pytorch/ | tee examples_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/examples_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_examples_flax:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-flax_examples-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: pip install --upgrade pip
|
||||
- run: sudo pip install .[flax,testing,sentencepiece]
|
||||
- run: pip install -r examples/flax/_tests_requirements.txt
|
||||
- save_cache:
|
||||
key: v0.4-flax_examples-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: python utils/tests_fetcher.py --filters examples tests | tee test_preparation.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/test_preparation.txt
|
||||
- run: |
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_flax ./examples/flax/ | tee tests_output.txt
|
||||
fi
|
||||
- store_artifacts:
|
||||
path: ~/transformers/flax_examples_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_examples_flax_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-flax_examples-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: pip install --upgrade pip
|
||||
- run: sudo pip install .[flax,testing,sentencepiece]
|
||||
- run: pip install -r examples/flax/_tests_requirements.txt
|
||||
- save_cache:
|
||||
key: v0.4-flax_examples-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
TRANSFORMERS_IS_CI=1 python -m pytest -n 8 --dist=loadfile -s --make-reports=examples_flax ./examples/flax/ | tee examples_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/flax_examples_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_hub:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
@ -399,8 +723,45 @@ jobs:
|
||||
path: ~/transformers/test_preparation.txt
|
||||
- run: |
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -sv $(cat test_list.txt) -m is_staging_test
|
||||
python -m pytest -sv --make-reports=tests_hub $(cat test_list.txt) -m is_staging_test | tee tests_output.txt
|
||||
fi
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_hub_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
HUGGINGFACE_CO_STAGING: yes
|
||||
RUN_GIT_LFS_TESTS: yes
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-hub-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get install git-lfs
|
||||
- run: |
|
||||
git config --global user.email "ci@dummy.com"
|
||||
git config --global user.name "ci"
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[torch,sentencepiece,testing]
|
||||
- save_cache:
|
||||
key: v0.4-hub-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: |
|
||||
python -m pytest -sv --make-reports=tests_hub tests -m is_staging_test | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
run_tests_onnxruntime:
|
||||
working_directory: ~/transformers
|
||||
@ -428,56 +789,40 @@ jobs:
|
||||
path: ~/transformers/test_preparation.txt
|
||||
- run: |
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_torch $(cat test_list.txt) -k onnx | tee tests_output.txt
|
||||
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_onnx $(cat test_list.txt) -k onnx | tee tests_output.txt
|
||||
fi
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
build_doc:
|
||||
|
||||
run_tests_onnxruntime_all:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.6
|
||||
resource_class: large
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-build_doc-{{ checksum "setup.py" }}
|
||||
- v0.4-torch-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install ."[docs]"
|
||||
- run: pip install .[torch,testing,sentencepiece,onnxruntime]
|
||||
- save_cache:
|
||||
key: v0.4-build_doc-{{ checksum "setup.py" }}
|
||||
key: v0.4-onnx-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: cd docs && make html SPHINXOPTS="-W -j 4"
|
||||
- run: |
|
||||
python -m pytest -n 1 --dist=loadfile -s --make-reports=tests_onnx tests -k onnx | tee tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ./docs/_build
|
||||
|
||||
deploy_doc:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.6
|
||||
resource_class: large
|
||||
steps:
|
||||
- add_ssh_keys:
|
||||
fingerprints:
|
||||
- "5b:7a:95:18:07:8c:aa:76:4c:60:35:88:ad:60:56:71"
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-deploy_doc-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install ."[docs]"
|
||||
- save_cache:
|
||||
key: v0.4-deploy_doc-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: ./.circleci/deploy.sh
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
check_code_quality:
|
||||
working_directory: ~/transformers
|
||||
@ -494,7 +839,6 @@ jobs:
|
||||
- v0.4-code_quality-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install isort GitPython
|
||||
- run: pip install .[all,quality]
|
||||
- save_cache:
|
||||
key: v0.4-code_quality-{{ checksum "setup.py" }}
|
||||
@ -505,6 +849,27 @@ jobs:
|
||||
- run: python utils/custom_init_isort.py --check_only
|
||||
- run: flake8 examples tests src utils
|
||||
- run: python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only
|
||||
|
||||
check_repository_consistency:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.6
|
||||
resource_class: large
|
||||
environment:
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-repository_consistency-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[all,quality]
|
||||
- save_cache:
|
||||
key: v0.4-repository_consistency-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: python utils/check_copies.py
|
||||
- run: python utils/check_table.py
|
||||
- run: python utils/check_dummies.py
|
||||
@ -513,16 +878,43 @@ jobs:
|
||||
- run: make deps_table_check_updated
|
||||
- run: python utils/tests_fetcher.py --sanity_check
|
||||
|
||||
check_repository_consistency:
|
||||
run_tests_layoutlmv2:
|
||||
working_directory: ~/transformers
|
||||
docker:
|
||||
- image: circleci/python:3.6
|
||||
resource_class: small
|
||||
- image: circleci/python:3.7
|
||||
environment:
|
||||
OMP_NUM_THREADS: 1
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
resource_class: xlarge
|
||||
parallelism: 1
|
||||
steps:
|
||||
- checkout
|
||||
- run: pip install requests
|
||||
- run: python ./utils/link_tester.py
|
||||
- restore_cache:
|
||||
keys:
|
||||
- v0.4-torch-{{ checksum "setup.py" }}
|
||||
- v0.4-{{ checksum "setup.py" }}
|
||||
- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
- run: pip install --upgrade pip
|
||||
- run: pip install .[torch,testing,vision]
|
||||
- run: pip install torchvision
|
||||
- run: python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
|
||||
- run: sudo apt install tesseract-ocr
|
||||
- run: pip install pytesseract
|
||||
- save_cache:
|
||||
key: v0.4-torch-{{ checksum "setup.py" }}
|
||||
paths:
|
||||
- '~/.cache/pip'
|
||||
- run: python utils/tests_fetcher.py | tee test_preparation.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/test_preparation.txt
|
||||
- run: |
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 1 tests/*layoutlmv2* --dist=loadfile -s --make-reports=tests_layoutlmv2 --durations=100
|
||||
fi
|
||||
- store_artifacts:
|
||||
path: ~/transformers/tests_output.txt
|
||||
- store_artifacts:
|
||||
path: ~/transformers/reports
|
||||
|
||||
# TPU JOBS
|
||||
run_examples_tpu:
|
||||
@ -567,6 +959,7 @@ workflows:
|
||||
- check_code_quality
|
||||
- check_repository_consistency
|
||||
- run_examples_torch
|
||||
- run_examples_flax
|
||||
- run_tests_custom_tokenizers
|
||||
- run_tests_torch_and_tf
|
||||
- run_tests_torch_and_flax
|
||||
@ -577,8 +970,28 @@ workflows:
|
||||
- run_tests_pipelines_tf
|
||||
- run_tests_onnxruntime
|
||||
- run_tests_hub
|
||||
- build_doc
|
||||
- deploy_doc: *workflow_filters
|
||||
- run_tests_layoutlmv2
|
||||
nightly:
|
||||
triggers:
|
||||
- schedule:
|
||||
cron: "0 0 * * *"
|
||||
filters:
|
||||
branches:
|
||||
only:
|
||||
- master
|
||||
jobs:
|
||||
- run_examples_torch_all
|
||||
- run_examples_flax_all
|
||||
- run_tests_torch_and_tf_all
|
||||
- run_tests_torch_and_flax_all
|
||||
- run_tests_torch_all
|
||||
- run_tests_tf_all
|
||||
- run_tests_flax_all
|
||||
- run_tests_pipelines_torch_all
|
||||
- run_tests_pipelines_tf_all
|
||||
- run_tests_onnxruntime_all
|
||||
- run_tests_hub_all
|
||||
|
||||
# tpu_testing_jobs:
|
||||
# triggers:
|
||||
# - schedule:
|
||||
|
||||
@ -1,71 +0,0 @@
|
||||
cd docs
|
||||
|
||||
function deploy_doc(){
|
||||
echo "Creating doc at commit $1 and pushing to folder $2"
|
||||
git checkout $1
|
||||
pip install -U ..
|
||||
if [ ! -z "$2" ]
|
||||
then
|
||||
if [ "$2" == "master" ]; then
|
||||
echo "Pushing master"
|
||||
make clean && make html && scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir/$2/
|
||||
cp -r _build/html/_static .
|
||||
elif ssh -oStrictHostKeyChecking=no $doc "[ -d $dir/$2 ]"; then
|
||||
echo "Directory" $2 "already exists"
|
||||
scp -r -oStrictHostKeyChecking=no _static/* $doc:$dir/$2/_static/
|
||||
else
|
||||
echo "Pushing version" $2
|
||||
make clean && make html
|
||||
rm -rf _build/html/_static
|
||||
cp -r _static _build/html
|
||||
scp -r -oStrictHostKeyChecking=no _build/html $doc:$dir/$2
|
||||
fi
|
||||
else
|
||||
echo "Pushing stable"
|
||||
make clean && make html
|
||||
rm -rf _build/html/_static
|
||||
cp -r _static _build/html
|
||||
scp -r -oStrictHostKeyChecking=no _build/html/* $doc:$dir
|
||||
fi
|
||||
}
|
||||
|
||||
# You can find the commit for each tag on https://github.com/huggingface/transformers/tags
|
||||
deploy_doc "master" master
|
||||
deploy_doc "b33a385" v1.0.0
|
||||
deploy_doc "fe02e45" v1.1.0
|
||||
deploy_doc "89fd345" v1.2.0
|
||||
deploy_doc "fc9faa8" v2.0.0
|
||||
deploy_doc "3ddce1d" v2.1.1
|
||||
deploy_doc "3616209" v2.2.0
|
||||
deploy_doc "d0f8b9a" v2.3.0
|
||||
deploy_doc "6664ea9" v2.4.0
|
||||
deploy_doc "fb560dc" v2.5.0
|
||||
deploy_doc "b90745c" v2.5.1
|
||||
deploy_doc "fbc5bf1" v2.6.0
|
||||
deploy_doc "6f5a12a" v2.7.0
|
||||
deploy_doc "11c3257" v2.8.0
|
||||
deploy_doc "e7cfc1a" v2.9.0
|
||||
deploy_doc "7cb203f" v2.9.1
|
||||
deploy_doc "10d7239" v2.10.0
|
||||
deploy_doc "b42586e" v2.11.0
|
||||
deploy_doc "7fb8bdf" v3.0.2
|
||||
deploy_doc "4b3ee9c" v3.1.0
|
||||
deploy_doc "3ebb1b3" v3.2.0
|
||||
deploy_doc "0613f05" v3.3.1
|
||||
deploy_doc "eb0e0ce" v3.4.0
|
||||
deploy_doc "818878d" v3.5.1
|
||||
deploy_doc "c781171" v4.0.1
|
||||
deploy_doc "bfa4ccf" v4.1.1
|
||||
deploy_doc "7d9a9d0" v4.2.2
|
||||
deploy_doc "bae0c79" v4.3.3
|
||||
deploy_doc "c988db5" v4.4.0
|
||||
deploy_doc "c5d6a28" v4.4.1
|
||||
deploy_doc "6bc89ed" v4.4.2
|
||||
deploy_doc "4906a29" v4.5.0
|
||||
deploy_doc "4bae96e" v4.5.1
|
||||
deploy_doc "25dee4a" v4.6.0
|
||||
deploy_doc "7a6c9fa" v4.7.0
|
||||
deploy_doc "9252a51" v4.8.0
|
||||
deploy_doc "1366172" v4.8.1
|
||||
deploy_doc "96d1cfb" v4.8.2
|
||||
deploy_doc "72aee83" # v4.9.0 Latest stable release
|
||||
44
.github/ISSUE_TEMPLATE/bug-report.md
vendored
44
.github/ISSUE_TEMPLATE/bug-report.md
vendored
@ -27,30 +27,39 @@ assignees: ''
|
||||
|
||||
Models:
|
||||
|
||||
- albert, bert, xlm: @LysandreJik
|
||||
- blenderbot, bart, marian, pegasus, encoderdecoder, t5: @patrickvonplaten, @patil-suraj
|
||||
- longformer, reformer, transfoxl, xlnet: @patrickvonplaten
|
||||
- fsmt: @stas00
|
||||
- funnel: @sgugger
|
||||
- gpt2: @patrickvonplaten, @LysandreJik
|
||||
- rag: @patrickvonplaten, @lhoestq
|
||||
- tensorflow: @Rocketknight1
|
||||
- ALBERT, BERT, XLM, DeBERTa, DeBERTa-v2, ELECTRA, MobileBert, SqueezeBert: @LysandreJik
|
||||
- T5, BART, Marian, Pegasus, EncoderDecoder: @patrickvonplaten
|
||||
- Blenderbot, MBART: @patil-suraj
|
||||
- Longformer, Reformer, TransfoXL, XLNet, FNet, BigBird: @patrickvonplaten
|
||||
- FSMT: @stas00
|
||||
- Funnel: @sgugger
|
||||
- GPT-2, GPT: @patrickvonplaten, @LysandreJik
|
||||
- RAG, DPR: @patrickvonplaten, @lhoestq
|
||||
- TensorFlow: @Rocketknight1
|
||||
- JAX/Flax: @patil-suraj
|
||||
- TAPAS, LayoutLM, LayoutLMv2, LUKE, ViT, BEiT, DEiT, DETR, CANINE: @NielsRogge
|
||||
- GPT-Neo, GPT-J, CLIP: @patil-suraj
|
||||
- Wav2Vec2, HuBERT, SpeechEncoderDecoder, UniSpeech, UniSpeechSAT, SEW, SEW-D, Speech2Text: @patrickvonplaten, @anton-l
|
||||
|
||||
If the model isn't in the list, ping @LysandreJik who will redirect you to the correct contributor.
|
||||
|
||||
Library:
|
||||
|
||||
- benchmarks: @patrickvonplaten
|
||||
- deepspeed: @stas00
|
||||
- ray/raytune: @richardliaw, @amogkam
|
||||
- text generation: @patrickvonplaten
|
||||
- tokenizers: @LysandreJik
|
||||
- trainer: @sgugger
|
||||
- pipelines: @LysandreJik
|
||||
- Benchmarks: @patrickvonplaten
|
||||
- Deepspeed: @stas00
|
||||
- Ray/raytune: @richardliaw, @amogkam
|
||||
- Text generation: @patrickvonplaten @narsil
|
||||
- Tokenizers: @LysandreJik
|
||||
- Trainer: @sgugger
|
||||
- Pipelines: @Narsil
|
||||
- Speech: @patrickvonplaten, @anton-l
|
||||
- Vision: @NielsRogge, @sgugger
|
||||
|
||||
Documentation: @sgugger
|
||||
|
||||
Model hub:
|
||||
|
||||
- for issues with a model report at https://discuss.huggingface.co/ and tag the model's creator.
|
||||
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
|
||||
|
||||
HF projects:
|
||||
|
||||
@ -60,6 +69,9 @@ HF projects:
|
||||
Examples:
|
||||
|
||||
- maintained examples (not research project or legacy): @sgugger, @patil-suraj
|
||||
|
||||
For research projetcs, please ping the contributor directly. For example, on the following projects:
|
||||
|
||||
- research_projects/bert-loses-patience: @JetRunner
|
||||
- research_projects/distillation: @VictorSanh
|
||||
|
||||
|
||||
50
.github/workflows/build_doc_test.yml
vendored
Normal file
50
.github/workflows/build_doc_test.yml
vendored
Normal file
@ -0,0 +1,50 @@
|
||||
name: Documentation test build
|
||||
|
||||
on:
|
||||
pull_request:
|
||||
paths:
|
||||
- "src/**"
|
||||
- "docs/**"
|
||||
- ".github/**"
|
||||
|
||||
jobs:
|
||||
build_and_package:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
shell: bash -l {0}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v2
|
||||
|
||||
- name: Loading cache.
|
||||
uses: actions/cache@v2
|
||||
id: cache
|
||||
with:
|
||||
path: ~/.cache/pip
|
||||
key: v1-test_build_doc
|
||||
restore-keys: |
|
||||
v1-test_build_doc-${{ hashFiles('setup.py') }}
|
||||
v1-test_build_doc
|
||||
|
||||
- name: Setup environment
|
||||
run: |
|
||||
pip install --upgrade pip
|
||||
sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
|
||||
pip install git+https://github.com/huggingface/doc-builder
|
||||
pip install .[dev]
|
||||
|
||||
export TORCH_VERSION=$(python -c "from torch import version; print(version.__version__.split('+')[0])")
|
||||
pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH_VERSION}+cpu.html
|
||||
|
||||
pip install torchvision
|
||||
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
|
||||
|
||||
sudo apt install tesseract-ocr
|
||||
pip install pytesseract
|
||||
pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
|
||||
|
||||
- name: Make documentation
|
||||
run: |
|
||||
doc-builder build transformers ./docs/source
|
||||
99
.github/workflows/build_documentation.yml
vendored
Normal file
99
.github/workflows/build_documentation.yml
vendored
Normal file
@ -0,0 +1,99 @@
|
||||
name: Build documentation
|
||||
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- master
|
||||
- doc-builder*
|
||||
|
||||
jobs:
|
||||
build_and_package:
|
||||
runs-on: ubuntu-latest
|
||||
defaults:
|
||||
run:
|
||||
shell: bash -l {0}
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v2
|
||||
with:
|
||||
repository: 'huggingface/doc-builder'
|
||||
path: doc-builder
|
||||
token: ${{ secrets.HUGGINGFACE_PUSH }}
|
||||
|
||||
- uses: actions/checkout@v2
|
||||
with:
|
||||
repository: 'huggingface/transformers'
|
||||
path: transformers
|
||||
|
||||
- uses: actions/checkout@v2
|
||||
with:
|
||||
repository: 'huggingface/notebooks'
|
||||
path: notebooks
|
||||
token: ${{ secrets.HUGGINGFACE_PUSH }}
|
||||
|
||||
- name: Loading cache.
|
||||
uses: actions/cache@v2
|
||||
id: cache
|
||||
with:
|
||||
path: ~/.cache/pip
|
||||
key: v1-test_build_doc
|
||||
restore-keys: |
|
||||
v1-test_build_doc-${{ hashFiles('setup.py') }}
|
||||
v1-test_build_doc
|
||||
|
||||
- name: Setup environment
|
||||
run: |
|
||||
sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
|
||||
|
||||
pip install git+https://github.com/huggingface/doc-builder
|
||||
pip install git+https://github.com/huggingface/transformers#egg=transformers[dev]
|
||||
|
||||
export TORCH_VERSION=$(python -c "from torch import version; print(version.__version__.split('+')[0])")
|
||||
pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH_VERSION}+cpu.html
|
||||
|
||||
pip install torchvision
|
||||
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
|
||||
|
||||
sudo apt install tesseract-ocr
|
||||
pip install pytesseract
|
||||
pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
- name: Setup git
|
||||
run: |
|
||||
git config --global user.name "Hugging Face"
|
||||
git config --global user.email transformers@huggingface.co
|
||||
|
||||
cd doc-builder
|
||||
git pull origin main
|
||||
cd ..
|
||||
|
||||
cd notebooks
|
||||
git pull origin master
|
||||
cd ..
|
||||
|
||||
- name: Make documentation
|
||||
run: |
|
||||
doc-builder build transformers transformers/docs/source --build_dir doc-builder/build --notebook_dir notebooks/transformers_doc --clean
|
||||
|
||||
- name: Push to repositories
|
||||
run: |
|
||||
cd doc-builder
|
||||
if [[ `git status --porcelain` ]]; then
|
||||
git add build
|
||||
git commit -m "Updated with commit ${{ github.sha }}"
|
||||
git push origin main
|
||||
else
|
||||
echo "No diff in the documentation."
|
||||
fi
|
||||
cd ..
|
||||
|
||||
cd notebooks
|
||||
if [[ `git status --porcelain` ]]; then
|
||||
git add transformers_doc
|
||||
git commit -m "Updated Transformer doc notebooks with commit ${{ github.sha }}"
|
||||
git push origin master
|
||||
else
|
||||
echo "No diff in the notebooks."
|
||||
fi
|
||||
cd ..
|
||||
42
.github/workflows/doctests.yml
vendored
Normal file
42
.github/workflows/doctests.yml
vendored
Normal file
@ -0,0 +1,42 @@
|
||||
name: Doctests
|
||||
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- doctest*
|
||||
repository_dispatch:
|
||||
schedule:
|
||||
- cron: "0 0 * * *"
|
||||
|
||||
|
||||
env:
|
||||
HF_HOME: /mnt/cache
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
RUN_SLOW: yes
|
||||
OMP_NUM_THREADS: 16
|
||||
MKL_NUM_THREADS: 16
|
||||
PYTEST_TIMEOUT: 600
|
||||
|
||||
jobs:
|
||||
run_doctests:
|
||||
runs-on: [self-hosted, docker-gpu, single-gpu]
|
||||
container:
|
||||
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
|
||||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev
|
||||
pip install --upgrade pip
|
||||
pip install .[dev]
|
||||
|
||||
- name: Run doctests
|
||||
run: |
|
||||
pytest --doctest-modules $(cat utils/documentation_tests.txt) -sv --doctest-continue-on-failure
|
||||
6
.github/workflows/model-templates.yml
vendored
6
.github/workflows/model-templates.yml
vendored
@ -36,7 +36,7 @@ jobs:
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
pip install --upgrade pip
|
||||
pip install --upgrade pip!=21.3
|
||||
sudo apt -y update && sudo apt install -y libsndfile1-dev
|
||||
pip install .[dev]
|
||||
- name: Create model files
|
||||
@ -47,6 +47,8 @@ jobs:
|
||||
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
|
||||
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
|
||||
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
|
||||
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/flax-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
|
||||
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/flax-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
|
||||
make style
|
||||
python utils/check_table.py --fix_and_overwrite
|
||||
python utils/check_dummies.py --fix_and_overwrite
|
||||
@ -59,7 +61,7 @@ jobs:
|
||||
- name: Run style changes
|
||||
run: |
|
||||
git fetch origin master:master
|
||||
make fixup
|
||||
make style && make quality
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
|
||||
246
.github/workflows/self-nightly-scheduled.yml
vendored
Normal file
246
.github/workflows/self-nightly-scheduled.yml
vendored
Normal file
@ -0,0 +1,246 @@
|
||||
name: Self-hosted runner; Nightly (scheduled)
|
||||
|
||||
on:
|
||||
push:
|
||||
branches:
|
||||
- nightly_ci*
|
||||
repository_dispatch:
|
||||
schedule:
|
||||
- cron: "0 0 */3 * *"
|
||||
|
||||
env:
|
||||
HF_HOME: /mnt/cache
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
RUN_SLOW: yes
|
||||
OMP_NUM_THREADS: 16
|
||||
MKL_NUM_THREADS: 16
|
||||
PYTEST_TIMEOUT: 600
|
||||
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
|
||||
|
||||
jobs:
|
||||
run_all_tests_torch_gpu:
|
||||
runs-on: [self-hosted, docker-gpu, single-gpu]
|
||||
container:
|
||||
image: pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
|
||||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev git
|
||||
pip install --upgrade pip
|
||||
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
|
||||
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_gpu tests
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_torch_gpu_failures_short.txt
|
||||
|
||||
- name: Run examples tests on GPU
|
||||
if: ${{ always() }}
|
||||
env:
|
||||
OMP_NUM_THREADS: 16
|
||||
MKL_NUM_THREADS: 16
|
||||
RUN_SLOW: yes
|
||||
HF_HOME: /mnt/cache
|
||||
TRANSFORMERS_IS_CI: yes
|
||||
run: |
|
||||
pip install -r examples/pytorch/_tests_requirements.txt
|
||||
python -m pytest -n 1 -v --dist=loadfile --make-reports=examples_torch_gpu examples
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/examples_torch_gpu_failures_short.txt
|
||||
|
||||
- name: Run all pipeline tests on GPU
|
||||
if: ${{ always() }}
|
||||
env:
|
||||
RUN_PIPELINE_TESTS: yes
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_gpu tests
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_torch_pipeline_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: run_all_tests_torch_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
run_all_tests_torch_multi_gpu:
|
||||
runs-on: [self-hosted, docker-gpu, multi-gpu]
|
||||
container:
|
||||
image: pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
|
||||
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev git
|
||||
pip install --upgrade pip
|
||||
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
|
||||
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
env:
|
||||
MKL_SERVICE_FORCE_INTEL: 1
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_multi_gpu tests
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_torch_multi_gpu_failures_short.txt
|
||||
|
||||
- name: Run all pipeline tests on GPU
|
||||
if: ${{ always() }}
|
||||
env:
|
||||
RUN_PIPELINE_TESTS: yes
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_multi_gpu tests
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_torch_pipeline_multi_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: run_all_tests_torch_multi_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
run_all_tests_torch_cuda_extensions_gpu:
|
||||
runs-on: [self-hosted, docker-gpu, single-gpu]
|
||||
container:
|
||||
image: nvcr.io/nvidia/pytorch:21.03-py3
|
||||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libaio-dev
|
||||
pip install --upgrade pip
|
||||
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
|
||||
pip install .[testing,deepspeed]
|
||||
pip install git+https://github.com/microsoft/DeepSpeed
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: run_tests_torch_cuda_extensions_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
run_all_tests_torch_cuda_extensions_multi_gpu:
|
||||
runs-on: [self-hosted, docker-gpu, multi-gpu]
|
||||
container:
|
||||
image: nvcr.io/nvidia/pytorch:21.03-py3
|
||||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libaio-dev
|
||||
pip install --upgrade pip
|
||||
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
|
||||
rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
|
||||
pip install .[testing,fairscale]
|
||||
pip install git+https://github.com/microsoft/DeepSpeed # testing bleeding edge
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_multi_gpu tests/deepspeed tests/extended
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: run_tests_torch_cuda_extensions_multi_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
send_results:
|
||||
name: Send results to webhook
|
||||
runs-on: ubuntu-latest
|
||||
if: always()
|
||||
needs: [
|
||||
run_all_tests_torch_gpu,
|
||||
run_all_tests_torch_multi_gpu,
|
||||
run_all_tests_torch_cuda_extensions_gpu,
|
||||
run_all_tests_torch_cuda_extensions_multi_gpu
|
||||
]
|
||||
steps:
|
||||
- uses: actions/checkout@v2
|
||||
|
||||
- uses: actions/download-artifact@v2
|
||||
|
||||
- name: Send message to Slack
|
||||
env:
|
||||
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
|
||||
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
|
||||
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
|
||||
CI_SLACK_CHANNEL_ID_PAST_FUTURE: ${{ secrets.CI_SLACK_CHANNEL_ID_PAST_FUTURE }}
|
||||
|
||||
run: |
|
||||
pip install slack_sdk
|
||||
python utils/notification_service.py scheduled nightly-torch
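The final `send_results` job downloads the report artifacts and calls `utils/notification_service.py`, which posts a summary to Slack through `slack_sdk`. A rough sketch of that reporting pattern, assuming the same secrets the workflow exports above (the real script builds much richer messages from the `*_failures_short.txt` reports):

```python
import os

from slack_sdk import WebClient

# Token and channel come from the CI secrets exported by the workflow above.
client = WebClient(token=os.environ["CI_SLACK_BOT_TOKEN"])

# The real notification_service.py aggregates the downloaded report artifacts;
# here we only post a placeholder summary line.
client.chat_postMessage(
    channel=os.environ["CI_SLACK_CHANNEL_ID_DAILY"],
    text="Nightly torch CI finished. See the uploaded test report artifacts for details.",
)
```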
.github/workflows/self-push.yml (vendored, 295 lines changed)
@@ -11,6 +11,7 @@ on:
|
||||
- "tests/**"
|
||||
- ".github/**"
|
||||
- "templates/**"
|
||||
- "utils/**"
|
||||
repository_dispatch:
|
||||
|
||||
env:
|
||||
@@ -27,32 +28,45 @@ jobs:
|
||||
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
|
||||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
|
||||
apt install -y libsndfile1-dev
|
||||
pip install --upgrade pip
|
||||
pip install .[sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
with:
|
||||
fetch-depth: 2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev
|
||||
pip install --upgrade pip
|
||||
pip install .[sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Fetch the tests to run
|
||||
run: |
|
||||
python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
|
||||
|
||||
- name: Report fetched tests
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: test_fetched
|
||||
path: test_preparation.txt
|
||||
|
||||
- name: Run all non-slow tests on GPU
|
||||
run: |
|
||||
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_torch_gpu tests
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_torch_gpu $(cat test_list.txt)
|
||||
fi
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
if: ${{ failure() }}
|
||||
run: cat reports/tests_torch_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
@@ -62,6 +76,62 @@ jobs:
|
||||
name: run_all_tests_torch_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
run_tests_flax_gpu:
|
||||
runs-on: [self-hosted, docker-gpu-test, single-gpu]
|
||||
container:
|
||||
image: tensorflow/tensorflow:2.4.1-gpu
|
||||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
|
||||
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
|
||||
pip install --upgrade pip
|
||||
pip install .[sklearn,testing,sentencepiece,flax,flax-speech,vision]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
with:
|
||||
fetch-depth: 2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
|
||||
python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
|
||||
|
||||
- name: Fetch the tests to run
|
||||
run: |
|
||||
python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
|
||||
|
||||
- name: Report fetched tests
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: test_fetched
|
||||
path: test_preparation.txt
|
||||
|
||||
- name: Run all non-slow tests on GPU
|
||||
run: |
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_flax_gpu $(cat test_list.txt)
|
||||
fi
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ failure() }}
|
||||
run: cat reports/tests_flax_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: run_all_tests_flax_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
# run_tests_tf_gpu:
|
||||
# runs-on: [self-hosted, docker-gpu, single-gpu]
|
||||
# timeout-minutes: 120
|
||||
@@ -69,32 +139,48 @@ jobs:
|
||||
# image: tensorflow/tensorflow:2.4.1-gpu
|
||||
# options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
# steps:
|
||||
# - name: Install dependencies
|
||||
# run: |
|
||||
# apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
|
||||
# pip install --upgrade pip
|
||||
# pip install .[sklearn,testing,onnxruntime,sentencepiece,tf-speech]
|
||||
# pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
#
|
||||
# - name: Launcher docker
|
||||
# uses: actions/checkout@v2
|
||||
# with:
|
||||
# fetch-depth: 2
|
||||
#
|
||||
# - name: NVIDIA-SMI
|
||||
# run: |
|
||||
# nvidia-smi
|
||||
#
|
||||
# - name: Install dependencies
|
||||
# run: |
|
||||
# pip install --upgrade pip
|
||||
# pip install .[sklearn,testing,onnxruntime,sentencepiece]
|
||||
#
|
||||
# - name: Are GPUs recognized by our DL frameworks
|
||||
# run: |
|
||||
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
|
||||
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
|
||||
#
|
||||
# - name: Fetch the tests to run
|
||||
# run: |
|
||||
# python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
|
||||
#
|
||||
# - name: Report fetched tests
|
||||
# uses: actions/upload-artifact@v2
|
||||
# with:
|
||||
# name: test_fetched
|
||||
# path: test_preparation.txt
|
||||
#
|
||||
# - name: Run all non-slow tests on GPU
|
||||
# env:
|
||||
# TF_NUM_INTRAOP_THREADS: 8
|
||||
# TF_NUM_INTEROP_THREADS: 1
|
||||
# run: |
|
||||
# python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_gpu tests
|
||||
# if [ -f test_list.txt ]; then
|
||||
# python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_gpu $(cat test_list.txt)
|
||||
# fi
|
||||
#
|
||||
# - name: Failure short reports
|
||||
# if: ${{ always() }}
|
||||
# if: ${{ failure() }}
|
||||
# run: cat reports/tests_tf_gpu_failures_short.txt
|
||||
#
|
||||
# - name: Test suite reports artifacts
|
||||
@@ -111,34 +197,47 @@ jobs:
|
||||
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
|
||||
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
|
||||
apt install -y libsndfile1-dev
|
||||
pip install --upgrade pip
|
||||
pip install .[sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
with:
|
||||
fetch-depth: 2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev
|
||||
pip install --upgrade pip
|
||||
pip install .[sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Fetch the tests to run
|
||||
run: |
|
||||
python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
|
||||
|
||||
- name: Report fetched tests
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: test_fetched
|
||||
path: test_preparation.txt
|
||||
|
||||
- name: Run all non-slow tests on GPU
|
||||
env:
|
||||
MKL_SERVICE_FORCE_INTEL: 1
|
||||
run: |
|
||||
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_torch_multi_gpu tests
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_torch_multi_gpu $(cat test_list.txt)
|
||||
fi
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
if: ${{ failure() }}
|
||||
run: cat reports/tests_torch_multi_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
@@ -148,6 +247,62 @@ jobs:
|
||||
name: run_all_tests_torch_multi_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
# run_tests_flax_multi_gpu:
|
||||
# runs-on: [self-hosted, docker-gpu, multi-gpu]
|
||||
# container:
|
||||
# image: tensorflow/tensorflow:2.4.1-gpu
|
||||
# options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
# steps:
|
||||
# - name: Install dependencies
|
||||
# run: |
|
||||
# apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
|
||||
# pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
|
||||
# pip install --upgrade pip
|
||||
# pip install .[sklearn,testing,sentencepiece,flax,flax-speech,vision]
|
||||
# pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
#
|
||||
# - name: Launcher docker
|
||||
# uses: actions/checkout@v2
|
||||
# with:
|
||||
# fetch-depth: 2
|
||||
#
|
||||
# - name: NVIDIA-SMI
|
||||
# continue-on-error: true
|
||||
# run: |
|
||||
# nvidia-smi
|
||||
#
|
||||
# - name: Are GPUs recognized by our DL frameworks
|
||||
# run: |
|
||||
# python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
|
||||
# python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
|
||||
#
|
||||
# - name: Fetch the tests to run
|
||||
# run: |
|
||||
# python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
|
||||
#
|
||||
# - name: Report fetched tests
|
||||
# uses: actions/upload-artifact@v2
|
||||
# with:
|
||||
# name: test_fetched
|
||||
# path: test_preparation.txt
|
||||
#
|
||||
# - name: Run all non-slow tests on GPU
|
||||
# run: |
|
||||
# if [ -f test_list.txt ]; then
|
||||
# python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_flax_multi_gpu $(cat test_list.txt)
|
||||
# fi
|
||||
#
|
||||
# - name: Failure short reports
|
||||
# if: ${{ failure() }}
|
||||
# run: cat reports/tests_flax_multi_gpu_failures_short.txt
|
||||
#
|
||||
# - name: Test suite reports artifacts
|
||||
# if: ${{ always() }}
|
||||
# uses: actions/upload-artifact@v2
|
||||
# with:
|
||||
# name: run_all_tests_flax_multi_gpu_test_reports
|
||||
# path: reports
|
||||
|
||||
# run_tests_tf_multi_gpu:
|
||||
# runs-on: [self-hosted, docker-gpu, multi-gpu]
|
||||
# timeout-minutes: 120
|
||||
@@ -155,32 +310,48 @@ jobs:
|
||||
# image: tensorflow/tensorflow:2.4.1-gpu
|
||||
# options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
# steps:
|
||||
# - name: Install dependencies
|
||||
# run: |
|
||||
# apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
|
||||
# pip install --upgrade pip
|
||||
# pip install .[sklearn,testing,onnxruntime,sentencepiece,tf-speech]
|
||||
# pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
#
|
||||
# - name: Launcher docker
|
||||
# uses: actions/checkout@v2
|
||||
# with:
|
||||
# fetch-depth: 2
|
||||
#
|
||||
# - name: NVIDIA-SMI
|
||||
# run: |
|
||||
# nvidia-smi
|
||||
#
|
||||
# - name: Install dependencies
|
||||
# run: |
|
||||
# pip install --upgrade pip
|
||||
# pip install .[sklearn,testing,onnxruntime,sentencepiece]
|
||||
#
|
||||
# - name: Are GPUs recognized by our DL frameworks
|
||||
# run: |
|
||||
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
|
||||
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
|
||||
#
|
||||
# - name: Fetch the tests to run
|
||||
# run: |
|
||||
# python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
|
||||
#
|
||||
# - name: Report fetched tests
|
||||
# uses: actions/upload-artifact@v2
|
||||
# with:
|
||||
# name: test_fetched
|
||||
# path: test_preparation.txt
|
||||
#
|
||||
# - name: Run all non-slow tests on GPU
|
||||
# env:
|
||||
# TF_NUM_INTRAOP_THREADS: 8
|
||||
# TF_NUM_INTEROP_THREADS: 1
|
||||
# run: |
|
||||
# python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_multi_gpu tests
|
||||
# if [ -f test_list.txt ]; then
|
||||
# python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_multi_gpu $(cat test_list.txt)
|
||||
# fi
|
||||
#
|
||||
# - name: Failure short reports
|
||||
# if: ${{ always() }}
|
||||
# if: ${{ failure() }}
|
||||
# run: cat reports/tests_tf_multi_gpu_failures_short.txt
|
||||
#
|
||||
# - name: Test suite reports artifacts
|
||||
@@ -198,6 +369,8 @@ jobs:
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
with:
|
||||
fetch-depth: 2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
run: |
|
||||
@@ -211,17 +384,26 @@ jobs:
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Fetch the tests to run
|
||||
run: |
|
||||
python utils/tests_fetcher.py --diff_with_last_commit --filters tests/deepspeed tests/extended | tee test_preparation.txt
|
||||
|
||||
- name: Report fetched tests
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: test_fetched
|
||||
path: test_preparation.txt
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
python -m pytest -n 1 --dist=loadfile -v --make-reports=tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 1 --dist=loadfile -v --make-reports=tests_torch_cuda_extensions_gpu $(cat test_list.txt)
|
||||
fi
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
if: ${{ failure() }}
|
||||
run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
@@ -239,8 +421,11 @@ jobs:
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
with:
|
||||
fetch-depth: 2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
@@ -248,21 +433,31 @@ jobs:
|
||||
run: |
|
||||
apt -y update && apt install -y libaio-dev
|
||||
pip install --upgrade pip
|
||||
rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
|
||||
pip install .[testing,deepspeed,fairscale]
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Fetch the tests to run
|
||||
run: |
|
||||
python utils/tests_fetcher.py --diff_with_last_commit --filters tests/deepspeed tests/extended | tee test_preparation.txt
|
||||
|
||||
- name: Report fetched tests
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: test_fetched
|
||||
path: test_preparation.txt
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
python -m pytest -n 1 --dist=loadfile -v --make-reports=tests_torch_cuda_extensions_multi_gpu tests/deepspeed tests/extended
|
||||
if [ -f test_list.txt ]; then
|
||||
python -m pytest -n 1 --dist=loadfile -v --make-reports=tests_torch_cuda_extensions_multi_gpu $(cat test_list.txt)
|
||||
fi
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
if: ${{ failure() }}
|
||||
run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
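The push workflow's `Fetch the tests to run` steps rely on `utils/tests_fetcher.py`, which maps the files changed by the last commit to the test files that cover them and writes the selection to `test_list.txt`. The sketch below is only a simplified illustration of that idea; the real script builds a proper dependency map, and the naive filename-based rule here is an assumption:

```python
import subprocess
from pathlib import Path

# Files touched by the last commit.
changed_files = subprocess.run(
    ["git", "diff", "--name-only", "HEAD~1"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# Naive rule: src/transformers/models/bert/modeling_bert.py -> tests/test_modeling_bert.py
selected = sorted(
    {
        f"tests/test_{Path(path).name}"
        for path in changed_files
        if path.startswith("src/transformers/")
        and Path(path).name.startswith(("modeling_", "tokenization_"))
    }
)

if selected:
    Path("test_list.txt").write_text("\n".join(selected) + "\n")
print("\n".join(selected) if selected else "No tests selected for this diff.")
```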
.github/workflows/self-scheduled.yml (vendored, 164 lines changed)
@@ -15,6 +15,7 @@ env:
|
||||
OMP_NUM_THREADS: 16
|
||||
MKL_NUM_THREADS: 16
|
||||
PYTEST_TIMEOUT: 600
|
||||
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
|
||||
|
||||
jobs:
|
||||
run_all_tests_torch_gpu:
|
||||
@@ -32,16 +33,14 @@ jobs:
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev
|
||||
apt -y update && apt install -y libsndfile1-dev git
|
||||
pip install --upgrade pip
|
||||
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
|
||||
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
@@ -85,6 +84,47 @@ jobs:
|
||||
name: run_all_tests_torch_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
run_all_tests_flax_gpu:
|
||||
runs-on: [self-hosted, docker-gpu-test, single-gpu]
|
||||
container:
|
||||
image: tensorflow/tensorflow:2.4.1-gpu
|
||||
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
pip install --upgrade pip
|
||||
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
|
||||
pip install .[flax,integrations,sklearn,testing,sentencepiece,flax-speech,vision]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
|
||||
python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_flax_gpu tests
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_flax_gpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: run_all_tests_flax_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
run_all_tests_tf_gpu:
|
||||
runs-on: [self-hosted, docker-gpu, single-gpu]
|
||||
container:
|
||||
@@ -100,8 +140,11 @@ jobs:
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev git
|
||||
pip install --upgrade pip
|
||||
pip install .[sklearn,testing,onnx,sentencepiece]
|
||||
pip install .[sklearn,testing,onnx,sentencepiece,tf-speech,vision]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
@@ -139,6 +182,45 @@ jobs:
|
||||
name: run_all_tests_tf_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
run_all_examples_torch_xla_tpu:
|
||||
runs-on: [self-hosted, docker-tpu-test, tpu-v3-8]
|
||||
container:
|
||||
image: gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm
|
||||
options: --privileged -v "/lib/libtpu.so:/lib/libtpu.so" -v /mnt/cache/.cache/huggingface:/mnt/cache/ --shm-size 16G
|
||||
steps:
|
||||
- name: Launcher docker
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
pip install --upgrade pip
|
||||
pip install .[testing]
|
||||
|
||||
- name: Are TPUs recognized by our DL frameworks
|
||||
env:
|
||||
XRT_TPU_CONFIG: localservice;0;localhost:51011
|
||||
run: |
|
||||
python -c "import torch_xla.core.xla_model as xm; print(xm.xla_device())"
|
||||
|
||||
- name: Run example tests on TPU
|
||||
env:
|
||||
XRT_TPU_CONFIG: "localservice;0;localhost:51011"
|
||||
MKL_SERVICE_FORCE_INTEL: "1" # See: https://github.com/pytorch/pytorch/issues/37377
|
||||
|
||||
run: |
|
||||
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_xla_tpu examples/pytorch/test_xla_examples.py
|
||||
|
||||
- name: Failure short reports
|
||||
if: ${{ always() }}
|
||||
run: cat reports/tests_torch_xla_tpu_failures_short.txt
|
||||
|
||||
- name: Test suite reports artifacts
|
||||
if: ${{ always() }}
|
||||
uses: actions/upload-artifact@v2
|
||||
with:
|
||||
name: run_all_examples_torch_xla_tpu
|
||||
path: reports
|
||||
|
||||
run_all_tests_torch_multi_gpu:
|
||||
runs-on: [self-hosted, docker-gpu, multi-gpu]
|
||||
container:
|
||||
@@ -149,21 +231,20 @@ jobs:
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev
|
||||
apt -y update && apt install -y libsndfile1-dev git
|
||||
pip install --upgrade pip
|
||||
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,speech,vision,timm]
|
||||
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
env:
|
||||
@@ -203,13 +284,16 @@ jobs:
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
apt -y update && apt install -y libsndfile1-dev git
|
||||
pip install --upgrade pip
|
||||
pip install .[sklearn,testing,onnx,sentencepiece]
|
||||
pip install .[sklearn,testing,onnx,sentencepiece,tf-speech,vision]
|
||||
pip install https://github.com/kpu/kenlm/archive/master.zip
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
@@ -247,6 +331,45 @@ jobs:
|
||||
name: run_all_tests_tf_multi_gpu_test_reports
|
||||
path: reports
|
||||
|
||||
# run_all_tests_flax_multi_gpu:
|
||||
# runs-on: [self-hosted, docker-gpu, multi-gpu]
|
||||
# container:
|
||||
# image: tensorflow/tensorflow:2.4.1-gpu
|
||||
# options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
|
||||
# steps:
|
||||
# - name: Launcher docker
|
||||
# uses: actions/checkout@v2
|
||||
#
|
||||
# - name: NVIDIA-SMI
|
||||
# run: |
|
||||
# nvidia-smi
|
||||
#
|
||||
# - name: Install dependencies
|
||||
# run: |
|
||||
# pip install --upgrade pip
|
||||
# pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
|
||||
# pip install .[flax,integrations,sklearn,testing,sentencepiece,flax-speech,vision]
|
||||
#
|
||||
# - name: Are GPUs recognized by our DL frameworks
|
||||
# run: |
|
||||
# python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
|
||||
# python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
|
||||
#
|
||||
# - name: Run all tests on GPU
|
||||
# run: |
|
||||
# python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_flax_gpu tests
|
||||
#
|
||||
# - name: Failure short reports
|
||||
# if: ${{ always() }}
|
||||
# run: cat reports/tests_flax_gpu_failures_short.txt
|
||||
#
|
||||
# - name: Test suite reports artifacts
|
||||
# if: ${{ always() }}
|
||||
# uses: actions/upload-artifact@v2
|
||||
# with:
|
||||
# name: run_all_tests_flax_gpu_test_reports
|
||||
# path: reports
|
||||
|
||||
run_all_tests_torch_cuda_extensions_gpu:
|
||||
runs-on: [self-hosted, docker-gpu, single-gpu]
|
||||
container:
|
||||
@@ -268,10 +391,7 @@ jobs:
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
@@ -298,6 +418,7 @@ jobs:
|
||||
uses: actions/checkout@v2
|
||||
|
||||
- name: NVIDIA-SMI
|
||||
continue-on-error: true
|
||||
run: |
|
||||
nvidia-smi
|
||||
|
||||
@@ -305,14 +426,12 @@ jobs:
|
||||
run: |
|
||||
apt -y update && apt install -y libaio-dev
|
||||
pip install --upgrade pip
|
||||
rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
|
||||
pip install .[testing,deepspeed,fairscale]
|
||||
|
||||
- name: Are GPUs recognized by our DL frameworks
|
||||
run: |
|
||||
python -c "import torch; print('Cuda available:', torch.cuda.is_available())"
|
||||
python -c "import torch; print('Cuda version:', torch.version.cuda)"
|
||||
python -c "import torch; print('CuDNN version:', torch.backends.cudnn.version())"
|
||||
python -c "import torch; print('Number of GPUs available:', torch.cuda.device_count())"
|
||||
utils/print_env_pt.py
|
||||
|
||||
- name: Run all tests on GPU
|
||||
run: |
|
||||
@@ -350,6 +469,7 @@ jobs:
|
||||
env:
|
||||
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
|
||||
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
|
||||
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
|
||||
|
||||
|
||||
run: |
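Several hunks above replace the inline `python -c` environment checks with a single call to `utils/print_env_pt.py`. A plausible minimal version of such a script, built only from the checks the workflows previously ran inline (the actual file may print more):

```python
#!/usr/bin/env python3
# Minimal environment printer in the spirit of utils/print_env_pt.py,
# using only the checks the workflows above used to run inline.
import torch

print("Cuda available:", torch.cuda.is_available())
print("Cuda version:", torch.version.cuda)
print("CuDNN version:", torch.backends.cudnn.version())
print("Number of GPUs available:", torch.cuda.device_count())
```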
.github/workflows/update_metdata.yml (vendored, new file, 36 lines)
@@ -0,0 +1,36 @@
name: Update Transformers metadata

on:
  push:
    branches:
      - master
      - update_transformers_metadata

jobs:
  build_and_package:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}

    steps:
      - uses: actions/checkout@v2

      - name: Loading cache.
        uses: actions/cache@v2
        id: cache
        with:
          path: ~/.cache/pip
          key: v1-metadata
          restore-keys: |
            v1-metadata-${{ hashFiles('setup.py') }}
            v1-metadata

      - name: Setup environment
        run: |
          pip install git+https://github.com/huggingface/transformers#egg=transformers[dev]

      - name: Update metadata
        run: |
          python utils/update_metadata.py --token ${{ secrets.SYLVAIN_HF_TOKEN }} --commit_sha ${{ github.sha }}
CITATION.cff (new file, 82 lines)
@@ -0,0 +1,82 @@
cff-version: "1.2.0"
date-released: 2020-10
message: "If you use this software, please cite it using these metadata."
title: "Transformers: State-of-the-Art Natural Language Processing"
url: "https://github.com/huggingface/transformers"
authors:
  - family-names: Wolf
    given-names: Thomas
  - family-names: Debut
    given-names: Lysandre
  - family-names: Sanh
    given-names: Victor
  - family-names: Chaumond
    given-names: Julien
  - family-names: Delangue
    given-names: Clement
  - family-names: Moi
    given-names: Anthony
  - family-names: Cistac
    given-names: Perric
  - family-names: Ma
    given-names: Clara
  - family-names: Jernite
    given-names: Yacine
  - family-names: Plu
    given-names: Julien
  - family-names: Xu
    given-names: Canwen
  - family-names: "Le Scao"
    given-names: Teven
  - family-names: Gugger
    given-names: Sylvain
  - family-names: Drame
    given-names: Mariama
  - family-names: Lhoest
    given-names: Quentin
  - family-names: Rush
    given-names: "Alexander M."
preferred-citation:
  type: conference-paper
  authors:
    - family-names: Wolf
      given-names: Thomas
    - family-names: Debut
      given-names: Lysandre
    - family-names: Sanh
      given-names: Victor
    - family-names: Chaumond
      given-names: Julien
    - family-names: Delangue
      given-names: Clement
    - family-names: Moi
      given-names: Anthony
    - family-names: Cistac
      given-names: Perric
    - family-names: Ma
      given-names: Clara
    - family-names: Jernite
      given-names: Yacine
    - family-names: Plu
      given-names: Julien
    - family-names: Xu
      given-names: Canwen
    - family-names: "Le Scao"
      given-names: Teven
    - family-names: Gugger
      given-names: Sylvain
    - family-names: Drame
      given-names: Mariama
    - family-names: Lhoest
      given-names: Quentin
    - family-names: Rush
      given-names: "Alexander M."
  booktitle: "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations"
  month: 10
  start: 38
  end: 45
  title: "Transformers: State-of-the-Art Natural Language Processing"
  year: 2020
  publisher: "Association for Computational Linguistics"
  url: "https://www.aclweb.org/anthology/2020.emnlp-demos.6"
  address: "Online"
@@ -273,8 +273,13 @@ Follow these steps to start contributing:
   - If you are adding a new tokenizer, write tests, and make sure
     `RUN_SLOW=1 python -m pytest tests/test_tokenization_{your_model_name}.py` passes.
     CircleCI does not run the slow tests, but github actions does every night!
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_bert.py` for an
   example.
7. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
   the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
   them by URL.

See more about the checks run on a pull request in our [PR guide](pr_checks)
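The tokenizer guideline above refers to an ordinary pytest module gated behind the `RUN_SLOW` flag. A minimal, hedged sketch of what such a test might look like (the class name and checkpoint are placeholders, not the repository's actual test template, and real tests assert far more):

```python
import unittest

from transformers import AutoTokenizer
from transformers.testing_utils import slow


class MyNewModelTokenizationTest(unittest.TestCase):
    @slow  # only runs when RUN_SLOW=1 is set, as in the nightly CI
    def test_encode_decode(self):
        # Placeholder checkpoint; a real test points at the new model's Hub repo.
        tokenizer = AutoTokenizer.from_pretrained("my-org/my-new-model")
        ids = tokenizer.encode("Hello, world!")
        self.assertGreater(len(ids), 0)
        self.assertIsInstance(tokenizer.decode(ids, skip_special_tokens=True), str)
```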
### Tests

@@ -205,7 +205,7 @@ You are not required to read the following guidelines before opening an issue. H

If you really tried to make a short reproducible code but couldn't figure it out, it might be that having a traceback will give the developer enough information to know what's going on. But if it is not enough and we can't reproduce the problem, we can't really solve it.

Do not dispair if you can't figure it out from the begining, just share what you can and perhaps someone else will be able to help you at the forums.
Do not despair if you can't figure it out from the beginning, just share what you can and perhaps someone else will be able to help you at the forums.

If your setup involves any custom datasets, the best way to help us reproduce the problem is to create a [Google Colab notebook](https://colab.research.google.com/) that demonstrates the issue and once you verify that the issue still exists, include a link to that notebook in the Issue. Just make sure that you don't copy and paste the location bar url of the open notebook - as this is private and we won't be able to open it. Instead, you need to click on `Share` in the right upper corner of the notebook, select `Get Link` and then copy and paste the public link it will give to you.
Makefile (11 lines changed)
@@ -30,11 +30,10 @@ deps_table_check_updated:
# autogenerating code

autogenerate_code: deps_table_update
	python utils/class_mapping_update.py

# Check that source code meets quality standards
# Check that the repo is in a good state

extra_quality_checks:
repo-consistency:
	python utils/check_copies.py
	python utils/check_table.py
	python utils/check_dummies.py
@@ -43,12 +42,13 @@ extra_quality_checks:
	python utils/tests_fetcher.py --sanity_check

# this target runs checks on all files

quality:
	black --check $(check_dirs)
	isort --check-only $(check_dirs)
	python utils/custom_init_isort.py --check_only
	flake8 $(check_dirs)
	${MAKE} extra_quality_checks
	python utils/style_doc.py src/transformers docs/source --max_len 119 --check_only

# Format source code automatically and check is there are any problems left that need manual fixing

@@ -57,6 +57,7 @@ extra_style_checks:
	python utils/style_doc.py src/transformers docs/source --max_len 119

# this target runs checks on all files and potentially modifies some of them

style:
	black $(check_dirs)
	isort $(check_dirs)
@@ -65,7 +66,7 @@ style:

# Super fast fix and check target that only works on relevant modified files since the branch was made

fixup: modified_only_fixup extra_style_checks autogenerate_code extra_quality_checks
fixup: modified_only_fixup extra_style_checks autogenerate_code repo-consistency

# Make marked copies of snippets of codes conform to the original
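The `repo-consistency` target chains several `utils/check_*.py` scripts that follow the same pattern: compare a generated file against what it should contain and either fail or, with `--fix_and_overwrite`, rewrite it. A simplified sketch of that pattern (the file name and generator below are hypothetical, not the actual utilities):

```python
import argparse


def expected_content() -> str:
    # The real utilities derive this from the library's source; this is a stand-in.
    return "# AUTOGENERATED - do not edit by hand\n"


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--fix_and_overwrite", action="store_true")
    args = parser.parse_args()

    path = "generated_table.py"  # hypothetical generated file
    expected = expected_content()
    with open(path) as f:
        current = f.read()

    if current != expected:
        if args.fix_and_overwrite:
            with open(path, "w") as f:
                f.write(expected)
        else:
            raise ValueError(f"{path} is out of date; rerun this script with --fix_and_overwrite.")


if __name__ == "__main__":
    main()
```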
README.md (212 lines changed)
@@ -26,8 +26,8 @@ limitations under the License.
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/LICENSE">
|
||||
<img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue">
|
||||
</a>
|
||||
<a href="https://huggingface.co/transformers/index.html">
|
||||
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/transformers/index.html.svg?down_color=red&down_message=offline&up_message=online">
|
||||
<a href="https://huggingface.co/docs/transformers/index">
|
||||
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online">
|
||||
</a>
|
||||
<a href="https://github.com/huggingface/transformers/releases">
|
||||
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
|
||||
@@ -42,19 +42,28 @@ limitations under the License.
|
||||
<p>
|
||||
<b>English</b> |
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hans.md">简体中文</a> |
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hant.md">繁體中文</a>
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hant.md">繁體中文</a> |
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/README_ko.md">한국어</a>
|
||||
<p>
|
||||
</h4>
|
||||
|
||||
<h3 align="center">
|
||||
<p>State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow</p>
|
||||
<p>State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow</p>
|
||||
</h3>
|
||||
|
||||
<h3 align="center">
|
||||
<a href="https://hf.co/course"><img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/course_banner.png"></a>
|
||||
</h3>
|
||||
|
||||
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.
|
||||
🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
|
||||
|
||||
These models can be applied on:
|
||||
|
||||
* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages.
|
||||
* 🖼️ Images, for tasks like image classification, object detection, and segmentation.
|
||||
* 🗣️ Audio, for tasks like speech recognition and audio classification.
|
||||
|
||||
Transformer models can also perform tasks on **several modalities combined**, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
|
||||
|
||||
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
|
||||
|
||||
@@ -65,6 +74,8 @@ limitations under the License.
|
||||
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing) for public and private models.
|
||||
|
||||
Here are a few examples:
|
||||
|
||||
In Natural Language Processing:
|
||||
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
|
||||
- [Name Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
|
||||
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
|
||||
@@ -73,6 +84,15 @@ Here are a few examples:
|
||||
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
|
||||
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
|
||||
|
||||
In Computer Vision:
|
||||
- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
|
||||
- [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
|
||||
- [Image Segmentation with DETR](https://huggingface.co/facebook/detr-resnet-50-panoptic)
|
||||
|
||||
In Audio:
|
||||
- [Automatic Speech Recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h)
|
||||
- [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
|
||||
|
||||
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.
|
||||
|
||||
## If you are looking for custom support from the Hugging Face team
|
||||
@@ -83,7 +103,7 @@ Here are a few examples:
|
||||
|
||||
## Quick tour
|
||||
|
||||
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
|
||||
To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
|
||||
|
||||
```python
|
||||
>>> from transformers import pipeline
|
||||
@@ -111,7 +131,7 @@ Many NLP tasks have a pre-trained `pipeline` ready to go. For example, we can ea
|
||||
|
||||
```
|
||||
|
||||
In addition to the answer, the pretrained model used here returned its confidence score, along with the start position and end position of the answer in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
|
||||
In addition to the answer, the pretrained model used here returned its confidence score, along with the start position and end position of the answer in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).
|
||||
|
||||
To download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
|
||||
```python
|
||||
@@ -136,12 +156,12 @@ And here is the equivalent code for TensorFlow:
|
||||
|
||||
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the above examples) or a list. It will output a dictionary that you can use in downstream code or simply directly pass to your model using the ** argument unpacking operator.
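As a concrete illustration of the paragraph above (checkpoint chosen for the example; the exact keys depend on the model):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A single string or a list of strings both work; the output is a dictionary.
batch = tokenizer(["Hello world!", "Transformers is great."], padding=True, return_tensors="pt")
print(sorted(batch.keys()))  # ['attention_mask', 'input_ids', 'token_type_ids'] for this checkpoint
# outputs = model(**batch)   # the ** operator unpacks the dictionary into keyword arguments
```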
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. [This tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset.
|
||||
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset.
|
||||
|
||||
## Why should I use transformers?
|
||||
|
||||
1. Easy-to-use state-of-the-art models:
|
||||
- High performance on NLU and NLG tasks.
|
||||
- High performance on natural language understanding & generation, computer vision, and audio tasks.
|
||||
- Low barrier to entry for educators and practitioners.
|
||||
- Few user-facing abstractions with just three classes to learn.
|
||||
- A unified API for using all our pretrained models.
|
||||
@@ -149,11 +169,11 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta
|
||||
1. Lower compute costs, smaller carbon footprint:
|
||||
- Researchers can share trained models instead of always retraining.
|
||||
- Practitioners can reduce compute time and production costs.
|
||||
- Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
|
||||
- Dozens of architectures with over 20,000 pretrained models, some in more than 100 languages.
|
||||
|
||||
1. Choose the right framework for every part of a model's lifetime:
|
||||
- Train state-of-the-art models in 3 lines of code.
|
||||
- Move a single model between TF2.0/PyTorch frameworks at will.
|
||||
- Move a single model between TF2.0/PyTorch/JAX frameworks at will.
|
||||
- Seamlessly pick the right framework for training, evaluation and production.
|
||||
|
||||
1. Easily customize a model or an example to your needs:
|
||||
@@ -186,7 +206,7 @@ When one of those backends has been installed, 🤗 Transformers can be installe
|
||||
pip install transformers
|
||||
```
|
||||
|
||||
If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must [install the library from source](https://huggingface.co/transformers/installation.html#installing-from-source).
|
||||
If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must [install the library from source](https://huggingface.co/docs/transformers/installation#installing-from-source).
|
||||
|
||||
### With conda
|
||||
|
||||
@@ -206,91 +226,115 @@ Follow the installation pages of Flax, PyTorch or TensorFlow to see how to insta
|
||||
|
||||
Current number of checkpoints: 
|
||||
|
||||
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/transformers/model_summary.html) for a high-level summary of each them):
|
||||
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each them):
|
||||
|
||||
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bertgeneration)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/bigbird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot_small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta_v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoderdecoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/master/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron_bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transformerxl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech_sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER
AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlmprophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlmroberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.

To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/docs/transformers/examples).

## Learn more

| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/docs/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/docs/transformers/migration) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |

## Citation

README_ko.md (new file, 331 lines)

<!---
Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<p align="center">
<br>
<img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/transformers_logo_name.png" width="400"/>
<br>
<p>
<p align="center">
<a href="https://circleci.com/gh/huggingface/transformers">
<img alt="Build" src="https://img.shields.io/circleci/build/github/huggingface/transformers/master">
</a>
<a href="https://github.com/huggingface/transformers/blob/master/LICENSE">
<img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue">
</a>
<a href="https://huggingface.co/docs/transformers/index">
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online">
</a>
<a href="https://github.com/huggingface/transformers/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
</a>
<a href="https://github.com/huggingface/transformers/blob/master/CODE_OF_CONDUCT.md">
<img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg">
</a>
<a href="https://zenodo.org/badge/latestdoi/155220641"><img src="https://zenodo.org/badge/155220641.svg" alt="DOI"></a>
</p>
<h4 align="center">
<p>
<a href="https://github.com/huggingface/transformers/">English</a> |
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hans.md">简体中文</a> |
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hant.md">繁體中文</a> |
<b>한국어</b>
<p>
</h4>
<h3 align="center">
<p> State-of-the-art Natural Language Processing for Jax, Pytorch and TensorFlow</p>
</h3>
<h3 align="center">
<a href="https://hf.co/course"><img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/course_banner.png"></a>
</h3>

🤗 Transformers provides thousands of pretrained models to perform tasks such as classification, information extraction, question answering, summarization, translation and text generation in over 100 languages. Its aim is to make cutting-edge NLP easy for everyone to use.

🤗 Transformers provides APIs to quickly download these pretrained models, apply them to a given text, fine-tune them on your own data and share them with the community or on our [model hub](https://huggingface.co/models). At the same time, each Python module defining a model architecture is fully standalone, so it can easily be modified for research experiments.

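As a rough sketch of that workflow (the checkpoint name and the target repository name below are placeholders, and `push_to_hub` assumes you are logged in with `huggingface-cli login`):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Download a pretrained checkpoint (placeholder name) together with its tokenizer.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# ... fine-tune `model` on your own data, e.g. with the Trainer API ...

# Share the fine-tuned model and tokenizer on the Hugging Face Hub.
model.push_to_hub("my-finetuned-model")
tokenizer.push_to_hub("my-finetuned-model")
```
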
🤗 Transformers is backed by the three most popular deep learning libraries, [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), which integrate seamlessly with each other. It is straightforward to train a model with one of them and then load it with another for inference.

## Online demos

You can test most of the models directly on their pages at the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning and an inference API](https://huggingface.co/pricing) for public and private models.

Examples:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Named entity recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural language inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [BART를 이용한 요약](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [DistilBERT를 이용한 질문 답변](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)

**[Write With Transformer](https://transformer.huggingface.co)** is the Hugging Face team's official demo of this repository's text generation capabilities.

## If you are looking for custom support from the Hugging Face team

<a target="_blank" href="https://huggingface.co/support">
    <img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>

## Quick tour

To use a model right away on the text you care about, we provide the `pipeline` API. A pipeline bundles a pretrained model with the preprocessing that was applied while that model was trained. Here is a quick example of using a pipeline to classify positive versus negative texts:

```python
>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
```

The second line of code downloads and caches the pretrained model used by the pipeline, and the third line evaluates the given text with it. Here the model judged the text to be positive with a probability of 99.97%.
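
If you would rather pin the pipeline to a specific checkpoint instead of the task default, you can pass a model identifier from the hub. A minimal sketch added here (not part of the original quick tour); the checkpoint name below is just one example of a sentiment-analysis model:

```python
>>> from transformers import pipeline

# Sketch: explicitly select a checkpoint from the model hub instead of the task default
>>> classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
```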

Many NLP tasks can be performed right out of the box with a `pipeline`. For example, given a question and some context, we can easily extract the answer:

```python
>>> from transformers import pipeline

# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}
```

Besides the answer, the pretrained model used here also returns its confidence score and the start and end positions of the answer within the tokenized sentence. You can check out the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).

With just three lines of code you can download and use a pretrained model for the task you want. Here is the PyTorch version:
```python
>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```
And here is the TensorFlow version:
```python
>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
```

The tokenizer takes care of all the preprocessing the pretrained model expects, and can be called on a single string (as in the examples above) or on a list. It returns a dictionary that you can use in downstream code or pass straight to the model with the ** argument-unpacking operator.
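
As a small illustration of that dictionary (a sketch added here, not part of the original text; it assumes the `bert-base-uncased` tokenizer and model from the snippets above are already loaded):

```python
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> list(inputs.keys())          # typically ['input_ids', 'token_type_ids', 'attention_mask'] for BERT-style tokenizers
>>> outputs = model(**inputs)    # ** unpacks the dictionary into keyword arguments for the model
```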

The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model). [This tutorial](https://huggingface.co/transformers/training.html) explains how to use such a model in a standard PyTorch or TensorFlow training loop, or how to fine-tune it on new data with our `Trainer` API.
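
For instance, a rough sketch of fine-tuning with the `Trainer` API could look like the following; the tiny in-memory dataset is a placeholder invented for this example, not something from the original README:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A tiny in-memory dataset, just to keep the sketch self-contained
texts, labels = ["I love this.", "I hate this."], [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in encodings.items()}
        item["labels"] = torch.tensor(labels[idx])
        return item

training_args = TrainingArguments(output_dir="toy_trainer", num_train_epochs=1)
trainer = Trainer(model=model, args=training_args, train_dataset=ToyDataset())
trainer.train()
```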

## Why should I use transformers?

1. Easy-to-use state-of-the-art models:
    - High performance on NLU and NLG tasks.
    - Low barrier to entry for educators and practitioners.
    - Just three classes to learn before you can start using it.
    - A single API for all of our pretrained models.

1. Lower compute costs, smaller carbon footprint:
    - Researchers can share trained models instead of retraining them again and again.
    - Practitioners can save the time and cost of training.
    - Dozens of architectures, more than 2,000 pretrained models, models trained in over 100 languages, and more.

1. The right framework for every stage of a model's lifetime:
    - Train state-of-the-art models in three lines of code.
    - Freely move models between the TF2.0 and PyTorch frameworks.
    - Pick whichever framework suits each stage: training, evaluation, publishing.

1. Customize a model or an example to your needs:
    - We provide examples for each architecture to reproduce the results published by its authors.
    - Model internals are exposed as consistently as possible.
    - Model files can be used independently of the library for quick experiments.

## Why shouldn't I use transformers?

- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately kept at a moderate level of abstraction so that researchers can work with each model directly without digging through additional files.
- The training API is not meant to work on every model; it is optimized for the models provided by the library. For generic machine learning loops, you should use another library.
- While we try to cover as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) may not work on your specific problem out of the box; you may need to adapt a few lines of code to your needs.

## Installation

### With pip

This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.

Install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).

First, create a virtual environment with the version of Python you're going to use and activate it.
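
For example, with the standard library `venv` module (a minimal sketch for Linux/macOS; the environment name `.env` is arbitrary):

```bash
python -m venv .env
source .env/bin/activate
```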

Next, you will need to install at least one of Flax, PyTorch or TensorFlow.
See the [TensorFlow installation page](https://www.tensorflow.org/install/), the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or the [Flax installation page](https://github.com/google/flax#quick-install) for the install command for your platform.

Once at least one of them is installed, 🤗 Transformers can be installed with pip as follows:

```bash
pip install transformers
```

If you'd like to play with the examples, want the bleeding-edge code, or can't wait for a new release, you must [install the library from source](https://huggingface.co/docs/transformers/installation#installing-from-source).
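
A source install is typically a one-liner along these lines (shown as a sketch; see the installation page linked above for the authoritative command):

```bash
pip install git+https://github.com/huggingface/transformers
```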

### With conda

Since Transformers version v4.0.0, we have a conda channel: `huggingface`.

🤗 Transformers can be installed with conda as follows:

```shell script
conda install -c huggingface transformers
```

See the installation pages of Flax, PyTorch or TensorFlow for how to install them with conda.
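
As one concrete illustration (an assumption about your setup, not part of the original instructions), installing the PyTorch backend from its own conda channel usually looks like:

```bash
conda install pytorch -c pytorch
```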

## Model architectures

**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated with the huggingface.co [model hub](https://huggingface.co), where [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations) can upload them directly.

Current number of available model checkpoints: 

🤗 Transformers currently provides the following models (see [here](https://huggingface.co/docs/transformers/model_summary) for a summary of each of them):

1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bertgeneration)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/bigbird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot_small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta_v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoderdecoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/master/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron_bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transformerxl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech_sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlmprophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlmroberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. Want to contribute a new model? We provide a **detailed guide and templates** to help you add one; you can find them in the [`templates`](./templates) folder of this repository. Be sure to read the [contributing guidelines](./CONTRIBUTING.md) and to contact the maintainers or open an issue to collect feedback before starting your PR.

To check whether each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/docs/transformers/examples).

## Learn more

| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/docs/transformers/migration) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |

## Citation

If you would like to cite the 🤗 Transformers library, please cite this [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/):
```bibtex
@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}
```

@@ -51,8 +51,8 @@ checkpoint: 检查点
<a href="https://github.com/huggingface/transformers/blob/master/LICENSE">
<img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue">
</a>
<a href="https://huggingface.co/transformers/index.html">
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/transformers/index.html.svg?down_color=red&down_message=offline&up_message=online">
<a href="https://huggingface.co/docs/transformers/index">
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online">
</a>
<a href="https://github.com/huggingface/transformers/releases">
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
@@ -67,7 +67,8 @@ checkpoint: 检查点
<p>
<a href="https://github.com/huggingface/transformers/">English</a> |
<b>简体中文</b> |
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hant.md">繁體中文</a>
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hant.md">繁體中文</a> |
<a href="https://github.com/huggingface/transformers/blob/master/README_ko.md">한국어</a>
<p>
</h4>

@@ -136,7 +137,7 @@ checkpoint: 检查点

```

除了给出答案,预训练模型还给出了对应的置信度分数、答案在词符化 (tokenized) 后的文本中开始和结束的位置。你可以从[这个教程](https://huggingface.co/transformers/task_summary.html)了解更多流水线API支持的任务。
除了给出答案,预训练模型还给出了对应的置信度分数、答案在词符化 (tokenized) 后的文本中开始和结束的位置。你可以从[这个教程](https://huggingface.co/docs/transformers/task_summary)了解更多流水线API支持的任务。

要在你的任务上下载和使用任意预训练模型也很简单,只需三行代码。这里是 PyTorch 版的示例:
```python
@@ -210,7 +211,7 @@ checkpoint: 检查点
pip install transformers
```

如果你想要试试用例或者想在正式发布前使用最新的开发中代码,你得[从源代码安装](https://huggingface.co/transformers/installation.html#installing-from-source)。
如果你想要试试用例或者想在正式发布前使用最新的开发中代码,你得[从源代码安装](https://huggingface.co/docs/transformers/installation#installing-from-source)。

### 使用 conda

@@ -230,78 +231,99 @@ conda install -c huggingface transformers

目前的检查点数量: 

🤗 Transformers 目前支持如下的架构(模型概述请阅[这里](https://huggingface.co/transformers/model_summary.html)):
🤗 Transformers 目前支持如下的架构(模型概述请阅[这里](https://huggingface.co/docs/transformers/model_summary)):

1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (来自 Google Research and the Toyota Technological Institute at Chicago) 伴随论文 [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), 由 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut 发布。
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (来自 Facebook) 伴随论文 [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) 由 Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer 发布。
1. **[BARThez](https://huggingface.co/transformers/model_doc/barthez.html)** (来自 École polytechnique) 伴随论文 [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) 由 Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis 发布。
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (来自 Google) 伴随论文 [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) 由 Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova 发布。
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (来自 Google) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
1. **[BigBird-RoBERTa](https://huggingface.co/transformers/model_doc/bigbird.html)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
1. **[BigBird-Pegasus](https://huggingface.co/transformers/model_doc/bigbird_pegasus.html)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BlenderbotSmall](https://huggingface.co/transformers/model_doc/blenderbot_small.html)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BORT](https://huggingface.co/transformers/model_doc/bort.html)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。
1. **[ByT5](https://huggingface.co/transformers/model_doc/byt5.html)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。
1. **[CANINE](https://huggingface.co/transformers/model_doc/canine.html)** (来自 Google Research) 伴随论文 [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) 由 Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting 发布。
1. **[CLIP](https://huggingface.co/transformers/model_doc/clip.html)** (来自 OpenAI) 伴随论文 [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) 由 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever 发布。
1. **[ConvBERT](https://huggingface.co/transformers/model_doc/convbert.html)** (来自 YituTech) 伴随论文 [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) 由 Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan 发布。
1. **[CPM](https://huggingface.co/transformers/model_doc/cpm.html)** (来自 Tsinghua University) 伴随论文 [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) 由 Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun 发布。
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (来自 Salesforce) 伴随论文 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) 由 Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher 发布。
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
1. **[DeBERTa-v2](https://huggingface.co/transformers/model_doc/deberta_v2.html)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
1. **[DeiT](https://huggingface.co/transformers/model_doc/deit.html)** (来自 Facebook) 伴随论文 [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) 由 Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou 发布。
1. **[DETR](https://huggingface.co/transformers/model_doc/detr.html)** (来自 Facebook) 伴随论文 [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) 由 Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko 发布。
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (来自 Microsoft Research) 伴随论文 [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) 由 Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan 发布。
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (来自 HuggingFace), 伴随论文 [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) 由 Victor Sanh, Lysandre Debut and Thomas Wolf 发布。 同样的方法也应用于压缩 GPT-2 到 [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa 到 [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT 到 [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) 和德语版 DistilBERT。
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (来自 Facebook) 伴随论文 [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) 由 Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih 发布。
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (来自 CNRS) 伴随论文 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) 由 Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab 发布。
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (来自 CMU/Google Brain) 伴随论文 [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) 由 Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le 发布。
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (来自 OpenAI) 伴随论文 [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) 由 Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever 发布。
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (来自 OpenAI) 伴随论文 [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) 由 Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever** 发布。
1. **[GPT Neo](https://huggingface.co/transformers/model_doc/gpt_neo.html)** (来自 EleutherAI) 随仓库 [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) 发布。作者为 Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy 发布。
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (来自 Facebook) 伴随论文 [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) 由 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed 发布。
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (来自 Berkeley) 伴随论文 [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) 由 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer 发布。
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) 由 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou 发布。
1. **[LED](https://huggingface.co/transformers/model_doc/led.html)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
1. **[LUKE](https://huggingface.co/transformers/model_doc/luke.html)** (来自 Studio Ousia) 伴随论文 [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) 由 Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto 发布。
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (来自 UNC Chapel Hill) 伴随论文 [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) 由 Hao Tan and Mohit Bansal 发布。
1. **[M2M100](https://huggingface.co/transformers/model_doc/m2m_100.html)** (来自 Facebook) 伴随论文 [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) 由 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin 发布。
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** 用 [OPUS](http://opus.nlpl.eu/) 数据训练的机器翻译模型由 Jörg Tiedemann 发布。[Marian Framework](https://marian-nmt.github.io/) 由微软翻译团队开发。
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (来自 Facebook) 伴随论文 [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) 由 Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer 发布。
1. **[MBart-50](https://huggingface.co/transformers/model_doc/mbart.html)** (来自 Facebook) 伴随论文 [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) 由 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan 发布。
1. **[Megatron-BERT](https://huggingface.co/transformers/model_doc/megatron_bert.html)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
|
||||
1. **[Megatron-GPT2](https://huggingface.co/transformers/model_doc/megatron_gpt2.html)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
|
||||
1. **[MPNet](https://huggingface.co/transformers/model_doc/mpnet.html)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
|
||||
1. **[MT5](https://huggingface.co/transformers/model_doc/mt5.html)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
|
||||
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
|
||||
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
|
||||
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (来自 Google Research) 伴随论文 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) 由 Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya 发布。
|
||||
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (来自 Facebook), 伴随论文 [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) 由 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov 发布。
|
||||
1. **[RoFormer](https://huggingface.co/transformers/model_doc/roformer.html)** (来自 ZhuiyiTechnology), 伴随论文 [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) 由 Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu 发布。
|
||||
1. **[SpeechToTextTransformer](https://huggingface.co/transformers/model_doc/speech_to_text.html)** (来自 Facebook), 伴随论文 [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) 由 Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino 发布。
|
||||
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** 伴随论文 [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) 由 Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer 发布。
|
||||
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (来自 Google AI) 伴随论文 [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) 由 Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu 发布。
|
||||
1. **[TAPAS](https://huggingface.co/transformers/model_doc/tapas.html)** (来自 Google AI) 伴随论文 [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) 由 Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos 发布。
|
||||
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (来自 Google/CMU) 伴随论文 [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) 由 Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov 发布。
|
||||
1. **[Vision Transformer (ViT)](https://huggingface.co/transformers/model_doc/vit.html)** (来自 Google AI) 伴随论文 [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) 由 Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby 发布。
|
||||
1. **[VisualBERT](https://huggingface.co/transformers/model_doc/visual_bert.html)** (来自 UCLA NLP) 伴随论文 [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) 由 Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang 发布。
|
||||
1. **[Wav2Vec2](https://huggingface.co/transformers/model_doc/wav2vec2.html)** (来自 Facebook AI) 伴随论文 [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) 由 Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli 发布。
|
||||
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (来自 Facebook) 伴随论文 [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) 由 Guillaume Lample and Alexis Conneau 发布。
|
||||
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
|
||||
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (来自 Facebook AI), 伴随论文 [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) 由 Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov 发布。
|
||||
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (来自 Google/CMU) 伴随论文 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) 由 Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le 发布。
|
||||
1. **[XLSR-Wav2Vec2](https://huggingface.co/transformers/model_doc/xlsr_wav2vec2.html)** (来自 Facebook AI) 伴随论文 [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) 由 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli 发布。
|
||||
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (来自 Google Research and the Toyota Technological Institute at Chicago) 伴随论文 [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), 由 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut 发布。
|
||||
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (来自 Facebook) 伴随论文 [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) 由 Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer 发布。
|
||||
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (来自 École polytechnique) 伴随论文 [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) 由 Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis 发布。
|
||||
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (来自 VinAI Research) 伴随论文 [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) 由 Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen 发布。
|
||||
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (来自 Microsoft) 伴随论文 [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) 由 Hangbo Bao, Li Dong, Furu Wei 发布。
|
||||
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (来自 Google) 伴随论文 [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) 由 Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova 发布。
|
||||
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bertgeneration)** (来自 Google) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
|
||||
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (来自 VinAI Research) 伴随论文 [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) 由 Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen 发布。
|
||||
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
|
||||
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/bigbird)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
|
||||
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
|
||||
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot_small)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
|
||||
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。
|
||||
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。
|
||||
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。
|
||||
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (来自 Google Research) 伴随论文 [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) 由 Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting 发布。
|
||||
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (来自 OpenAI) 伴随论文 [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) 由 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever 发布。
|
||||
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (来自 YituTech) 伴随论文 [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) 由 Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan 发布。
|
||||
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (来自 Tsinghua University) 伴随论文 [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) 由 Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun 发布。
|
||||
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (来自 Salesforce) 伴随论文 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) 由 Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher 发布。
|
||||
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
|
||||
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta_v2)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
|
||||
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (来自 Facebook) 伴随论文 [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) 由 Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou 发布。
|
||||
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (来自 Facebook) 伴随论文 [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) 由 Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko 发布。
|
||||
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (来自 Microsoft Research) 伴随论文 [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) 由 Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan 发布。
|
||||
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (来自 HuggingFace), 伴随论文 [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) 由 Victor Sanh, Lysandre Debut and Thomas Wolf 发布。 同样的方法也应用于压缩 GPT-2 到 [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa 到 [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT 到 [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) 和德语版 DistilBERT。
|
||||
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (来自 Facebook) 伴随论文 [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) 由 Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih 发布。
|
||||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
|
||||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoderdecoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
|
||||
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (来自 CNRS) 伴随论文 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) 由 Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab 发布。
|
||||
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (来自 Google Research) 伴随论文 [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) 由 James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon 发布。
|
||||
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (来自 CMU/Google Brain) 伴随论文 [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) 由 Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le 发布。
|
||||
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/gpt)** (来自 OpenAI) 伴随论文 [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) 由 Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever 发布。
|
||||
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (来自 EleutherAI) 随仓库 [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) 发布,作者为 Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy。
|
||||
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (来自 OpenAI) 伴随论文 [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) 由 Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever** 发布。
|
||||
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (来自 EleutherAI) 随仓库 [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) 由 Ben Wang and Aran Komatsuzaki 发布。
|
||||
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (来自 Facebook) 伴随论文 [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) 由 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed 发布。
|
||||
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (来自 Berkeley) 伴随论文 [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) 由 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer 发布。
|
||||
1. **[ImageGPT](https://huggingface.co/docs/transformers/master/model_doc/imagegpt)** (来自 OpenAI) 伴随论文 [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) 由 Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever 发布。
|
||||
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) 由 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou 发布。
|
||||
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) 由 Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou 发布。
|
||||
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (来自 Microsoft Research Asia) 伴随论文 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) 由 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei 发布。
|
||||
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
|
||||
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
|
||||
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (来自 Studio Ousia) 伴随论文 [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) 由 Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto 发布。
|
||||
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (来自 UNC Chapel Hill) 伴随论文 [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) 由 Hao Tan and Mohit Bansal 发布。
|
||||
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (来自 Facebook) 伴随论文 [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) 由 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin 发布。
|
||||
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** 用 [OPUS](http://opus.nlpl.eu/) 数据训练的机器翻译模型由 Jörg Tiedemann 发布。[Marian Framework](https://marian-nmt.github.io/) 由微软翻译团队开发。
|
||||
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) 由 Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer 发布。
|
||||
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) 由 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan 发布。
|
||||
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron_bert)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
|
||||
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
|
||||
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (来自 Studio Ousia) 伴随论文 [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) 由 Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka 发布。
|
||||
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
|
||||
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
|
||||
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
|
||||
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
|
||||
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
|
||||
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
|
||||
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (来自 NVIDIA) 伴随论文 [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) 由 Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius 发布。
|
||||
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (来自 Google Research) 伴随论文 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) 由 Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya 发布。
|
||||
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (来自 Google Research) 伴随论文 [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) 由 Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder 发布。
|
||||
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (来自 Facebook), 伴随论文 [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) 由 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov 发布。
|
||||
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (来自 ZhuiyiTechnology), 伴随论文 [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) 由 Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu 发布。
|
||||
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (来自 NVIDIA) 伴随论文 [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) 由 Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo 发布。
|
||||
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (来自 ASAPP) 伴随论文 [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) 由 Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi 发布。
|
||||
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (来自 ASAPP) 伴随论文 [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) 由 Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi 发布。
|
||||
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (来自 Facebook), 伴随论文 [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) 由 Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino 发布。
|
||||
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (来自 Facebook) 伴随论文 [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) 由 Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau 发布。
|
||||
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (来自 Tel Aviv University) 伴随论文 [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) 由 Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy 发布。
|
||||
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (来自 Berkeley) 伴随论文 [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) 由 Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer 发布。
|
||||
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (来自 Google AI) 伴随论文 [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) 由 Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu 发布。
|
||||
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (来自 Google AI) 随仓库 [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) 由 Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu 发布。
|
||||
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (来自 Google AI) 伴随论文 [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) 由 Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos 发布。
|
||||
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transformerxl)** (来自 Google/CMU) 伴随论文 [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) 由 Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov 发布。
|
||||
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (来自 Microsoft) 伴随论文 [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) 由 Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei 发布。
|
||||
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (来自 Microsoft Research) 伴随论文 [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) 由 Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang 发布。
|
||||
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech_sat)** (来自 Microsoft Research) 伴随论文 [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) 由 Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu 发布。
|
||||
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (来自 Google AI) 伴随论文 [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) 由 Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby 发布。
|
||||
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (来自 UCLA NLP) 伴随论文 [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) 由 Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang 发布。
|
||||
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (来自 Facebook AI) 伴随论文 [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) 由 Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli 发布。
|
||||
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (来自 Facebook) 伴随论文 [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) 由 Guillaume Lample and Alexis Conneau 发布。
|
||||
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlmprophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
|
||||
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlmroberta)** (来自 Facebook AI), 伴随论文 [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) 由 Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov 发布。
|
||||
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (来自 Google/CMU) 伴随论文 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) 由 Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le 发布。
|
||||
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (来自 Facebook AI) 伴随论文 [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) 由 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli 发布。
|
||||
1. 想要贡献新的模型?我们这里有一份**详细指引和模板**来引导你添加新的模型。你可以在 [`templates`](./templates) 目录中找到它们。记得查看 [贡献指南](./CONTRIBUTING.md) 并在开始写 PR 前联系维护人员或开一个新的 issue 来获得反馈。
|
||||
|
||||
要检查某个模型是否已有 Flax、PyTorch 或 TensorFlow 的实现,或其是否在 🤗 Tokenizers 库中有对应词符化器(tokenizer),敬请参阅[此表](https://huggingface.co/transformers/index.html#supported-frameworks)。
|
||||
要检查某个模型是否已有 Flax、PyTorch 或 TensorFlow 的实现,或其是否在 🤗 Tokenizers 库中有对应词符化器(tokenizer),敬请参阅[此表](https://huggingface.co/docs/transformers/index#supported-frameworks)。
|
||||
|
||||
这些实现均已在多个数据集上测试(请参看用例脚本),并应与原版实现表现相当。你可以在用例文档的[此节](https://huggingface.co/transformers/examples.html)中了解表现的细节。
|
||||
这些实现均已在多个数据集上测试(请参看用例脚本),并应与原版实现表现相当。你可以在用例文档的[此节](https://huggingface.co/docs/transformers/examples)中了解表现的细节。
|
||||
|
||||
|
||||
## 了解更多
|
||||
@ -309,12 +331,12 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih 发布
|
||||
| 章节 | 描述 |
|
||||
|-|-|
|
||||
| [文档](https://huggingface.co/transformers/) | 完整的 API 文档和教程 |
|
||||
| [任务总结](https://huggingface.co/transformers/task_summary.html) | 🤗 Transformers 支持的任务 |
|
||||
| [预处理教程](https://huggingface.co/transformers/preprocessing.html) | 使用 `Tokenizer` 来为模型准备数据 |
|
||||
| [训练和微调](https://huggingface.co/transformers/training.html) | 在 PyTorch/TensorFlow 的训练循环或 `Trainer` API 中使用 🤗 Transformers 提供的模型 |
|
||||
| [任务总结](https://huggingface.co/docs/transformers/task_summary) | 🤗 Transformers 支持的任务 |
|
||||
| [预处理教程](https://huggingface.co/docs/transformers/preprocessing) | 使用 `Tokenizer` 来为模型准备数据 |
|
||||
| [训练和微调](https://huggingface.co/docs/transformers/training) | 在 PyTorch/TensorFlow 的训练循环或 `Trainer` API 中使用 🤗 Transformers 提供的模型(见下方简短示例) |
|
||||
| [快速上手:微调和用例脚本](https://github.com/huggingface/transformers/tree/master/examples) | 为各种任务提供的用例脚本 |
|
||||
| [模型分享和上传](https://huggingface.co/transformers/model_sharing.html) | 和社区上传和分享你微调的模型 |
|
||||
| [迁移](https://huggingface.co/transformers/migration.html) | 从 `pytorch-transformers` 或 `pytorch-pretrained-bert` 迁移到 🤗 Transformers |
|
||||
| [模型分享和上传](https://huggingface.co/docs/transformers/model_sharing) | 和社区上传和分享你微调的模型 |
|
||||
| [迁移](https://huggingface.co/docs/transformers/migration) | 从 `pytorch-transformers` 或 `pytorch-pretrained-bert` 迁移到 🤗 Transformers |
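下面是一个简短的示意(仅作说明:模型名 `bert-base-chinese`、玩具数据和训练参数均为假设的示例),把上表中的「预处理教程」与「训练和微调」串起来:先用 `Tokenizer` 为模型准备数据,再用 `Trainer` API 微调。

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# 预处理:用 Tokenizer 把文本转成模型输入(对应上表「预处理教程」)
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
texts, labels = ["今天天气很好", "这部电影很差"], [1, 0]
encodings = tokenizer(texts, padding=True, truncation=True)

class TinyDataset(torch.utils.data.Dataset):
    """把编码结果和标签包装成 PyTorch 数据集,供 Trainer 使用。"""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# 训练和微调:用 Trainer API 在玩具数据上跑一轮(对应上表「训练和微调」)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./outputs", num_train_epochs=1),
    train_dataset=TinyDataset(encodings, labels),
)
trainer.train()
```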
|
||||
|
||||
## 引用
|
||||
|
||||
|
||||
@ -63,8 +63,8 @@ user: 使用者
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/LICENSE">
|
||||
<img alt="GitHub" src="https://img.shields.io/github/license/huggingface/transformers.svg?color=blue">
|
||||
</a>
|
||||
<a href="https://huggingface.co/transformers/index.html">
|
||||
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/transformers/index.html.svg?down_color=red&down_message=offline&up_message=online">
|
||||
<a href="https://huggingface.co/docs/transformers/index">
|
||||
<img alt="Documentation" src="https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online">
|
||||
</a>
|
||||
<a href="https://github.com/huggingface/transformers/releases">
|
||||
<img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/transformers.svg">
|
||||
@ -79,7 +79,8 @@ user: 使用者
|
||||
<p>
|
||||
<a href="https://github.com/huggingface/transformers/">English</a> |
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/README_zh-hans.md">简体中文</a> |
|
||||
<b>繁體中文</b>
|
||||
<b>繁體中文</b> |
|
||||
<a href="https://github.com/huggingface/transformers/blob/master/README_ko.md">한국어</a>
|
||||
<p>
|
||||
</h4>
|
||||
|
||||
@ -148,7 +149,7 @@ user: 使用者
|
||||
|
||||
```
|
||||
|
||||
除了提供問題解答,預訓練模型還提供了對應的信賴度分數以及解答在 tokenized 後的文本中開始和結束的位置。你可以從[這個教學](https://huggingface.co/transformers/task_summary.html)了解更多 `pipeline` API支援的任務。
|
||||
除了提供問題解答,預訓練模型還提供了對應的信賴度分數以及解答在 tokenized 後的文本中開始和結束的位置。你可以從[這個教學](https://huggingface.co/docs/transformers/task_summary)了解更多 `pipeline` API支援的任務。
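下面是一個最小的示意(問題與文本皆為假設的示例;`pipeline` 未指定模型時會下載一個預設的問答模型),展示上述回傳值的用法:

```python
from transformers import pipeline

# 建立問答 pipeline;未指定模型時會使用預設的問答模型
question_answerer = pipeline("question-answering")

result = question_answerer(
    question="Where is Hugging Face based?",
    context="Hugging Face Inc. is a company based in New York City.",
)

# 回傳的字典包含答案、信賴度分數,以及答案在文本中的開始與結束位置
print(result["answer"], result["score"], result["start"], result["end"])
```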
|
||||
|
||||
要在你的任務中下載和使用任何預訓練模型很簡單,只需三行程式碼。這裡是 PyTorch 版的範例:
|
||||
```python
|
||||
@ -222,7 +223,7 @@ Tokenizer 為所有的預訓練模型提供了預處理,並可以直接轉換
|
||||
pip install transformers
|
||||
```
|
||||
|
||||
如果你想要試試範例或者想在正式發布前使用最新開發中的程式碼,你必須[從原始碼安裝](https://huggingface.co/transformers/installation.html#installing-from-source)。
|
||||
如果你想要試試範例或者想在正式發布前使用最新開發中的程式碼,你必須[從原始碼安裝](https://huggingface.co/docs/transformers/installation#installing-from-source)。
|
||||
|
||||
### 使用 conda
|
||||
|
||||
@ -242,78 +243,99 @@ conda install -c huggingface transformers
|
||||
|
||||
目前的檢查點數量: 
|
||||
|
||||
🤗 Transformers 目前支援以下的架構(模型概覽請參閱[這裡](https://huggingface.co/transformers/model_summary.html)):
|
||||
🤗 Transformers 目前支援以下的架構(模型概覽請參閱[這裡](https://huggingface.co/docs/transformers/model_summary)):
|
||||
|
||||
1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
|
||||
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
|
||||
1. **[BARThez](https://huggingface.co/transformers/model_doc/barthez.html)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
|
||||
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
|
||||
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[BigBird-RoBERTa](https://huggingface.co/transformers/model_doc/bigbird.html)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
|
||||
1. **[BigBird-Pegasus](https://huggingface.co/transformers/model_doc/bigbird_pegasus.html)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
|
||||
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
|
||||
1. **[BlenderbotSmall](https://huggingface.co/transformers/model_doc/blenderbot_small.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
|
||||
1. **[BORT](https://huggingface.co/transformers/model_doc/bort.html)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
|
||||
1. **[ByT5](https://huggingface.co/transformers/model_doc/byt5.html)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
|
||||
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
|
||||
1. **[CANINE](https://huggingface.co/transformers/model_doc/canine.html)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
|
||||
1. **[CLIP](https://huggingface.co/transformers/model_doc/clip.html)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
|
||||
1. **[ConvBERT](https://huggingface.co/transformers/model_doc/convbert.html)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
|
||||
1. **[CPM](https://huggingface.co/transformers/model_doc/cpm.html)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
|
||||
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
|
||||
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
|
||||
1. **[DeBERTa-v2](https://huggingface.co/transformers/model_doc/deberta_v2.html)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
|
||||
1. **[DeiT](https://huggingface.co/transformers/model_doc/deit.html)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
|
||||
1. **[DETR](https://huggingface.co/transformers/model_doc/detr.html)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
|
||||
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
|
||||
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
|
||||
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
|
||||
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
|
||||
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
|
||||
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
|
||||
1. **[GPT Neo](https://huggingface.co/transformers/model_doc/gpt_neo.html)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
|
||||
1. **[Hubert](https://huggingface.co/transformers/model_doc/hubert.html)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
|
||||
1. **[I-BERT](https://huggingface.co/transformers/model_doc/ibert.html)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
|
||||
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
|
||||
1. **[LED](https://huggingface.co/transformers/model_doc/led.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
1. **[LUKE](https://huggingface.co/transformers/model_doc/luke.html)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
|
||||
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
|
||||
1. **[M2M100](https://huggingface.co/transformers/model_doc/m2m_100.html)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
|
||||
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
|
||||
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
|
||||
1. **[MBart-50](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
|
||||
1. **[Megatron-BERT](https://huggingface.co/transformers/model_doc/megatron_bert.html)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
|
||||
1. **[Megatron-GPT2](https://huggingface.co/transformers/model_doc/megatron_gpt2.html)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
|
||||
1. **[MPNet](https://huggingface.co/transformers/model_doc/mpnet.html)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
|
||||
1. **[MT5](https://huggingface.co/transformers/model_doc/mt5.html)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
|
||||
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
|
||||
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
|
||||
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
||||
1. **[RoFormer](https://huggingface.co/transformers/model_doc/roformer.html)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
|
||||
1. **[SpeechToTextTransformer](https://huggingface.co/transformers/model_doc/speech_to_text.html)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
|
||||
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
|
||||
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
|
||||
1. **[TAPAS](https://huggingface.co/transformers/model_doc/tapas.html)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
|
||||
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
1. **[Vision Transformer (ViT)](https://huggingface.co/transformers/model_doc/vit.html)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
|
||||
1. **[VisualBERT](https://huggingface.co/transformers/model_doc/visual_bert.html)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
|
||||
1. **[Wav2Vec2](https://huggingface.co/transformers/model_doc/wav2vec2.html)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
|
||||
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
||||
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
|
||||
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
1. **[XLSR-Wav2Vec2](https://huggingface.co/transformers/model_doc/xlsr_wav2vec2.html)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
|
||||
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
|
||||
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
|
||||
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
|
||||
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
|
||||
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
|
||||
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
|
||||
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bertgeneration)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
|
||||
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
|
||||
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/bigbird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
|
||||
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
|
||||
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot_small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
|
||||
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
|
||||
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
|
||||
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
|
||||
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
|
||||
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
|
||||
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
|
||||
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
|
||||
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
|
||||
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
|
||||
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta_v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
|
||||
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
|
||||
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
|
||||
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
|
||||
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
|
||||
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
|
||||
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoderdecoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
|
||||
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
|
||||
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
|
||||
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
|
||||
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
|
||||
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
|
||||
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
|
||||
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
|
||||
1. **[ImageGPT](https://huggingface.co/docs/transformers/master/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
|
||||
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
|
||||
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
|
||||
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
|
||||
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
|
||||
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
|
||||
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
|
||||
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
|
||||
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
|
||||
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
|
||||
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron_bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
|
||||
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
|
||||
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
|
||||
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
|
||||
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
|
||||
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
|
||||
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
|
||||
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
|
||||
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
|
||||
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
|
||||
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
|
||||
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
||||
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
|
||||
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
|
||||
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
|
||||
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
|
||||
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
|
||||
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook) released with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
|
||||
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University) released with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
|
||||
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
|
||||
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
|
||||
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
|
||||
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
|
||||
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transformerxl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
|
||||
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
|
||||
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech_sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
|
||||
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
|
||||
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
|
||||
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
|
||||
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
||||
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlmprophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlmroberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
|
||||
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
|
||||
1. Want to contribute a new model? We have a **detailed guide and templates** to walk you through the process of adding a new model. You can find them in the [`templates`](./templates) directory of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and to contact the maintainers or open a new issue to collect feedback before starting your PR.

To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#supported-frameworks).

To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).

These implementations have been tested on several datasets (see the example scripts) and should perform comparably to the original implementations. You can find more details on the implementations in [this section](https://huggingface.co/transformers/examples.html) of the examples documentation.

These implementations have been tested on several datasets (see the example scripts) and should perform comparably to the original implementations. You can find more details on the implementations in [this section](https://huggingface.co/docs/transformers/examples) of the examples documentation.

## Learn more

@@ -321,12 +343,12 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/transformers/task_summary.html) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/transformers/preprocessing.html) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/transformers/training.html) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API |
| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and with the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for various tasks |
| [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/transformers/migration.html) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/docs/transformers/migration) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |

## Citation
@@ -166,7 +166,7 @@ Values that should be put in `code` should either be surrounded by double backticks
|
||||
an object using the :obj: syntax: :obj:\`like so\`. Note that argument names and objects like True, None or any strings
|
||||
should usually be put in `code`.
|
||||
|
||||
When mentionning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
|
||||
When mentioning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
|
||||
linked by Sphinx: :class:\`~transformers.XXXClass\`
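For instance, a docstring sentence that follows these conventions could look like the line below (the class, function and value referenced here are only meant as an illustration):

    The :func:\`~transformers.PreTrainedTokenizer.__call__\` method of :class:\`~transformers.BertTokenizer\` returns a :class:\`~transformers.BatchEncoding\`; its optional arguments default to :obj:\`None\`.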
|
||||
|
||||
When mentioning a function, it is recommended to use the :func: syntax as the mentioned function will be automatically
|
||||
|
||||
9
docs/source/_config.py
Normal file
@@ -0,0 +1,9 @@
|
||||
# docstyle-ignore
|
||||
INSTALL_CONTENT = """
|
||||
# Transformers installation
|
||||
! pip install transformers datasets
|
||||
# To install from source instead of the last release, comment the command above and uncomment the following one.
|
||||
# ! pip install git+https://github.com/huggingface/transformers.git
|
||||
"""
|
||||
|
||||
notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}]
|
||||
@@ -1,16 +0,0 @@
|
||||
|
||||
.highlight .c1, .highlight .sd{
|
||||
color: #999
|
||||
}
|
||||
|
||||
.highlight .nn, .highlight .k, .highlight .s1, .highlight .nb, .highlight .bp, .highlight .kc {
|
||||
color: #FB8D68;
|
||||
}
|
||||
|
||||
.highlight .kn, .highlight .nv, .highlight .s2, .highlight .ow {
|
||||
color: #6670FF;
|
||||
}
|
||||
|
||||
.highlight .gp {
|
||||
color: #FB8D68;
|
||||
}
|
||||
@@ -1,350 +0,0 @@
|
||||
/* Our DOM objects */
|
||||
|
||||
/* Colab dropdown */
|
||||
|
||||
table.center-aligned-table td {
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
table.center-aligned-table th {
|
||||
text-align: center;
|
||||
vertical-align: middle;
|
||||
}
|
||||
|
||||
.colab-dropdown {
|
||||
position: relative;
|
||||
display: inline-block;
|
||||
}
|
||||
|
||||
.colab-dropdown-content {
|
||||
display: none;
|
||||
position: absolute;
|
||||
background-color: #f9f9f9;
|
||||
min-width: 117px;
|
||||
box-shadow: 0px 8px 16px 0px rgba(0,0,0,0.2);
|
||||
z-index: 1;
|
||||
}
|
||||
|
||||
.colab-dropdown-content button {
|
||||
color: #6670FF;
|
||||
background-color: #f9f9f9;
|
||||
font-size: 12px;
|
||||
border: none;
|
||||
min-width: 117px;
|
||||
padding: 5px 5px;
|
||||
text-decoration: none;
|
||||
display: block;
|
||||
}
|
||||
|
||||
.colab-dropdown-content button:hover {background-color: #eee;}
|
||||
|
||||
.colab-dropdown:hover .colab-dropdown-content {display: block;}
|
||||
|
||||
/* Version control */
|
||||
|
||||
.version-button {
|
||||
background-color: #6670FF;
|
||||
color: white;
|
||||
border: none;
|
||||
padding: 5px;
|
||||
font-size: 15px;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.version-button:hover, .version-button:focus {
|
||||
background-color: #A6B0FF;
|
||||
}
|
||||
|
||||
.version-dropdown {
|
||||
display: none;
|
||||
background-color: #6670FF;
|
||||
min-width: 160px;
|
||||
overflow: auto;
|
||||
font-size: 15px;
|
||||
}
|
||||
|
||||
.version-dropdown a {
|
||||
color: white;
|
||||
padding: 3px 4px;
|
||||
text-decoration: none;
|
||||
display: block;
|
||||
}
|
||||
|
||||
.version-dropdown a:hover {
|
||||
background-color: #A6B0FF;
|
||||
}
|
||||
|
||||
.version-show {
|
||||
display: block;
|
||||
}
|
||||
|
||||
/* Framework selector */
|
||||
|
||||
.framework-selector {
|
||||
display: flex;
|
||||
flex-direction: row;
|
||||
justify-content: flex-end;
|
||||
margin-right: 30px;
|
||||
}
|
||||
|
||||
.framework-selector > button {
|
||||
background-color: white;
|
||||
color: #6670FF;
|
||||
border: 1px solid #6670FF;
|
||||
padding: 5px;
|
||||
}
|
||||
|
||||
.framework-selector > button.selected{
|
||||
background-color: #6670FF;
|
||||
color: white;
|
||||
border: 1px solid #6670FF;
|
||||
padding: 5px;
|
||||
}
|
||||
|
||||
/* Copy button */
|
||||
|
||||
a.copybtn {
|
||||
margin: 3px;
|
||||
}
|
||||
|
||||
/* The literal code blocks */
|
||||
.rst-content tt.literal, .rst-content tt.literal, .rst-content code.literal {
|
||||
color: #6670FF;
|
||||
}
|
||||
|
||||
/* To keep the logo centered */
|
||||
.wy-side-scroll {
|
||||
width: auto;
|
||||
font-size: 20px;
|
||||
}
|
||||
|
||||
/* The div that holds the Hugging Face logo */
|
||||
.HuggingFaceDiv {
|
||||
width: 100%
|
||||
}
|
||||
|
||||
/* The research field on top of the toc tree */
|
||||
.wy-side-nav-search{
|
||||
padding-top: 0;
|
||||
background-color: #6670FF;
|
||||
}
|
||||
|
||||
/* The toc tree */
|
||||
.wy-nav-side{
|
||||
background-color: #6670FF;
|
||||
}
|
||||
|
||||
/* The section headers in the toc tree */
|
||||
.wy-menu-vertical p.caption{
|
||||
background-color: #4d59ff;
|
||||
line-height: 40px;
|
||||
}
|
||||
|
||||
/* The selected items in the toc tree */
|
||||
.wy-menu-vertical li.current{
|
||||
background-color: #A6B0FF;
|
||||
}
|
||||
|
||||
/* When a list item that does belong to the selected block from the toc tree is hovered */
|
||||
.wy-menu-vertical li.current a:hover{
|
||||
background-color: #B6C0FF;
|
||||
}
|
||||
|
||||
/* When a list item that does NOT belong to the selected block from the toc tree is hovered. */
|
||||
.wy-menu-vertical li a:hover{
|
||||
background-color: #A7AFFB;
|
||||
}
|
||||
|
||||
/* The text items on the toc tree */
|
||||
.wy-menu-vertical a {
|
||||
color: #FFFFDD;
|
||||
font-family: Calibre-Light, sans-serif;
|
||||
}
|
||||
.wy-menu-vertical header, .wy-menu-vertical p.caption{
|
||||
color: white;
|
||||
font-family: Calibre-Light, sans-serif;
|
||||
}
|
||||
|
||||
/* The color inside the selected toc tree block */
|
||||
.wy-menu-vertical li.toctree-l2 a, .wy-menu-vertical li.toctree-l3 a, .wy-menu-vertical li.toctree-l4 a {
|
||||
color: black;
|
||||
}
|
||||
|
||||
/* Inside the depth-2 selected toc tree block */
|
||||
.wy-menu-vertical li.toctree-l2.current>a {
|
||||
background-color: #B6C0FF
|
||||
}
|
||||
.wy-menu-vertical li.toctree-l2.current li.toctree-l3>a {
|
||||
background-color: #C6D0FF
|
||||
}
|
||||
|
||||
/* Inside the depth-3 selected toc tree block */
|
||||
.wy-menu-vertical li.toctree-l3.current li.toctree-l4>a{
|
||||
background-color: #D6E0FF
|
||||
}
|
||||
|
||||
/* Inside code snippets */
|
||||
.rst-content dl:not(.docutils) dt{
|
||||
font-size: 15px;
|
||||
}
|
||||
|
||||
/* Links */
|
||||
a {
|
||||
color: #6670FF;
|
||||
}
|
||||
|
||||
/* Content bars */
|
||||
.rst-content dl:not(.docutils) dt {
|
||||
background-color: rgba(251, 141, 104, 0.1);
|
||||
border-right: solid 2px #FB8D68;
|
||||
border-left: solid 2px #FB8D68;
|
||||
color: #FB8D68;
|
||||
font-family: Calibre-Light, sans-serif;
|
||||
border-top: none;
|
||||
font-style: normal !important;
|
||||
}
|
||||
|
||||
/* Expand button */
|
||||
.wy-menu-vertical li.toctree-l2 span.toctree-expand,
|
||||
.wy-menu-vertical li.on a span.toctree-expand, .wy-menu-vertical li.current>a span.toctree-expand,
|
||||
.wy-menu-vertical li.toctree-l3 span.toctree-expand{
|
||||
color: black;
|
||||
}
|
||||
|
||||
/* Max window size */
|
||||
.wy-nav-content{
|
||||
max-width: 1200px;
|
||||
}
|
||||
|
||||
/* Mobile header */
|
||||
.wy-nav-top{
|
||||
background-color: #6670FF;
|
||||
}
|
||||
|
||||
|
||||
/* Source spans */
|
||||
.rst-content .viewcode-link, .rst-content .viewcode-back{
|
||||
color: #6670FF;
|
||||
font-size: 110%;
|
||||
letter-spacing: 2px;
|
||||
text-transform: uppercase;
|
||||
}
|
||||
|
||||
/* It would be better for table to be visible without horizontal scrolling */
|
||||
.wy-table-responsive table td, .wy-table-responsive table th{
|
||||
white-space: normal;
|
||||
}
|
||||
|
||||
.footer {
|
||||
margin-top: 20px;
|
||||
}
|
||||
|
||||
.footer__Social {
|
||||
display: flex;
|
||||
flex-direction: row;
|
||||
}
|
||||
|
||||
.footer__CustomImage {
|
||||
margin: 2px 5px 0 0;
|
||||
}
|
||||
|
||||
/* class and method names in doc */
|
||||
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname, .rst-content dl:not(.docutils) tt.descclassname, .rst-content dl:not(.docutils) code.descclassname{
|
||||
font-family: Calibre, sans-serif;
|
||||
font-size: 20px !important;
|
||||
}
|
||||
|
||||
/* class name in doc*/
|
||||
.rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) tt.descname, .rst-content dl:not(.docutils) code.descname{
|
||||
margin-right: 10px;
|
||||
font-family: Calibre-Medium, sans-serif;
|
||||
}
|
||||
|
||||
/* Method and class parameters */
|
||||
.sig-param{
|
||||
line-height: 23px;
|
||||
}
|
||||
|
||||
/* Class introduction "class" string at beginning */
|
||||
.rst-content dl:not(.docutils) .property{
|
||||
font-size: 18px;
|
||||
color: black;
|
||||
}
|
||||
|
||||
|
||||
/* FONTS */
|
||||
body{
|
||||
font-family: Calibre, sans-serif;
|
||||
font-size: 16px;
|
||||
}
|
||||
|
||||
h1 {
|
||||
font-family: Calibre-Thin, sans-serif;
|
||||
font-size: 70px;
|
||||
}
|
||||
|
||||
h2, .rst-content .toctree-wrapper p.caption, h3, h4, h5, h6, legend{
|
||||
font-family: Calibre-Medium, sans-serif;
|
||||
}
|
||||
|
||||
@font-face {
|
||||
font-family: Calibre-Medium;
|
||||
src: url(./Calibre-Medium.otf);
|
||||
font-weight:400;
|
||||
}
|
||||
|
||||
@font-face {
|
||||
font-family: Calibre;
|
||||
src: url(./Calibre-Regular.otf);
|
||||
font-weight:400;
|
||||
}
|
||||
|
||||
@font-face {
|
||||
font-family: Calibre-Light;
|
||||
src: url(./Calibre-Light.ttf);
|
||||
font-weight:400;
|
||||
}
|
||||
|
||||
@font-face {
|
||||
font-family: Calibre-Thin;
|
||||
src: url(./Calibre-Thin.otf);
|
||||
font-weight:400;
|
||||
}
|
||||
|
||||
|
||||
/**
|
||||
* Nav Links to other parts of huggingface.co
|
||||
*/
|
||||
div.menu {
|
||||
position: absolute;
|
||||
top: 0;
|
||||
right: 0;
|
||||
padding-top: 20px;
|
||||
padding-right: 20px;
|
||||
z-index: 1000;
|
||||
}
|
||||
div.menu a {
|
||||
font-size: 14px;
|
||||
letter-spacing: 0.3px;
|
||||
text-transform: uppercase;
|
||||
color: white;
|
||||
-webkit-font-smoothing: antialiased;
|
||||
background: linear-gradient(0deg, #6671ffb8, #9a66ffb8 50%);
|
||||
padding: 10px 16px 6px 16px;
|
||||
border-radius: 3px;
|
||||
margin-left: 12px;
|
||||
position: relative;
|
||||
}
|
||||
div.menu a:active {
|
||||
top: 1px;
|
||||
}
|
||||
@media (min-width: 768px) and (max-width: 1750px) {
|
||||
.wy-breadcrumbs {
|
||||
margin-top: 32px;
|
||||
}
|
||||
}
|
||||
@media (max-width: 768px) {
|
||||
div.menu {
|
||||
display: none;
|
||||
}
|
||||
}
|
||||
312
docs/source/_toctree.yml
Normal file
@@ -0,0 +1,312 @@
|
||||
- sections:
|
||||
- local: index
|
||||
title: 🤗 Transformers
|
||||
- local: quicktour
|
||||
title: Quick tour
|
||||
- local: installation
|
||||
title: Installation
|
||||
- local: philosophy
|
||||
title: Philosophy
|
||||
- local: glossary
|
||||
title: Glossary
|
||||
title: Get started
|
||||
- sections:
|
||||
- local: task_summary
|
||||
title: Summary of the tasks
|
||||
- local: model_summary
|
||||
title: Summary of the models
|
||||
- local: preprocessing
|
||||
title: Preprocessing data
|
||||
- local: training
|
||||
title: Fine-tuning a pretrained model
|
||||
- local: model_sharing
|
||||
title: Model sharing and uploading
|
||||
- local: tokenizer_summary
|
||||
title: Summary of the tokenizers
|
||||
- local: multilingual
|
||||
title: Multi-lingual models
|
||||
title: "Using 🤗 Transformers"
|
||||
- sections:
|
||||
- local: examples
|
||||
title: Examples
|
||||
- local: troubleshooting
|
||||
title: Troubleshooting
|
||||
- local: custom_datasets
|
||||
title: Fine-tuning with custom datasets
|
||||
- local: notebooks
|
||||
title: "🤗 Transformers Notebooks"
|
||||
- local: sagemaker
|
||||
title: Run training on Amazon SageMaker
|
||||
- local: community
|
||||
title: Community
|
||||
- local: converting_tensorflow_models
|
||||
title: Converting Tensorflow Checkpoints
|
||||
- local: migration
|
||||
title: Migrating from previous packages
|
||||
- local: contributing
|
||||
title: How to contribute to transformers?
|
||||
- local: add_new_model
|
||||
title: "How to add a model to 🤗 Transformers?"
|
||||
- local: add_new_pipeline
|
||||
title: "How to add a pipeline to 🤗 Transformers?"
|
||||
- local: fast_tokenizers
|
||||
title: "Using tokenizers from 🤗 Tokenizers"
|
||||
- local: performance
|
||||
title: 'Performance and Scalability: How To Fit a Bigger Model and Train It Faster'
|
||||
- local: parallelism
|
||||
title: Model Parallelism
|
||||
- local: testing
|
||||
title: Testing
|
||||
- local: debugging
|
||||
title: Debugging
|
||||
- local: serialization
|
||||
title: Exporting transformers models
|
||||
- local: pr_checks
|
||||
title: Checks on a Pull Request
|
||||
title: Advanced guides
|
||||
- sections:
|
||||
- local: bertology
|
||||
title: BERTology
|
||||
- local: perplexity
|
||||
title: Perplexity of fixed-length models
|
||||
- local: benchmarks
|
||||
title: Benchmarks
|
||||
title: Research
|
||||
- sections:
|
||||
- sections:
|
||||
- local: main_classes/callback
|
||||
title: Callbacks
|
||||
- local: main_classes/configuration
|
||||
title: Configuration
|
||||
- local: main_classes/data_collator
|
||||
title: Data Collator
|
||||
- local: main_classes/keras_callbacks
|
||||
title: Keras callbacks
|
||||
- local: main_classes/logging
|
||||
title: Logging
|
||||
- local: main_classes/model
|
||||
title: Models
|
||||
- local: main_classes/optimizer_schedules
|
||||
title: Optimization
|
||||
- local: main_classes/output
|
||||
title: Model outputs
|
||||
- local: main_classes/pipelines
|
||||
title: Pipelines
|
||||
- local: main_classes/processors
|
||||
title: Processors
|
||||
- local: main_classes/tokenizer
|
||||
title: Tokenizer
|
||||
- local: main_classes/trainer
|
||||
title: Trainer
|
||||
- local: main_classes/deepspeed
|
||||
title: DeepSpeed Integration
|
||||
- local: main_classes/feature_extractor
|
||||
title: Feature Extractor
|
||||
title: Main Classes
|
||||
- sections:
|
||||
- local: model_doc/albert
|
||||
title: ALBERT
|
||||
- local: model_doc/auto
|
||||
title: Auto Classes
|
||||
- local: model_doc/bart
|
||||
title: BART
|
||||
- local: model_doc/barthez
|
||||
title: BARThez
|
||||
- local: model_doc/bartpho
|
||||
title: BARTpho
|
||||
- local: model_doc/beit
|
||||
title: BEiT
|
||||
- local: model_doc/bert
|
||||
title: BERT
|
||||
- local: model_doc/bertweet
|
||||
title: Bertweet
|
||||
- local: model_doc/bertgeneration
|
||||
title: BertGeneration
|
||||
- local: model_doc/bert_japanese
|
||||
title: BertJapanese
|
||||
- local: model_doc/bigbird
|
||||
title: BigBird
|
||||
- local: model_doc/bigbird_pegasus
|
||||
title: BigBirdPegasus
|
||||
- local: model_doc/blenderbot
|
||||
title: Blenderbot
|
||||
- local: model_doc/blenderbot_small
|
||||
title: Blenderbot Small
|
||||
- local: model_doc/bort
|
||||
title: BORT
|
||||
- local: model_doc/byt5
|
||||
title: ByT5
|
||||
- local: model_doc/camembert
|
||||
title: CamemBERT
|
||||
- local: model_doc/canine
|
||||
title: CANINE
|
||||
- local: model_doc/clip
|
||||
title: CLIP
|
||||
- local: model_doc/convbert
|
||||
title: ConvBERT
|
||||
- local: model_doc/cpm
|
||||
title: CPM
|
||||
- local: model_doc/ctrl
|
||||
title: CTRL
|
||||
- local: model_doc/deberta
|
||||
title: DeBERTa
|
||||
- local: model_doc/deberta_v2
|
||||
title: DeBERTa-v2
|
||||
- local: model_doc/deit
|
||||
title: DeiT
|
||||
- local: model_doc/detr
|
||||
title: DETR
|
||||
- local: model_doc/dialogpt
|
||||
title: DialoGPT
|
||||
- local: model_doc/distilbert
|
||||
title: DistilBERT
|
||||
- local: model_doc/dpr
|
||||
title: DPR
|
||||
- local: model_doc/electra
|
||||
title: ELECTRA
|
||||
- local: model_doc/encoderdecoder
|
||||
title: Encoder Decoder Models
|
||||
- local: model_doc/flaubert
|
||||
title: FlauBERT
|
||||
- local: model_doc/fnet
|
||||
title: FNet
|
||||
- local: model_doc/fsmt
|
||||
title: FSMT
|
||||
- local: model_doc/funnel
|
||||
title: Funnel Transformer
|
||||
- local: model_doc/herbert
|
||||
title: herBERT
|
||||
- local: model_doc/ibert
|
||||
title: I-BERT
|
||||
- local: model_doc/imagegpt
|
||||
title: ImageGPT
|
||||
- local: model_doc/layoutlm
|
||||
title: LayoutLM
|
||||
- local: model_doc/layoutlmv2
|
||||
title: LayoutLMV2
|
||||
- local: model_doc/layoutxlm
|
||||
title: LayoutXLM
|
||||
- local: model_doc/led
|
||||
title: LED
|
||||
- local: model_doc/longformer
|
||||
title: Longformer
|
||||
- local: model_doc/luke
|
||||
title: LUKE
|
||||
- local: model_doc/lxmert
|
||||
title: LXMERT
|
||||
- local: model_doc/marian
|
||||
title: MarianMT
|
||||
- local: model_doc/m2m_100
|
||||
title: M2M100
|
||||
- local: model_doc/mbart
|
||||
title: MBart and MBart-50
|
||||
- local: model_doc/megatron_bert
|
||||
title: MegatronBERT
|
||||
- local: model_doc/megatron_gpt2
|
||||
title: MegatronGPT2
|
||||
- local: model_doc/mobilebert
|
||||
title: MobileBERT
|
||||
- local: model_doc/mluke
|
||||
title: mLUKE
|
||||
- local: model_doc/mpnet
|
||||
title: MPNet
|
||||
- local: model_doc/mt5
|
||||
title: MT5
|
||||
- local: model_doc/gpt
|
||||
title: OpenAI GPT
|
||||
- local: model_doc/gpt2
|
||||
title: OpenAI GPT2
|
||||
- local: model_doc/gptj
|
||||
title: GPT-J
|
||||
- local: model_doc/gpt_neo
|
||||
title: GPT Neo
|
||||
- local: model_doc/hubert
|
||||
title: Hubert
|
||||
- local: model_doc/perceiver
|
||||
title: Perceiver
|
||||
- local: model_doc/pegasus
|
||||
title: Pegasus
|
||||
- local: model_doc/phobert
|
||||
title: PhoBERT
|
||||
- local: model_doc/prophetnet
|
||||
title: ProphetNet
|
||||
- local: model_doc/qdqbert
|
||||
title: QDQBert
|
||||
- local: model_doc/rag
|
||||
title: RAG
|
||||
- local: model_doc/reformer
|
||||
title: Reformer
|
||||
- local: model_doc/rembert
|
||||
title: RemBERT
|
||||
- local: model_doc/retribert
|
||||
title: RetriBERT
|
||||
- local: model_doc/roberta
|
||||
title: RoBERTa
|
||||
- local: model_doc/roformer
|
||||
title: RoFormer
|
||||
- local: model_doc/segformer
|
||||
title: SegFormer
|
||||
- local: model_doc/sew
|
||||
title: SEW
|
||||
- local: model_doc/sew_d
|
||||
title: SEW-D
|
||||
- local: model_doc/speechencoderdecoder
|
||||
title: Speech Encoder Decoder Models
|
||||
- local: model_doc/speech_to_text
|
||||
title: Speech2Text
|
||||
- local: model_doc/speech_to_text_2
|
||||
title: Speech2Text2
|
||||
- local: model_doc/splinter
|
||||
title: Splinter
|
||||
- local: model_doc/squeezebert
|
||||
title: SqueezeBERT
|
||||
- local: model_doc/t5
|
||||
title: T5
|
||||
- local: model_doc/t5v1.1
|
||||
title: T5v1.1
|
||||
- local: model_doc/tapas
|
||||
title: TAPAS
|
||||
- local: model_doc/transformerxl
|
||||
title: Transformer XL
|
||||
- local: model_doc/trocr
|
||||
title: TrOCR
|
||||
- local: model_doc/unispeech
|
||||
title: UniSpeech
|
||||
- local: model_doc/unispeech_sat
|
||||
title: UniSpeech-SAT
|
||||
- local: model_doc/visionencoderdecoder
|
||||
title: Vision Encoder Decoder Models
|
||||
- local: model_doc/vision_text_dual_encoder
|
||||
title: Vision Text Dual Encoder
|
||||
- local: model_doc/vit
|
||||
title: Vision Transformer (ViT)
|
||||
- local: model_doc/visual_bert
|
||||
title: VisualBERT
|
||||
- local: model_doc/wav2vec2
|
||||
title: Wav2Vec2
|
||||
- local: model_doc/xlm
|
||||
title: XLM
|
||||
- local: model_doc/xlmprophetnet
|
||||
title: XLM-ProphetNet
|
||||
- local: model_doc/xlmroberta
|
||||
title: XLM-RoBERTa
|
||||
- local: model_doc/xlnet
|
||||
title: XLNet
|
||||
- local: model_doc/xlsr_wav2vec2
|
||||
title: XLSR-Wav2Vec2
|
||||
title: Models
|
||||
- sections:
|
||||
- local: internal/modeling_utils
|
||||
title: Custom Layers and Utilities
|
||||
- local: internal/pipelines_utils
|
||||
title: Utilities for pipelines
|
||||
- local: internal/tokenization_utils
|
||||
title: Utilities for Tokenizers
|
||||
- local: internal/trainer_utils
|
||||
title: Utilities for Trainer
|
||||
- local: internal/generation_utils
|
||||
title: Utilities for Generation
|
||||
- local: internal/file_utils
|
||||
title: General Utilities
|
||||
title: Internal Helpers
|
||||
title: API
|
||||
@@ -72,11 +72,11 @@ call the model to be added to 🤗 Transformers ``BrandNewBert``.
|
||||
|
||||
Let's take a look:
|
||||
|
||||
.. image:: ./imgs/transformers_overview.png
|
||||
.. image:: /imgs/transformers_overview.png
|
||||
|
||||
As you can see, we do make use of inheritance in 🤗 Transformers, but we keep the level of abstraction to an absolute
|
||||
minimum. There are never more than two levels of abstraction for any model in the library. :obj:`BrandNewBertModel`
|
||||
inherits from :obj:`BrandNewBertPreTrainedModel` which in turn inherits from :class:`~transformres.PreTrainedModel` and
|
||||
inherits from :obj:`BrandNewBertPreTrainedModel` which in turn inherits from :class:`~transformers.PreTrainedModel` and
|
||||
that's it. As a general rule, we want to make sure that a new model only depends on
|
||||
:class:`~transformers.PreTrainedModel`. The important functionalities that are automatically provided to every new
|
||||
model are :meth:`~transformers.PreTrainedModel.from_pretrained` and
|
||||
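As a purely schematic sketch of that two-level hierarchy (``BrandNewBert`` is the doc's placeholder name, and the actual layers are elided; this is not the real template shipped in the repository):

.. code-block::

    from transformers import PretrainedConfig, PreTrainedModel


    class BrandNewBertConfig(PretrainedConfig):
        model_type = "brand_new_bert"


    class BrandNewBertPreTrainedModel(PreTrainedModel):
        # Level 1: glues the model to the library (weight init, from_pretrained, save_pretrained).
        config_class = BrandNewBertConfig
        base_model_prefix = "brand_new_bert"


    class BrandNewBertModel(BrandNewBertPreTrainedModel):
        # Level 2: the actual architecture; no deeper abstraction is needed.
        def __init__(self, config):
            super().__init__(config)
            # submodules (embeddings, encoder, ...) would be defined here

        def forward(self, input_ids):
            ...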
@@ -271,7 +271,7 @@ logical components from one another and to have faster debugging cycles as inter
|
||||
notebooks are often easier to share with other contributors, which might be very helpful if you want to ask the Hugging
|
||||
Face team for help. If you are familiar with Jupyter notebooks, we strongly recommend working with them.
|
||||
|
||||
The obvious disadvantage of Jupyther notebooks is that if you are not used to working with them you will have to spend
|
||||
The obvious disadvantage of Jupyter notebooks is that if you are not used to working with them you will have to spend
|
||||
some time adjusting to the new programming environment and that you might not be able to use your known debugging tools
|
||||
anymore, like ``ipdb``.
|
||||
|
||||
@@ -674,7 +674,7 @@ the ``input_ids`` (usually the word embeddings) are identical. And then work you
|
||||
network. At some point, you will notice a difference between the two implementations, which should point you to the bug
|
||||
in the 🤗 Transformers implementation. From our experience, a simple and efficient way is to add many print statements
|
||||
in both the original implementation and 🤗 Transformers implementation, at the same positions in the network
|
||||
respectively, and to successively remove print statements showing the same values for intermediate presentions.
|
||||
respectively, and to successively remove print statements showing the same values for intermediate presentations.
|
||||
|
||||
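For instance, a minimal, self-contained sketch of such a comparison check (the tensors below are placeholders for
activations you would actually capture at the same layer in both implementations):

.. code-block::

    import torch

    # Pretend these are intermediate activations printed/saved at the same position
    # in the original implementation and in the 🤗 Transformers port.
    original_hidden_states = torch.randn(1, 7, 768)
    ported_hidden_states = original_hidden_states.clone()

    # The guide uses an absolute tolerance of 1e-3 throughout.
    if torch.allclose(original_hidden_states, ported_hidden_states, atol=1e-3):
        print("This layer matches, its print statements can be removed.")
    else:
        print("Mismatch found, the bug is at or before this layer.")
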
When you're confident that both implementations yield the same output, verifying the outputs with
``torch.allclose(original_output, output, atol=1e-3)``, you're done with the most difficult part! Congratulations - the
143	docs/source/add_new_pipeline.rst	Normal file
@ -0,0 +1,143 @@
..
    Copyright 2020 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

How to add a pipeline to 🤗 Transformers?
=======================================================================================================================

First and foremost, you need to decide the raw entries the pipeline will be able to take. These can be strings, raw
bytes, dictionaries or whatever seems to be the most likely desired input. Try to keep these inputs as pure Python as
possible, as that makes compatibility easier (even with other languages, via JSON). Those will be the :obj:`inputs` of
the pipeline (:obj:`preprocess`).

Then define the :obj:`outputs`. Same policy as for the :obj:`inputs`: the simpler, the better. Those will be the
outputs of the :obj:`postprocess` method.

Start by inheriting the base class :obj:`Pipeline` and implementing the 4 methods needed: :obj:`preprocess`,
:obj:`_forward`, :obj:`postprocess` and :obj:`_sanitize_parameters`.


.. code-block::

    from transformers import Pipeline


    class MyPipeline(Pipeline):
        def _sanitize_parameters(self, **kwargs):
            preprocess_kwargs = {}
            if "maybe_arg" in kwargs:
                preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
            return preprocess_kwargs, {}, {}

        def preprocess(self, inputs, maybe_arg=2):
            model_input = Tensor(....)
            return {"model_input": model_input}

        def _forward(self, model_inputs):
            # model_inputs == {"model_input": model_input}
            outputs = self.model(**model_inputs)
            # Maybe {"logits": Tensor(...)}
            return outputs

        def postprocess(self, model_outputs):
            best_class = model_outputs["logits"].softmax(-1)
            return best_class

The structure of this breakdown is meant to support relatively seamless CPU/GPU usage, while allowing the
pre/postprocessing to be done on the CPU, on different threads.

:obj:`preprocess` will take the originally defined inputs, and turn them into something feedable to the model. It might
contain more information and is usually a :obj:`Dict`.

:obj:`_forward` is the implementation detail and is not meant to be called directly. :obj:`forward` is the preferred
method to call as it contains safeguards to make sure everything is working on the expected device. Anything that is
linked to a real model belongs in the :obj:`_forward` method, anything else goes in preprocess/postprocess.

:obj:`postprocess` will take the output of :obj:`_forward` and turn it into the final output that was decided
earlier.

:obj:`_sanitize_parameters` exists to allow users to pass any parameters whenever they wish, be it at initialization
time ``pipeline(...., maybe_arg=4)`` or at call time ``pipe = pipeline(...); output = pipe(...., maybe_arg=4)``.

The returns of :obj:`_sanitize_parameters` are the 3 dicts of kwargs that will be passed directly to :obj:`preprocess`,
:obj:`_forward` and :obj:`postprocess`. Don't fill anything if the caller didn't call with any extra parameter. That
allows keeping the default arguments in the function definition, which is always more "natural".

A classic example would be a :obj:`top_k` argument in the post processing of classification tasks.

.. code-block::

    >>> pipe = pipeline("my-new-task")
    >>> pipe("This is a test")
    [{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}, {"label": "3-star", "score": 0.05},
    {"label": "4-star", "score": 0.025}, {"label": "5-star", "score": 0.025}]

    >>> pipe("This is a test", top_k=2)
    [{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}]

In order to achieve that, we'll update our :obj:`postprocess` method with a default parameter of :obj:`5` and edit
:obj:`_sanitize_parameters` to allow this new parameter.


.. code-block::

    def postprocess(self, model_outputs, top_k=5):
        best_class = model_outputs["logits"].softmax(-1)
        # Add logic to handle top_k
        return best_class

    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]

        postprocess_kwargs = {}
        if "top_k" in kwargs:
            postprocess_kwargs["top_k"] = kwargs["top_k"]
        return preprocess_kwargs, {}, postprocess_kwargs

Try to keep the inputs/outputs very simple and ideally JSON-serializable, as that makes the pipeline very easy to use
without requiring users to understand new kinds of objects. It's also relatively common to support many different types
of arguments for ease of use (audio files, for instance, can be filenames, URLs or pure bytes).



Adding it to the list of supported tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Go to ``src/transformers/pipelines/__init__.py`` and fill in :obj:`SUPPORTED_TASKS` with your newly created pipeline.
If possible it should provide a default model.

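As a rough sketch, an entry could look like the following. The exact schema of :obj:`SUPPORTED_TASKS` changes between
versions, so mirror the structure of an existing entry rather than this one; the default checkpoint below is only a
placeholder.

.. code-block::

    from transformers import AutoModelForSequenceClassification

    # Sketch of the shape of an entry; the real dict lives in
    # src/transformers/pipelines/__init__.py and its keys may differ between versions.
    MY_TASK_ENTRY = {
        "impl": MyPipeline,  # the pipeline class defined above
        "pt": (AutoModelForSequenceClassification,),  # or () if PyTorch is not supported
        "tf": (),  # add the TF auto class here if TensorFlow is supported
        "default": {"model": {"pt": "my-username/my-default-model"}},  # placeholder checkpoint
    }
    SUPPORTED_TASKS["my-new-task"] = MY_TASK_ENTRY
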
Adding tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a new file ``tests/test_pipelines_MY_PIPELINE.py`` with examples based on the other pipeline tests.

The :obj:`run_pipeline_test` function will be very generic and run on small random models on every possible
architecture as defined by :obj:`model_mapping` and :obj:`tf_model_mapping`.

This is very important to test future compatibility, meaning that if someone adds a new model for
:obj:`XXXForQuestionAnswering` then the pipeline test will attempt to run on it. Because the models are random it's
impossible to check for actual values; that's why there is a helper :obj:`ANY` that will simply attempt to match the
type of the pipeline output.

You also *need* to implement 2 (ideally 4) tests.

- :obj:`test_small_model_pt` : Define 1 small model for this pipeline (it doesn't matter if the results don't make
  sense) and test the pipeline outputs. The results should be the same as :obj:`test_small_model_tf`.
- :obj:`test_small_model_tf` : Define 1 small model for this pipeline (it doesn't matter if the results don't make
  sense) and test the pipeline outputs. The results should be the same as :obj:`test_small_model_pt`.
- :obj:`test_large_model_pt` (:obj:`optional`): Tests the pipeline on a real model where the results are supposed to
  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
  sure there is no drift in future releases.
- :obj:`test_large_model_tf` (:obj:`optional`): Tests the pipeline on a real model where the results are supposed to
  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
  sure there is no drift in future releases.

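For instance, a minimal sketch of what :obj:`test_small_model_pt` could look like, assuming the task was registered as
above. The tiny checkpoint name is a placeholder, and the real pipeline tests plug into shared test mixins, so use the
existing ``tests/test_pipelines_*.py`` files as the actual template.

.. code-block::

    import unittest

    from transformers import pipeline
    from transformers.testing_utils import require_torch


    class MyNewTaskPipelineTests(unittest.TestCase):
        @require_torch
        def test_small_model_pt(self):
            # Placeholder tiny checkpoint -- replace with a small random model for your task.
            pipe = pipeline("my-new-task", model="hf-internal-testing/tiny-random-bert")
            outputs = pipe("This is a test")
            # With a random model only the structure of the output can be checked, not its values.
            self.assertIsInstance(outputs, list)
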
347	docs/source/benchmarks.mdx	Normal file
@ -0,0 +1,347 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Benchmarks

[[open-in-colab]]

Let's take a look at how 🤗 Transformer models can be benchmarked, best practices, and already available benchmarks.

A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found [here](https://github.com/huggingface/transformers/tree/master/notebooks/05-benchmark.ipynb).

## How to benchmark 🤗 Transformer models

The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow you to flexibly benchmark 🤗 Transformer models. The benchmark classes allow us to measure the _peak memory usage_ and _required time_ for both _inference_ and _training_.

<Tip>

Hereby, _inference_ is defined by a single forward pass, and _training_ is defined by a single forward pass and
backward pass.

</Tip>

The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an object of type [`PyTorchBenchmarkArguments`] and
[`TensorFlowBenchmarkArguments`], respectively, for instantiation. [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`] are data classes and contain all relevant configurations for their corresponding benchmark class. In the following example, it is shown how a BERT model of type _bert-base-uncased_ can be benchmarked.

```py
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

>>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = PyTorchBenchmark(args)

===PT-TF-SPLIT===
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

>>> args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = TensorFlowBenchmark(args)
```

Here, three arguments are given to the benchmark argument data classes, namely `models`, `batch_sizes`, and
`sequence_lengths`. The argument `models` is required and expects a `list` of model identifiers from the
[model hub](https://huggingface.co/models). The `list` arguments `batch_sizes` and `sequence_lengths` define
the size of the `input_ids` on which the model is benchmarked. There are many more parameters that can be configured
via the benchmark argument data classes. For more detail on these one can either directly consult the files
`src/transformers/benchmark/benchmark_args_utils.py`, `src/transformers/benchmark/benchmark_args.py` (for PyTorch)
and `src/transformers/benchmark/benchmark_args_tf.py` (for TensorFlow). Alternatively, running the following shell
commands from root will print out a descriptive list of all configurable parameters for PyTorch and TensorFlow
respectively.

```bash
python examples/pytorch/benchmarking/run_benchmark.py --help

===PT-TF-SPLIT===
python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
```

An instantiated benchmark object can then simply be run by calling `benchmark.run()`.

```py
|
||||
>>> results = benchmark.run()
|
||||
>>> print(results)
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 0.006
|
||||
bert-base-uncased 8 32 0.006
|
||||
bert-base-uncased 8 128 0.018
|
||||
bert-base-uncased 8 512 0.088
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 1227
|
||||
bert-base-uncased 8 32 1281
|
||||
bert-base-uncased 8 128 1307
|
||||
bert-base-uncased 8 512 1539
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: PyTorch
|
||||
- use_torchscript: False
|
||||
- framework_version: 1.4.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 08:58:43.371351
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
|
||||
===PT-TF-SPLIT===
|
||||
>>> results = benchmark.run()
|
||||
>>> print(results)
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 0.005
|
||||
bert-base-uncased 8 32 0.008
|
||||
bert-base-uncased 8 128 0.022
|
||||
bert-base-uncased 8 512 0.105
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 1330
|
||||
bert-base-uncased 8 32 1330
|
||||
bert-base-uncased 8 128 1330
|
||||
bert-base-uncased 8 512 1770
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: Tensorflow
|
||||
- use_xla: False
|
||||
- framework_version: 2.2.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 09:26:35.617317
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
```
|
||||
|
||||
By default, the _time_ and the _required memory_ for _inference_ are benchmarked. In the example output above, the first
two sections show the result corresponding to _inference time_ and _inference memory_. In addition, all relevant
information about the computing environment, _e.g._ the GPU type, the system, the library versions, etc., is printed
out in the third section under _ENVIRONMENT INFORMATION_. This information can optionally be saved in a _.csv_ file
when adding the argument `save_to_csv=True` to [`PyTorchBenchmarkArguments`] and
[`TensorFlowBenchmarkArguments`] respectively. In this case, every section is saved in a separate
_.csv_ file. The path to each _.csv_ file can optionally be defined via the argument data classes.

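As a minimal sketch of saving results to disk, `save_to_csv` is the flag mentioned above; the per-section output paths
can be set through the same argument data classes (check `benchmark_args_utils.py` for the exact parameter names).

```py
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

>>> args = PyTorchBenchmarkArguments(
...     models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32], save_to_csv=True
... )
>>> results = PyTorchBenchmark(args).run()
```
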
Instead of benchmarking pre-trained models via their model identifier, _e.g._ `bert-base-uncased`, the user can
alternatively benchmark an arbitrary configuration of any available model class. In this case, a `list` of
configurations must be inserted with the benchmark args as follows.

```py
|
||||
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
|
||||
|
||||
>>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
|
||||
>>> config_base = BertConfig()
|
||||
>>> config_384_hid = BertConfig(hidden_size=384)
|
||||
>>> config_6_lay = BertConfig(num_hidden_layers=6)
|
||||
|
||||
>>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
|
||||
>>> benchmark.run()
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 128 0.006
|
||||
bert-base 8 512 0.006
|
||||
bert-base 8 128 0.018
|
||||
bert-base 8 512 0.088
|
||||
bert-384-hid 8 8 0.006
|
||||
bert-384-hid 8 32 0.006
|
||||
bert-384-hid 8 128 0.011
|
||||
bert-384-hid 8 512 0.054
|
||||
bert-6-lay 8 8 0.003
|
||||
bert-6-lay 8 32 0.004
|
||||
bert-6-lay 8 128 0.009
|
||||
bert-6-lay 8 512 0.044
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 8 1277
|
||||
bert-base 8 32 1281
|
||||
bert-base 8 128 1307
|
||||
bert-base 8 512 1539
|
||||
bert-384-hid 8 8 1005
|
||||
bert-384-hid 8 32 1027
|
||||
bert-384-hid 8 128 1035
|
||||
bert-384-hid 8 512 1255
|
||||
bert-6-lay 8 8 1097
|
||||
bert-6-lay 8 32 1101
|
||||
bert-6-lay 8 128 1127
|
||||
bert-6-lay 8 512 1359
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: PyTorch
|
||||
- use_torchscript: False
|
||||
- framework_version: 1.4.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 09:35:25.143267
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
|
||||
===PT-TF-SPLIT===
|
||||
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
|
||||
|
||||
>>> args = TensorFlowBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
|
||||
>>> config_base = BertConfig()
|
||||
>>> config_384_hid = BertConfig(hidden_size=384)
|
||||
>>> config_6_lay = BertConfig(num_hidden_layers=6)
|
||||
|
||||
>>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
|
||||
>>> benchmark.run()
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 8 0.005
|
||||
bert-base 8 32 0.008
|
||||
bert-base 8 128 0.022
|
||||
bert-base 8 512 0.106
|
||||
bert-384-hid 8 8 0.005
|
||||
bert-384-hid 8 32 0.007
|
||||
bert-384-hid 8 128 0.018
|
||||
bert-384-hid 8 512 0.064
|
||||
bert-6-lay 8 8 0.002
|
||||
bert-6-lay 8 32 0.003
|
||||
bert-6-lay 8 128 0.0011
|
||||
bert-6-lay 8 512 0.074
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 8 1330
|
||||
bert-base 8 32 1330
|
||||
bert-base 8 128 1330
|
||||
bert-base 8 512 1770
|
||||
bert-384-hid 8 8 1330
|
||||
bert-384-hid 8 32 1330
|
||||
bert-384-hid 8 128 1330
|
||||
bert-384-hid 8 512 1540
|
||||
bert-6-lay 8 8 1330
|
||||
bert-6-lay 8 32 1330
|
||||
bert-6-lay 8 128 1330
|
||||
bert-6-lay 8 512 1540
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: Tensorflow
|
||||
- use_xla: False
|
||||
- framework_version: 2.2.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 09:38:15.487125
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
```
|
||||
|
||||
Again, _inference time_ and _required memory_ for _inference_ are measured, but this time for customized configurations
of the `BertModel` class. This feature can especially be helpful when deciding for which configuration the model
should be trained.

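Training (a forward plus a backward pass) can be benchmarked in the same way. A minimal sketch, assuming the `training`
flag of the benchmark argument data classes (verify the exact argument name against your installed version):

```py
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

>>> args = PyTorchBenchmarkArguments(
...     models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32], training=True
... )
>>> results = PyTorchBenchmark(args).run()
```
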
## Benchmark best practices

This section lists a couple of best practices one should be aware of when benchmarking a model.

- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
  specifies on which device the code should be run by setting the `CUDA_VISIBLE_DEVICES` environment variable in the
  shell, _e.g._ `export CUDA_VISIBLE_DEVICES=0` before running the code.
- The option `no_multi_processing` should only be set to `True` for testing and debugging. To ensure accurate
  memory measurement it is recommended to run each memory benchmark in a separate process, which is the case as long
  as `no_multi_processing` is not set to `True`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary
  heavily between different GPU devices, library versions, etc., so that benchmark results on their own are not very
  useful for the community.


## Sharing your benchmark

Previously all available core models (10 at the time) were benchmarked for _inference time_, across many different
settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were
done across CPUs (except for TensorFlow XLA) and GPUs.

The approach is detailed in the [following blogpost](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2) and the results are
available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).

With the new _benchmark_ tools, it is easier than ever to share your benchmark results with the community:

- [PyTorch Benchmarking Results](https://github.com/huggingface/transformers/tree/master/examples/pytorch/benchmarking/README.md).
- [TensorFlow Benchmarking Results](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/benchmarking/README.md).

@ -1,363 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Benchmarks
|
||||
=======================================================================================================================
|
||||
|
||||
Let's take a look at how 🤗 Transformer models can be benchmarked, best practices, and already available benchmarks.
|
||||
|
||||
A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found :prefix_link:`here
|
||||
<notebooks/05-benchmark.ipynb>`.
|
||||
|
||||
How to benchmark 🤗 Transformer models
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` allow to flexibly
|
||||
benchmark 🤗 Transformer models. The benchmark classes allow us to measure the `peak memory usage` and `required time`
|
||||
for both `inference` and `training`.
|
||||
|
||||
.. note::
|
||||
|
||||
Hereby, `inference` is defined by a single forward pass, and `training` is defined by a single forward pass and
|
||||
backward pass.
|
||||
|
||||
The benchmark classes :class:`~transformers.PyTorchBenchmark` and :class:`~transformers.TensorFlowBenchmark` expect an
|
||||
object of type :class:`~transformers.PyTorchBenchmarkArguments` and
|
||||
:class:`~transformers.TensorFlowBenchmarkArguments`, respectively, for instantiation.
|
||||
:class:`~transformers.PyTorchBenchmarkArguments` and :class:`~transformers.TensorFlowBenchmarkArguments` are data
|
||||
classes and contain all relevant configurations for their corresponding benchmark class. In the following example, it
|
||||
is shown how a BERT model of type `bert-base-cased` can be benchmarked.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> ## PYTORCH CODE
|
||||
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
|
||||
|
||||
>>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
|
||||
>>> benchmark = PyTorchBenchmark(args)
|
||||
|
||||
>>> ## TENSORFLOW CODE
|
||||
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
|
||||
|
||||
>>> args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
|
||||
>>> benchmark = TensorFlowBenchmark(args)
|
||||
|
||||
|
||||
Here, three arguments are given to the benchmark argument data classes, namely ``models``, ``batch_sizes``, and
|
||||
``sequence_lengths``. The argument ``models`` is required and expects a :obj:`list` of model identifiers from the
|
||||
`model hub <https://huggingface.co/models>`__ The :obj:`list` arguments ``batch_sizes`` and ``sequence_lengths`` define
|
||||
the size of the ``input_ids`` on which the model is benchmarked. There are many more parameters that can be configured
|
||||
via the benchmark argument data classes. For more detail on these one can either directly consult the files
|
||||
``src/transformers/benchmark/benchmark_args_utils.py``, ``src/transformers/benchmark/benchmark_args.py`` (for PyTorch)
|
||||
and ``src/transformers/benchmark/benchmark_args_tf.py`` (for Tensorflow). Alternatively, running the following shell
|
||||
commands from root will print out a descriptive list of all configurable parameters for PyTorch and Tensorflow
|
||||
respectively.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
## PYTORCH CODE
|
||||
python examples/pytorch/benchmarking/run_benchmark.py --help
|
||||
|
||||
## TENSORFLOW CODE
|
||||
python examples/tensorflow/benchmarking/run_benchmark_tf.py --help
|
||||
|
||||
|
||||
An instantiated benchmark object can then simply be run by calling ``benchmark.run()``.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> ## PYTORCH CODE
|
||||
>>> results = benchmark.run()
|
||||
>>> print(results)
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 0.006
|
||||
bert-base-uncased 8 32 0.006
|
||||
bert-base-uncased 8 128 0.018
|
||||
bert-base-uncased 8 512 0.088
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 1227
|
||||
bert-base-uncased 8 32 1281
|
||||
bert-base-uncased 8 128 1307
|
||||
bert-base-uncased 8 512 1539
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: PyTorch
|
||||
- use_torchscript: False
|
||||
- framework_version: 1.4.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 08:58:43.371351
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
|
||||
>>> ## TENSORFLOW CODE
|
||||
>>> results = benchmark.run()
|
||||
>>> print(results)
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 0.005
|
||||
bert-base-uncased 8 32 0.008
|
||||
bert-base-uncased 8 128 0.022
|
||||
bert-base-uncased 8 512 0.105
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base-uncased 8 8 1330
|
||||
bert-base-uncased 8 32 1330
|
||||
bert-base-uncased 8 128 1330
|
||||
bert-base-uncased 8 512 1770
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: Tensorflow
|
||||
- use_xla: False
|
||||
- framework_version: 2.2.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 09:26:35.617317
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
|
||||
By default, the `time` and the `required memory` for `inference` are benchmarked. In the example output above the first
|
||||
two sections show the result corresponding to `inference time` and `inference memory`. In addition, all relevant
|
||||
information about the computing environment, `e.g.` the GPU type, the system, the library versions, etc... are printed
|
||||
out in the third section under `ENVIRONMENT INFORMATION`. This information can optionally be saved in a `.csv` file
|
||||
when adding the argument :obj:`save_to_csv=True` to :class:`~transformers.PyTorchBenchmarkArguments` and
|
||||
:class:`~transformers.TensorFlowBenchmarkArguments` respectively. In this case, every section is saved in a separate
|
||||
`.csv` file. The path to each `.csv` file can optionally be defined via the argument data classes.
|
||||
|
||||
Instead of benchmarking pre-trained models via their model identifier, `e.g.` `bert-base-uncased`, the user can
|
||||
alternatively benchmark an arbitrary configuration of any available model class. In this case, a :obj:`list` of
|
||||
configurations must be inserted with the benchmark args as follows.
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> ## PYTORCH CODE
|
||||
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
|
||||
|
||||
>>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
|
||||
>>> config_base = BertConfig()
|
||||
>>> config_384_hid = BertConfig(hidden_size=384)
|
||||
>>> config_6_lay = BertConfig(num_hidden_layers=6)
|
||||
|
||||
>>> benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
|
||||
>>> benchmark.run()
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 128 0.006
|
||||
bert-base 8 512 0.006
|
||||
bert-base 8 128 0.018
|
||||
bert-base 8 512 0.088
|
||||
bert-384-hid 8 8 0.006
|
||||
bert-384-hid 8 32 0.006
|
||||
bert-384-hid 8 128 0.011
|
||||
bert-384-hid 8 512 0.054
|
||||
bert-6-lay 8 8 0.003
|
||||
bert-6-lay 8 32 0.004
|
||||
bert-6-lay 8 128 0.009
|
||||
bert-6-lay 8 512 0.044
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 8 1277
|
||||
bert-base 8 32 1281
|
||||
bert-base 8 128 1307
|
||||
bert-base 8 512 1539
|
||||
bert-384-hid 8 8 1005
|
||||
bert-384-hid 8 32 1027
|
||||
bert-384-hid 8 128 1035
|
||||
bert-384-hid 8 512 1255
|
||||
bert-6-lay 8 8 1097
|
||||
bert-6-lay 8 32 1101
|
||||
bert-6-lay 8 128 1127
|
||||
bert-6-lay 8 512 1359
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: PyTorch
|
||||
- use_torchscript: False
|
||||
- framework_version: 1.4.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 09:35:25.143267
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
|
||||
>>> ## TENSORFLOW CODE
|
||||
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
|
||||
|
||||
>>> args = TensorFlowBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
|
||||
>>> config_base = BertConfig()
|
||||
>>> config_384_hid = BertConfig(hidden_size=384)
|
||||
>>> config_6_lay = BertConfig(num_hidden_layers=6)
|
||||
|
||||
>>> benchmark = TensorFlowBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])
|
||||
>>> benchmark.run()
|
||||
==================== INFERENCE - SPEED - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Time in s
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 8 0.005
|
||||
bert-base 8 32 0.008
|
||||
bert-base 8 128 0.022
|
||||
bert-base 8 512 0.106
|
||||
bert-384-hid 8 8 0.005
|
||||
bert-384-hid 8 32 0.007
|
||||
bert-384-hid 8 128 0.018
|
||||
bert-384-hid 8 512 0.064
|
||||
bert-6-lay 8 8 0.002
|
||||
bert-6-lay 8 32 0.003
|
||||
bert-6-lay 8 128 0.0011
|
||||
bert-6-lay 8 512 0.074
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== INFERENCE - MEMORY - RESULT ====================
|
||||
--------------------------------------------------------------------------------
|
||||
Model Name Batch Size Seq Length Memory in MB
|
||||
--------------------------------------------------------------------------------
|
||||
bert-base 8 8 1330
|
||||
bert-base 8 32 1330
|
||||
bert-base 8 128 1330
|
||||
bert-base 8 512 1770
|
||||
bert-384-hid 8 8 1330
|
||||
bert-384-hid 8 32 1330
|
||||
bert-384-hid 8 128 1330
|
||||
bert-384-hid 8 512 1540
|
||||
bert-6-lay 8 8 1330
|
||||
bert-6-lay 8 32 1330
|
||||
bert-6-lay 8 128 1330
|
||||
bert-6-lay 8 512 1540
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
==================== ENVIRONMENT INFORMATION ====================
|
||||
|
||||
- transformers_version: 2.11.0
|
||||
- framework: Tensorflow
|
||||
- use_xla: False
|
||||
- framework_version: 2.2.0
|
||||
- python_version: 3.6.10
|
||||
- system: Linux
|
||||
- cpu: x86_64
|
||||
- architecture: 64bit
|
||||
- date: 2020-06-29
|
||||
- time: 09:38:15.487125
|
||||
- fp16: False
|
||||
- use_multiprocessing: True
|
||||
- only_pretrain_model: False
|
||||
- cpu_ram_mb: 32088
|
||||
- use_gpu: True
|
||||
- num_gpus: 1
|
||||
- gpu: TITAN RTX
|
||||
- gpu_ram_mb: 24217
|
||||
- gpu_power_watts: 280.0
|
||||
- gpu_performance_state: 2
|
||||
- use_tpu: False
|
||||
|
||||
|
||||
Again, `inference time` and `required memory` for `inference` are measured, but this time for customized configurations
|
||||
of the :obj:`BertModel` class. This feature can especially be helpful when deciding for which configuration the model
|
||||
should be trained.
|
||||
|
||||
|
||||
Benchmark best practices
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This section lists a couple of best practices one should be aware of when benchmarking a model.
|
||||
|
||||
- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
|
||||
specifies on which device the code should be run by setting the ``CUDA_VISIBLE_DEVICES`` environment variable in the
|
||||
shell, `e.g.` ``export CUDA_VISIBLE_DEVICES=0`` before running the code.
|
||||
- The option :obj:`no_multi_processing` should only be set to :obj:`True` for testing and debugging. To ensure accurate
|
||||
memory measurement it is recommended to run each memory benchmark in a separate process by making sure
|
||||
:obj:`no_multi_processing` is set to :obj:`True`.
|
||||
- One should always state the environment information when sharing the results of a model benchmark. Results can vary
|
||||
heavily between different GPU devices, library versions, etc., so that benchmark results on their own are not very
|
||||
useful for the community.
|
||||
|
||||
|
||||
Sharing your benchmark
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Previously all available core models (10 at the time) have been benchmarked for `inference time`, across many different
|
||||
settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were
|
||||
done across CPUs (except for TensorFlow XLA) and GPUs.
|
||||
|
||||
The approach is detailed in the `following blogpost
|
||||
<https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2>`__ and the results are
|
||||
available `here
|
||||
<https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing>`__.
|
||||
|
||||
With the new `benchmark` tools, it is easier than ever to share your benchmark results with the community
|
||||
|
||||
- :prefix_link:`PyTorch Benchmarking Results<examples/pytorch/benchmarking/README.md>`.
|
||||
- :prefix_link:`TensorFlow Benchmarking Results<examples/tensorflow/benchmarking/README.md>`.
|
||||
@ -1,4 +1,4 @@
|
||||
# Community
|
||||
# Community
|
||||
|
||||
This page regroups resources around 🤗 Transformers developed by the community.
|
||||
|
||||
@ -6,12 +6,13 @@ This page regroups resources around 🤗 Transformers developed by the community
|
||||
|
||||
| Resource | Description | Author |
|
||||
|:----------|:-------------|------:|
|
||||
| [Hugging Face Transformers Glossary Flashcards](https://www.darigovresearch.com/huggingface-transformers-glossary-flashcards) | A set of flashcards based on the [Transformers Docs Glossary](https://huggingface.co/transformers/master/glossary.html) that has been put into a form which can be easily learnt/revised using [Anki ](https://apps.ankiweb.net/) an open source, cross platform app specifically designed for long term knowledge retention. See this [Introductory video on how to use the flashcards](https://www.youtube.com/watch?v=Dji_h7PILrw). | [Darigov Research](https://www.darigovresearch.com/) |
|
||||
| [Hugging Face Transformers Glossary Flashcards](https://www.darigovresearch.com/huggingface-transformers-glossary-flashcards) | A set of flashcards based on the [Transformers Docs Glossary](glossary) that has been put into a form which can be easily learnt/revised using [Anki ](https://apps.ankiweb.net/) an open source, cross platform app specifically designed for long term knowledge retention. See this [Introductory video on how to use the flashcards](https://www.youtube.com/watch?v=Dji_h7PILrw). | [Darigov Research](https://www.darigovresearch.com/) |
|
||||
|
||||
## Community notebooks:
|
||||
|
||||
| Notebook | Description | Author | |
|
||||
|:----------|:-------------|:-------------|------:|
|
||||
| [Fine-tune a pre-trained Transformer to generate lyrics](https://github.com/AlekseyKorshuk/huggingartists) | How to generate lyrics in the style of your favorite artist by fine-tuning a GPT-2 model | [Aleksey Korshuk](https://github.com/AlekseyKorshuk) | [](https://colab.research.google.com/github/AlekseyKorshuk/huggingartists/blob/master/huggingartists-demo.ipynb) |
|
||||
| [Train T5 in Tensorflow 2 ](https://github.com/snapthat/TF-T5-text-to-text) | How to train T5 for any task using Tensorflow 2. This notebook demonstrates a Question & Answer task implemented in Tensorflow 2 using SQUAD | [Muhammad Harris](https://github.com/HarrisDePerceptron) |[](https://colab.research.google.com/github/snapthat/TF-T5-text-to-text/blob/master/snapthatT5/notebooks/TF-T5-Datasets%20Training.ipynb) |
|
||||
| [Train T5 on TPU](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) | How to train T5 on SQUAD with Transformers and Nlp | [Suraj Patil](https://github.com/patil-suraj) |[](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb#scrollTo=QLGiFCDqvuil) |
|
||||
| [Fine-tune T5 for Classification and Multiple Choice](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) | How to fine-tune T5 for classification and multiple choice tasks using a text-to-text format with PyTorch Lightning | [Suraj Patil](https://github.com/patil-suraj) | [](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) |
|
||||
@ -35,7 +36,7 @@ This page regroups resources around 🤗 Transformers developed by the community
|
||||
|[fine-tune a non-English GPT-2 Model with Trainer class](https://github.com/philschmid/fine-tune-GPT-2/blob/master/Fine_tune_a_non_English_GPT_2_Model_with_Huggingface.ipynb) | How to fine-tune a non-English GPT-2 Model with Trainer class | [Philipp Schmid](https://www.philschmid.de) | [](https://colab.research.google.com/github/philschmid/fine-tune-GPT-2/blob/master/Fine_tune_a_non_English_GPT_2_Model_with_Huggingface.ipynb)|
|
||||
|[Fine-tune a DistilBERT Model for Multi Label Classification task](https://github.com/DhavalTaunk08/Transformers_scripts/blob/master/Transformers_multilabel_distilbert.ipynb) | How to fine-tune a DistilBERT Model for Multi Label Classification task | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [](https://colab.research.google.com/github/DhavalTaunk08/Transformers_scripts/blob/master/Transformers_multilabel_distilbert.ipynb)|
|
||||
|[Fine-tune ALBERT for sentence-pair classification](https://github.com/NadirEM/nlp-notebooks/blob/master/Fine_tune_ALBERT_sentence_pair_classification.ipynb) | How to fine-tune an ALBERT model or another BERT-based model for the sentence-pair classification task | [Nadir El Manouzi](https://github.com/NadirEM) | [](https://colab.research.google.com/github/NadirEM/nlp-notebooks/blob/master/Fine_tune_ALBERT_sentence_pair_classification.ipynb)|
|
||||
|[Fine-tune Roberta for sentiment analysis](https://github.com/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb) | How to fine-tune an Roberta model for sentiment analysis | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb)|
|
||||
|[Fine-tune Roberta for sentiment analysis](https://github.com/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb) | How to fine-tune a Roberta model for sentiment analysis | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb)|
|
||||
|[Evaluating Question Generation Models](https://github.com/flexudy-pipe/qugeev) | How accurate are the answers to questions generated by your seq2seq transformer model? | [Pascal Zoleko](https://github.com/zolekode) | [](https://colab.research.google.com/drive/1bpsSqCQU-iw_5nNoRm_crPq6FRuJthq_?usp=sharing)|
|
||||
|[Classify text with DistilBERT and Tensorflow](https://github.com/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb) | How to fine-tune DistilBERT for text classification in TensorFlow | [Peter Bayerle](https://github.com/peterbayerle) | [](https://colab.research.google.com/github/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb)|
|
||||
|[Leverage BERT for Encoder-Decoder Summarization on CNN/Dailymail](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | How to warm-start a *EncoderDecoderModel* with a *bert-base-uncased* checkpoint for summarization on CNN/Dailymail | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)|
|
||||
|
||||
@ -1,218 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# Configuration file for the Sphinx documentation builder.
|
||||
#
|
||||
# This file does only contain a selection of the most common options. For a
|
||||
# full list see the documentation:
|
||||
# http://www.sphinx-doc.org/en/master/config
|
||||
|
||||
# -- Path setup --------------------------------------------------------------
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
#
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.abspath("../../src"))
|
||||
|
||||
|
||||
# -- Project information -----------------------------------------------------
|
||||
|
||||
project = "transformers"
|
||||
copyright = "2020, The Hugging Face Team, Licenced under the Apache License, Version 2.0"
|
||||
author = "huggingface"
|
||||
|
||||
# The short X.Y version
|
||||
version = ""
|
||||
# The full version, including alpha/beta/rc tags
|
||||
release = u'4.7.0'
|
||||
|
||||
|
||||
|
||||
# Prefix link to point to master, comment this during version release and uncomment below line
|
||||
extlinks = {"prefix_link": ("https://github.com/huggingface/transformers/blob/master/%s", "")}
|
||||
# Prefix link to always point to corresponding version, uncomment this during version release
|
||||
# extlinks = {'prefix_link': ('https://github.com/huggingface/transformers/blob/v'+ release + '/%s', '')}
|
||||
|
||||
# -- General configuration ---------------------------------------------------
|
||||
|
||||
# If your documentation needs a minimal Sphinx version, state it here.
|
||||
#
|
||||
# needs_sphinx = '1.0'
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be
|
||||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
|
||||
# ones.
|
||||
extensions = [
|
||||
"sphinx.ext.autodoc",
|
||||
"sphinx.ext.extlinks",
|
||||
"sphinx.ext.coverage",
|
||||
"sphinx.ext.napoleon",
|
||||
"recommonmark",
|
||||
"sphinx.ext.viewcode",
|
||||
"sphinx_markdown_tables",
|
||||
"sphinxext.opengraph",
|
||||
"sphinx_copybutton",
|
||||
]
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ["_templates"]
|
||||
|
||||
# The suffix(es) of source filenames.
|
||||
# You can specify multiple suffix as a list of string:
|
||||
#
|
||||
source_suffix = [".rst", ".md"]
|
||||
# source_suffix = '.rst'
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = "index"
|
||||
|
||||
# The language for content autogenerated by Sphinx. Refer to documentation
|
||||
# for a list of supported languages.
|
||||
#
|
||||
# This is also used if you do content translation via gettext catalogs.
|
||||
# Usually you set "language" from the command line for these cases.
|
||||
language = None
|
||||
|
||||
# List of patterns, relative to source directory, that match files and
|
||||
# directories to ignore when looking for source files.
|
||||
# This pattern also affects html_static_path and html_extra_path.
|
||||
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
|
||||
|
||||
# The name of the Pygments (syntax highlighting) style to use.
|
||||
pygments_style = None
|
||||
|
||||
# Remove the prompt when copying examples
|
||||
copybutton_prompt_text = r">>> |\.\.\. "
|
||||
copybutton_prompt_is_regexp = True
|
||||
|
||||
# -- Options for HTML output -------------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
#
|
||||
html_theme = "sphinx_rtd_theme"
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
# documentation.
|
||||
#
|
||||
html_theme_options = {"analytics_id": "UA-83738774-2", "navigation_with_keys": True}

# Configuration for OpenGraph and Twitter Card Tags.
# These are responsible for creating nice shareable social images https://ahrefs.com/blog/open-graph-meta-tags/
# https://ogp.me/#type_website
ogp_image = "https://huggingface.co/front/thumbnails/transformers.png"
ogp_description = "State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0. Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone"
ogp_description_length = 160

ogp_custom_meta_tags = [
    f'<meta name="twitter:image" content="{ogp_image}">',
    f'<meta name="twitter:description" content="{ogp_description}">',
]

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}

# This must be the name of an image file (path relative to the configuration
# directory) that is the favicon of the docs. Modern browsers use this as
# the icon for tabs, windows and bookmarks. It should be a Windows-style
# icon file (.ico).
html_favicon = "favicon.ico"


# -- Options for HTMLHelp output ---------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = "transformersdoc"


# -- Options for LaTeX output ------------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',
    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble.
    #
    # 'preamble': '',
    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
    (master_doc, "transformers.tex", "transformers Documentation", "huggingface", "manual"),
]


# -- Options for manual page output ------------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "transformers", "transformers Documentation", [author], 1)]


# -- Options for Texinfo output ----------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
    (
        master_doc,
        "transformers",
        "transformers Documentation",
        author,
        "transformers",
        "One line description of project.",
        "Miscellaneous",
    ),
]


# -- Options for Epub output -------------------------------------------------

# Bibliographic Dublin Core info.
epub_title = project

# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''

# A unique identification for the text.
#
# epub_uid = ''

# A list of files that should not be packed into the epub file.
epub_exclude_files = ["search.html"]


def setup(app):
    app.add_css_file("css/huggingface.css")
    app.add_css_file("css/code-snippets.css")
    app.add_js_file("js/custom.js")


# -- Extension configuration -------------------------------------------------

@@ -13,8 +13,8 @@
|
||||
Converting Tensorflow Checkpoints
|
||||
=======================================================================================================================
|
||||
|
||||
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints in models
|
||||
than be loaded using the ``from_pretrained`` methods of the library.
|
||||
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models
|
||||
that can be loaded using the ``from_pretrained`` methods of the library.
|
||||
|
||||
.. note::
|
||||
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
|
||||
@@ -26,22 +26,22 @@ BERT
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google
|
||||
<https://github.com/google-research/bert#pre-trained-models>`_\ ) in a PyTorch save file by using the
|
||||
<https://github.com/google-research/bert#pre-trained-models>`_) in a PyTorch save file by using the
|
||||
:prefix_link:`convert_bert_original_tf_checkpoint_to_pytorch.py
|
||||
<src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py>` script.
|
||||
|
||||
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``\ ) and the associated
|
||||
configuration file (\ ``bert_config.json``\ ), and creates a PyTorch model for this configuration, loads the weights
|
||||
from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that
|
||||
can be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , :prefix_link:`run_glue.py
|
||||
<examples/pytorch/text-classification/run_glue.py>` \ ).
|
||||
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``) and the associated
|
||||
configuration file (``bert_config.json``), and creates a PyTorch model for this configuration, loads the weights from
|
||||
the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can
|
||||
be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , :prefix_link:`run_glue.py
|
||||
<examples/pytorch/text-classification/run_glue.py>` ).
|
||||
|
||||
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
|
||||
checkpoint (the three files starting with ``bert_model.ckpt``\ ) but be sure to keep the configuration file (\
|
||||
``bert_config.json``\ ) and the vocabulary file (\ ``vocab.txt``\ ) as these are needed for the PyTorch model too.
|
||||
checkpoint (the three files starting with ``bert_model.ckpt``) but be sure to keep the configuration file (\
|
||||
``bert_config.json``) and the vocabulary file (``vocab.txt``) as these are needed for the PyTorch model too.
|
||||
|
||||
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (\ ``pip install
|
||||
tensorflow``\ ). The rest of the repository only requires PyTorch.
|
||||
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (``pip install
|
||||
tensorflow``). The rest of the repository only requires PyTorch.
|
||||
|
||||
Here is an example of the conversion process for a pre-trained ``BERT-Base Uncased`` model:
|
||||
|
||||
@@ -64,9 +64,9 @@ Convert TensorFlow model checkpoints of ALBERT to PyTorch using the
|
||||
:prefix_link:`convert_albert_original_tf_checkpoint_to_pytorch.py
|
||||
<src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py>` script.
|
||||
|
||||
The CLI takes as input a TensorFlow checkpoint (three files starting with ``model.ckpt-best``\ ) and the accompanying
|
||||
configuration file (\ ``albert_config.json``\ ), then creates and saves a PyTorch model. To run this conversion you
|
||||
will need to have TensorFlow and PyTorch installed.
|
||||
The CLI takes as input a TensorFlow checkpoint (three files starting with ``model.ckpt-best``) and the accompanying
|
||||
configuration file (``albert_config.json``), then creates and saves a PyTorch model. To run this conversion you will
|
||||
need to have TensorFlow and PyTorch installed.
|
||||
|
||||
Here is an example of the conversion process for the pre-trained ``ALBERT Base`` model:
|
||||
|
||||
@@ -104,7 +104,7 @@ OpenAI GPT-2
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see `here
|
||||
<https://github.com/openai/gpt-2>`__\ )
|
||||
<https://github.com/openai/gpt-2>`__)
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
@@ -120,7 +120,7 @@ Transformer-XL
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here
|
||||
<https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__\ )
|
||||
<https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__)
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
|
||||
docs/source/custom_datasets.mdx (new file, 681 lines)
@@ -0,0 +1,681 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# How to fine-tune a model for common downstream tasks
|
||||
|
||||
[[open-in-colab]]
|
||||
|
||||
This guide will show you how to fine-tune 🤗 Transformers models for common downstream tasks. You will use the 🤗
|
||||
Datasets library to quickly load and preprocess the datasets, getting them ready for training with PyTorch and
|
||||
TensorFlow.
|
||||
|
||||
Before you begin, make sure you have the 🤗 Datasets library installed. For more detailed installation instructions,
|
||||
refer to the 🤗 Datasets [installation page](https://huggingface.co/docs/datasets/installation.html). All of the
|
||||
examples in this guide will use 🤗 Datasets to load and preprocess a dataset.
|
||||
|
||||
```bash
|
||||
pip install datasets
|
||||
```
|
||||
|
||||
Learn how to fine-tune a model for:
|
||||
|
||||
- [seq_imdb](#seq_imdb)
|
||||
- [tok_ner](#tok_ner)
|
||||
- [qa_squad](#qa_squad)
|
||||
|
||||
<a id='seq_imdb'></a>
|
||||
|
||||
## Sequence classification with IMDb reviews
|
||||
|
||||
Sequence classification refers to the task of classifying sequences of text according to a given number of classes. In
|
||||
this example, learn how to fine-tune a model on the [IMDb dataset](https://huggingface.co/datasets/imdb) to determine
|
||||
whether a review is positive or negative.
|
||||
|
||||
<Tip>
|
||||
|
||||
For a more in-depth example of how to fine-tune a model for text classification, take a look at the corresponding
|
||||
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb)
|
||||
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification-tf.ipynb).
|
||||
|
||||
</Tip>
|
||||
|
||||
### Load IMDb dataset
|
||||
|
||||
The 🤗 Datasets library makes it simple to load a dataset:
|
||||
|
||||
```python
|
||||
from datasets import load_dataset
|
||||
imdb = load_dataset("imdb")
|
||||
```
|
||||
|
||||
This loads a `DatasetDict` object which you can index into to view an example:
|
||||
|
||||
```python
|
||||
imdb["train"][0]
|
||||
{'label': 1,
|
||||
'text': 'Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as "Teachers". My 35 years in the teaching profession lead me to believe that Bromwell High\'s satire is much closer to reality than is "Teachers". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\'t!'
|
||||
}
|
||||
```
|
||||
|
||||
### Preprocess
|
||||
|
||||
The next step is to tokenize the text into a format the model can read. It is important to load the same tokenizer the
model was trained with so the words are tokenized consistently. Load the DistilBERT tokenizer with
[`AutoTokenizer`] because we will eventually train a classifier with a pretrained [DistilBERT](https://huggingface.co/distilbert-base-uncased) model:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
|
||||
```
|
||||
|
||||
Now that you have instantiated a tokenizer, create a function that will tokenize the text. You should also truncate
|
||||
longer sequences in the text to be no longer than the model's maximum input length:
|
||||
|
||||
```python
|
||||
def preprocess_function(examples):
|
||||
return tokenizer(examples["text"], truncation=True)
|
||||
```
|
||||
|
||||
Use 🤗 Datasets `map` function to apply the preprocessing function to the entire dataset. You can also set
|
||||
`batched=True` to apply the preprocessing function to multiple elements of the dataset at once for faster
|
||||
preprocessing:
|
||||
|
||||
```python
|
||||
tokenized_imdb = imdb.map(preprocess_function, batched=True)
|
||||
```
|
||||
|
||||
Lastly, pad your texts so they are a uniform length. While it is possible to pad your texts in the `tokenizer` function
by setting `padding=True`, it is more efficient to only pad each text to the length of the longest element in its
batch. This is known as **dynamic padding**. You can do this with [`DataCollatorWithPadding`]:
|
||||
|
||||
```python
|
||||
from transformers import DataCollatorWithPadding
|
||||
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
|
||||
```
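
If you want to see what dynamic padding does, here is a minimal sketch (assuming the `tokenizer`, `data_collator`, and
`tokenized_imdb` objects created above): each batch is padded only to the longest sequence it contains, not to a
global maximum.

```python
# Build a small batch by hand and inspect its padded shape.
features = [
    {k: tokenized_imdb["train"][i][k] for k in ("input_ids", "attention_mask")} for i in range(4)
]
batch = data_collator(features)
# The second dimension equals the longest of these four examples.
print(batch["input_ids"].shape)
```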
|
||||
|
||||
### Fine-tune with the Trainer API
|
||||
|
||||
Now load your model with the [`AutoModelForSequenceClassification`] class along with the number of expected labels:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForSequenceClassification
|
||||
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
|
||||
```
|
||||
|
||||
At this point, only three steps remain:
|
||||
|
||||
1. Define your training hyperparameters in [`TrainingArguments`].
|
||||
2. Pass the training arguments to a [`Trainer`] along with the model, dataset, tokenizer, and data collator.
|
||||
3. Call [`Trainer.train()`] to fine-tune your model.
|
||||
|
||||
```python
|
||||
from transformers import TrainingArguments, Trainer
|
||||
|
||||
training_args = TrainingArguments(
|
||||
output_dir='./results',
|
||||
learning_rate=2e-5,
|
||||
per_device_train_batch_size=16,
|
||||
per_device_eval_batch_size=16,
|
||||
num_train_epochs=5,
|
||||
weight_decay=0.01,
|
||||
)
|
||||
|
||||
trainer = Trainer(
|
||||
model=model,
|
||||
args=training_args,
|
||||
train_dataset=tokenized_imdb["train"],
|
||||
eval_dataset=tokenized_imdb["test"],
|
||||
tokenizer=tokenizer,
|
||||
data_collator=data_collator,
|
||||
)
|
||||
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
### Fine-tune with TensorFlow
|
||||
|
||||
Fine-tuning with TensorFlow is just as easy, with only a few differences.
|
||||
|
||||
Start by batching the processed examples together with dynamic padding using the [`DataCollatorWithPadding`] function.
|
||||
Make sure you set `return_tensors="tf"` to return `tf.Tensor` outputs instead of PyTorch tensors!
|
||||
|
||||
```python
|
||||
from transformers import DataCollatorWithPadding
|
||||
data_collator = DataCollatorWithPadding(tokenizer, return_tensors="tf")
|
||||
```
|
||||
|
||||
Next, convert your datasets to the `tf.data.Dataset` format with `to_tf_dataset`. Specify inputs and labels in the
|
||||
`columns` argument:
|
||||
|
||||
```python
|
||||
tf_train_dataset = tokenized_imdb["train"].to_tf_dataset(
|
||||
columns=['attention_mask', 'input_ids', 'label'],
|
||||
shuffle=True,
|
||||
batch_size=16,
|
||||
collate_fn=data_collator,
|
||||
)
|
||||
|
||||
tf_validation_dataset = tokenized_imdb["train"].to_tf_dataset(
|
||||
columns=['attention_mask', 'input_ids', 'label'],
|
||||
shuffle=False,
|
||||
batch_size=16,
|
||||
collate_fn=data_collator,
|
||||
)
|
||||
```
|
||||
|
||||
Set up an optimizer function, learning rate schedule, and some training hyperparameters:
|
||||
|
||||
```python
|
||||
from transformers import create_optimizer
|
||||
import tensorflow as tf
|
||||
|
||||
batch_size = 16
|
||||
num_epochs = 5
|
||||
batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
|
||||
total_train_steps = int(batches_per_epoch * num_epochs)
|
||||
optimizer, schedule = create_optimizer(
|
||||
init_lr=2e-5,
|
||||
num_warmup_steps=0,
|
||||
num_train_steps=total_train_steps
|
||||
)
|
||||
```
|
||||
|
||||
Load your model with the [`TFAutoModelForSequenceClassification`] class along with the number of expected labels:
|
||||
|
||||
```python
|
||||
from transformers import TFAutoModelForSequenceClassification
|
||||
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
|
||||
```
|
||||
|
||||
Compile the model:
|
||||
|
||||
```python
|
||||
import tensorflow as tf
|
||||
model.compile(optimizer=optimizer)
|
||||
```
|
||||
|
||||
Finally, fine-tune the model by calling `model.fit`:
|
||||
|
||||
```python
|
||||
model.fit(
|
||||
tf_train_dataset,
validation_data=tf_validation_dataset,
epochs=num_epochs,
|
||||
)
|
||||
```
|
||||
|
||||
<a id='tok_ner'></a>
|
||||
|
||||
## Token classification with WNUT emerging entities
|
||||
|
||||
Token classification refers to the task of classifying individual tokens in a sentence. One of the most common token
|
||||
classification tasks is Named Entity Recognition (NER). NER attempts to find a label for each entity in a sentence,
|
||||
such as a person, location, or organization. In this example, learn how to fine-tune a model on the [WNUT 17](https://huggingface.co/datasets/wnut_17) dataset to detect new entities.
|
||||
|
||||
<Tip>
|
||||
|
||||
For a more in-depth example of how to fine-tune a model for token classification, take a look at the corresponding
|
||||
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification.ipynb)
|
||||
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification-tf.ipynb).
|
||||
|
||||
</Tip>
|
||||
|
||||
### Load WNUT 17 dataset
|
||||
|
||||
Load the WNUT 17 dataset from the 🤗 Datasets library:
|
||||
|
||||
```python
|
||||
from datasets import load_dataset
|
||||
wnut = load_dataset("wnut_17")
|
||||
```
|
||||
|
||||
A quick look at the dataset shows the labels associated with each word in the sentence:
|
||||
|
||||
```python
|
||||
wnut["train"][0]
|
||||
{'id': '0',
|
||||
'ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 8, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0],
|
||||
'tokens': ['@paulwalk', 'It', "'s", 'the', 'view', 'from', 'where', 'I', "'m", 'living', 'for', 'two', 'weeks', '.', 'Empire', 'State', 'Building', '=', 'ESB', '.', 'Pretty', 'bad', 'storm', 'here', 'last', 'evening', '.']
|
||||
}
|
||||
```
|
||||
|
||||
View the specific NER tags by:
|
||||
|
||||
```python
|
||||
label_list = wnut["train"].features[f"ner_tags"].feature.names
|
||||
label_list
|
||||
['O',
|
||||
'B-corporation',
|
||||
'I-corporation',
|
||||
'B-creative-work',
|
||||
'I-creative-work',
|
||||
'B-group',
|
||||
'I-group',
|
||||
'B-location',
|
||||
'I-location',
|
||||
'B-person',
|
||||
'I-person',
|
||||
'B-product',
|
||||
'I-product'
|
||||
]
|
||||
```
|
||||
|
||||
Each NER tag is prefixed with a letter that indicates:

- `B-` indicates the beginning of an entity.
- `I-` indicates a token is contained inside the same entity (e.g., the `State` token is part of an entity like
  `Empire State Building`).
- `O` indicates the token doesn't correspond to any entity.
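
As a quick check, this short sketch (assuming the `wnut` dataset and `label_list` from above) maps the integer
`ner_tags` of the first example back to these string labels:

```python
example = wnut["train"][0]
# Mostly 'O', with 'B-location'/'I-location' marking 'Empire State Building' and 'B-location' marking 'ESB'.
print([label_list[tag] for tag in example["ner_tags"]])
```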
|
||||
|
||||
### Preprocess
|
||||
|
||||
Now you need to tokenize the text. Load the DistilBERT tokenizer with an [`AutoTokenizer`]:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
|
||||
```
|
||||
|
||||
Since the input has already been split into words, set `is_split_into_words=True` to tokenize the words into
|
||||
subwords:
|
||||
|
||||
```python
|
||||
tokenized_input = tokenizer(example["tokens"], is_split_into_words=True)
|
||||
tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
|
||||
tokens
|
||||
['[CLS]', '@', 'paul', '##walk', 'it', "'", 's', 'the', 'view', 'from', 'where', 'i', "'", 'm', 'living', 'for', 'two', 'weeks', '.', 'empire', 'state', 'building', '=', 'es', '##b', '.', 'pretty', 'bad', 'storm', 'here', 'last', 'evening', '.', '[SEP]']
|
||||
```
|
||||
|
||||
The addition of the special tokens `[CLS]` and `[SEP]` and subword tokenization creates a mismatch between the
|
||||
input and labels. Realign the labels and tokens by:
|
||||
|
||||
1. Mapping all tokens to their corresponding word with the `word_ids` method.
|
||||
2. Assigning the label `-100` to the special tokens `[CLS]` and `[SEP]` so the PyTorch loss function ignores
|
||||
them.
|
||||
3. Only labeling the first token of a given word. Assign `-100` to the other subtokens from the same word.
|
||||
|
||||
Here is how you can create a function that will realign the labels and tokens:
|
||||
|
||||
```python
|
||||
def tokenize_and_align_labels(examples):
    tokenized_inputs = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)

    labels = []
    for i, label in enumerate(examples["ner_tags"]):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:
            if word_idx is None:  # Set the special tokens to -100.
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
            else:  # Assign -100 to the remaining subtokens of the same word.
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
    return tokenized_inputs
```
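
As an optional sanity check, the sketch below (assuming the `wnut` dataset and `tokenizer` from above) runs the
function on a single example; special tokens and trailing subtokens should all receive the ignore index `-100`:

```python
# Slice the dataset so the function receives a batch of size 1.
sample = tokenize_and_align_labels(wnut["train"][:1])
# The first entry is -100 for [CLS]; real labels appear only on the first subtoken of each word.
print(sample["labels"][0][:10])
```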
|
||||
|
||||
Now tokenize and align the labels over the entire dataset with 🤗 Datasets `map` function:
|
||||
|
||||
```python
|
||||
tokenized_wnut = wnut.map(tokenize_and_align_labels, batched=True)
|
||||
```
|
||||
|
||||
Finally, pad your texts and labels so they are a uniform length:
|
||||
|
||||
```python
|
||||
from transformers import DataCollatorForTokenClassification
|
||||
data_collator = DataCollatorForTokenClassification(tokenizer)
|
||||
```
|
||||
|
||||
### Fine-tune with the Trainer API
|
||||
|
||||
Load your model with the [`AutoModelForTokenClassification`] class along with the number of expected labels:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer
|
||||
model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
|
||||
```
|
||||
|
||||
Gather your training arguments in [`TrainingArguments`]:
|
||||
|
||||
```python
|
||||
training_args = TrainingArguments(
|
||||
output_dir='./results',
|
||||
evaluation_strategy="epoch",
|
||||
learning_rate=2e-5,
|
||||
per_device_train_batch_size=16,
|
||||
per_device_eval_batch_size=16,
|
||||
num_train_epochs=3,
|
||||
weight_decay=0.01,
|
||||
)
|
||||
```
|
||||
|
||||
Collect your model, training arguments, dataset, data collator, and tokenizer in [`Trainer`]:
|
||||
|
||||
```python
|
||||
trainer = Trainer(
|
||||
model=model,
|
||||
args=training_args,
|
||||
train_dataset=tokenized_wnut["train"],
|
||||
eval_dataset=tokenized_wnut["test"],
|
||||
data_collator=data_collator,
|
||||
tokenizer=tokenizer,
|
||||
)
|
||||
```
|
||||
|
||||
Fine-tune your model:
|
||||
|
||||
```python
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
### Fine-tune with TensorFlow
|
||||
|
||||
Batch your examples together and pad your text and labels, so they are a uniform length:
|
||||
|
||||
```python
|
||||
from transformers import DataCollatorForTokenClassification
|
||||
data_collator = DataCollatorForTokenClassification(tokenizer, return_tensors="tf")
|
||||
```
|
||||
|
||||
Convert your datasets to the `tf.data.Dataset` format with `to_tf_dataset`:
|
||||
|
||||
```python
|
||||
tf_train_set = tokenized_wnut["train"].to_tf_dataset(
|
||||
columns=["attention_mask", "input_ids", "labels"],
|
||||
shuffle=True,
|
||||
batch_size=16,
|
||||
collate_fn=data_collator,
|
||||
)
|
||||
|
||||
tf_validation_set = tokenized_wnut["validation"].to_tf_dataset(
|
||||
columns=["attention_mask", "input_ids", "labels"],
|
||||
shuffle=False,
|
||||
batch_size=16,
|
||||
collate_fn=data_collator,
|
||||
)
|
||||
```
|
||||
|
||||
Load the model with the [`TFAutoModelForTokenClassification`] class along with the number of expected labels:
|
||||
|
||||
```python
|
||||
from transformers import TFAutoModelForTokenClassification
|
||||
model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
|
||||
```
|
||||
|
||||
Set up an optimizer function, learning rate schedule, and some training hyperparameters:
|
||||
|
||||
```python
|
||||
from transformers import create_optimizer
|
||||
|
||||
batch_size = 16
|
||||
num_train_epochs = 3
|
||||
num_train_steps = (len(tokenized_wnut["train"]) // batch_size) * num_train_epochs
|
||||
optimizer, lr_schedule = create_optimizer(
|
||||
init_lr=2e-5,
|
||||
num_train_steps=num_train_steps,
|
||||
weight_decay_rate=0.01,
|
||||
num_warmup_steps=0,
|
||||
)
|
||||
```
|
||||
|
||||
Compile the model:
|
||||
|
||||
```python
|
||||
import tensorflow as tf
|
||||
model.compile(optimizer=optimizer)
|
||||
```
|
||||
|
||||
Call `model.fit` to fine-tune your model:
|
||||
|
||||
```python
|
||||
model.fit(
|
||||
tf_train_set,
|
||||
validation_data=tf_validation_set,
|
||||
epochs=num_train_epochs,
|
||||
)
|
||||
```
|
||||
|
||||
<a id='qa_squad'></a>
|
||||
|
||||
## Question Answering with SQuAD
|
||||
|
||||
There are many types of question answering (QA) tasks. Extractive QA focuses on identifying the answer from the text
|
||||
given a question. In this example, learn how to fine-tune a model on the [SQuAD](https://huggingface.co/datasets/squad) dataset.
|
||||
|
||||
<Tip>
|
||||
|
||||
For a more in-depth example of how to fine-tune a model for question answering, take a look at the corresponding
|
||||
[PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/question_answering.ipynb)
|
||||
or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/question_answering-tf.ipynb).
|
||||
|
||||
</Tip>
|
||||
|
||||
### Load SQuAD dataset
|
||||
|
||||
Load the SQuAD dataset from the 🤗 Datasets library:
|
||||
|
||||
```python
|
||||
from datasets import load_dataset
|
||||
squad = load_dataset("squad")
|
||||
```
|
||||
|
||||
Take a look at an example from the dataset:
|
||||
|
||||
```python
|
||||
squad["train"][0]
|
||||
{'answers': {'answer_start': [515], 'text': ['Saint Bernadette Soubirous']},
|
||||
'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
|
||||
'id': '5733be284776f41900661182',
|
||||
'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
|
||||
'title': 'University_of_Notre_Dame'
|
||||
}
|
||||
```
|
||||
|
||||
### Preprocess
|
||||
|
||||
Load the DistilBERT tokenizer with an [`AutoTokenizer`]:
|
||||
|
||||
```python
|
||||
from transformers import AutoTokenizer
|
||||
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
|
||||
```
|
||||
|
||||
There are a few things to be aware of when preprocessing text for question answering:
|
||||
|
||||
1. Some examples in a dataset may have a very long `context` that exceeds the maximum input length of the model. You
|
||||
can deal with this by truncating the `context` with `truncation="only_second"`.
2. Next, you need to map the start and end positions of the answer to the original context. Set
`return_offsets_mapping=True` to handle this.
3. With the mapping in hand, you can find the start and end tokens of the answer. Use the `sequence_ids` method to
find which part of the offset corresponds to the question and which part corresponds to the context, as the short
check after this list shows.
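
Here is a minimal sketch of what `sequence_ids` returns, reusing the `tokenizer` loaded above on a question and
context adapted from the SQuAD example shown earlier:

```python
enc = tokenizer("To whom did the Virgin Mary appear?", "She appeared to Saint Bernadette Soubirous in 1858.")
# None marks special tokens, 0 marks question tokens, and 1 marks context tokens.
print(enc.sequence_ids())
```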
|
||||
|
||||
Assemble everything in a preprocessing function as shown below:
|
||||
|
||||
```python
|
||||
def preprocess_function(examples):
|
||||
questions = [q.strip() for q in examples["question"]]
|
||||
inputs = tokenizer(
|
||||
questions,
|
||||
examples["context"],
|
||||
max_length=384,
|
||||
truncation="only_second",
|
||||
return_offsets_mapping=True,
|
||||
padding="max_length",
|
||||
)
|
||||
|
||||
offset_mapping = inputs.pop("offset_mapping")
|
||||
answers = examples["answers"]
|
||||
start_positions = []
|
||||
end_positions = []
|
||||
|
||||
for i, offset in enumerate(offset_mapping):
|
||||
answer = answers[i]
|
||||
start_char = answer["answer_start"][0]
|
||||
end_char = answer["answer_start"][0] + len(answer["text"][0])
|
||||
sequence_ids = inputs.sequence_ids(i)
|
||||
|
||||
# Find the start and end of the context
|
||||
idx = 0
|
||||
while sequence_ids[idx] != 1:
|
||||
idx += 1
|
||||
context_start = idx
|
||||
while sequence_ids[idx] == 1:
|
||||
idx += 1
|
||||
context_end = idx - 1
|
||||
|
||||
# If the answer is not fully inside the context, label it (0, 0)
|
||||
if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
|
||||
start_positions.append(0)
|
||||
end_positions.append(0)
|
||||
else:
|
||||
# Otherwise it's the start and end token positions
|
||||
idx = context_start
|
||||
while idx <= context_end and offset[idx][0] <= start_char:
|
||||
idx += 1
|
||||
start_positions.append(idx - 1)
|
||||
|
||||
idx = context_end
|
||||
while idx >= context_start and offset[idx][1] >= end_char:
|
||||
idx -= 1
|
||||
end_positions.append(idx + 1)
|
||||
|
||||
inputs["start_positions"] = start_positions
|
||||
inputs["end_positions"] = end_positions
|
||||
return inputs
|
||||
```
|
||||
|
||||
Apply the preprocessing function over the entire dataset with 🤗 Datasets `map` function:
|
||||
|
||||
```python
|
||||
tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names)
|
||||
```
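
As an optional check, this sketch (assuming `tokenized_squad` and `tokenizer` from above) decodes the labeled span of
the first processed example, which should roughly reproduce the gold answer shown earlier:

```python
sample = tokenized_squad["train"][0]
start, end = sample["start_positions"], sample["end_positions"]
# Expected to be close to 'Saint Bernadette Soubirous'.
print(tokenizer.decode(sample["input_ids"][start : end + 1]))
```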
|
||||
|
||||
Batch the processed examples together:
|
||||
|
||||
```python
|
||||
from transformers import default_data_collator
|
||||
data_collator = default_data_collator
|
||||
```
|
||||
|
||||
### Fine-tune with the Trainer API
|
||||
|
||||
Load your model with the [`AutoModelForQuestionAnswering`] class:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
|
||||
model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
|
||||
```
|
||||
|
||||
Gather your training arguments in [`TrainingArguments`]:
|
||||
|
||||
```python
|
||||
training_args = TrainingArguments(
|
||||
output_dir='./results',
|
||||
evaluation_strategy="epoch",
|
||||
learning_rate=2e-5,
|
||||
per_device_train_batch_size=16,
|
||||
per_device_eval_batch_size=16,
|
||||
num_train_epochs=3,
|
||||
weight_decay=0.01,
|
||||
)
|
||||
```
|
||||
|
||||
Collect your model, training arguments, dataset, data collator, and tokenizer in [`Trainer`]:
|
||||
|
||||
```python
|
||||
trainer = Trainer(
|
||||
model=model,
|
||||
args=training_args,
|
||||
train_dataset=tokenized_squad["train"],
|
||||
eval_dataset=tokenized_squad["validation"],
|
||||
data_collator=data_collator,
|
||||
tokenizer=tokenizer,
|
||||
)
|
||||
```
|
||||
|
||||
Fine-tune your model:
|
||||
|
||||
```python
|
||||
trainer.train()
|
||||
```
|
||||
|
||||
### Fine-tune with TensorFlow
|
||||
|
||||
Batch the processed examples together with a TensorFlow default data collator:
|
||||
|
||||
```python
|
||||
from transformers.data.data_collator import tf_default_data_collator
data_collator = tf_default_data_collator
|
||||
```
|
||||
|
||||
Convert your datasets to the `tf.data.Dataset` format with the `to_tf_dataset` function:
|
||||
|
||||
```python
|
||||
tf_train_set = tokenized_squad["train"].to_tf_dataset(
|
||||
columns=["attention_mask", "input_ids", "start_positions", "end_positions"],
|
||||
dummy_labels=True,
|
||||
shuffle=True,
|
||||
batch_size=16,
|
||||
collate_fn=data_collator,
|
||||
)
|
||||
|
||||
tf_validation_set = tokenized_squad["validation"].to_tf_dataset(
|
||||
columns=["attention_mask", "input_ids", "start_positions", "end_positions"],
|
||||
dummy_labels=True,
|
||||
shuffle=False,
|
||||
batch_size=16,
|
||||
collate_fn=data_collator,
|
||||
)
|
||||
```
|
||||
|
||||
Set up an optimizer function, learning rate schedule, and some training hyperparameters:
|
||||
|
||||
```python
|
||||
from transformers import create_optimizer
|
||||
|
||||
batch_size = 16
|
||||
num_epochs = 2
|
||||
total_train_steps = (len(tokenized_squad["train"]) // batch_size) * num_epochs
|
||||
optimizer, schedule = create_optimizer(
|
||||
init_lr=2e-5,
|
||||
num_warmup_steps=0,
|
||||
num_train_steps=total_train_steps,
|
||||
)
|
||||
```
|
||||
|
||||
Load your model with the [`TFAutoModelForQuestionAnswering`] class:
|
||||
|
||||
```python
|
||||
from transformers import TFAutoModelForQuestionAnswering
|
||||
model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
|
||||
```
|
||||
|
||||
Compile the model:
|
||||
|
||||
```python
|
||||
import tensorflow as tf
|
||||
model.compile(optimizer=optimizer)
|
||||
```
|
||||
|
||||
Call `model.fit` to fine-tune the model:
|
||||
|
||||
```python
|
||||
model.fit(
|
||||
tf_train_set,
|
||||
validation_data=tf_validation_set,
|
||||
epochs=num_epochs,
|
||||
)
|
||||
```
|
||||
docs/source/custom_datasets.rst (deleted, 729 lines)
@@ -1,729 +0,0 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
Fine-tuning with custom datasets
|
||||
=======================================================================================================================
|
||||
|
||||
.. note::
|
||||
|
||||
The datasets used in this tutorial are available and can be more easily accessed using the `🤗 Datasets library
|
||||
<https://github.com/huggingface/datasets>`_. We do not use this library to access the datasets here since this
|
||||
tutorial is meant to illustrate how to work with your own data. A brief introduction can be found at the end of the
|
||||
tutorial in the section ":ref:`datasetslib`".
|
||||
|
||||
This tutorial will take you through several examples of using 🤗 Transformers models with your own datasets. The guide
|
||||
shows one of many valid workflows for using these models and is meant to be illustrative rather than definitive. We
|
||||
show examples of reading in several data formats, preprocessing the data for several types of tasks, and then preparing
|
||||
the data into PyTorch/TensorFlow ``Dataset`` objects which can easily be used either with
|
||||
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow.
|
||||
|
||||
We include several examples, each of which demonstrates a different type of common downstream task:
|
||||
|
||||
- :ref:`seq_imdb`
|
||||
- :ref:`tok_ner`
|
||||
- :ref:`qa_squad`
|
||||
- :ref:`resources`
|
||||
|
||||
.. _seq_imdb:
|
||||
|
||||
Sequence Classification with IMDb Reviews
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
.. note::
|
||||
|
||||
This dataset can be explored in the Hugging Face model hub (`IMDb <https://huggingface.co/datasets/imdb>`_), and
|
||||
can be alternatively downloaded with the 🤗 Datasets library with ``load_dataset("imdb")``.
|
||||
|
||||
In this example, we'll show how to download, tokenize, and train a model on the IMDb reviews dataset. This task takes
|
||||
the text of a review and requires the model to predict whether the sentiment of the review is positive or negative.
|
||||
Let's start by downloading the dataset from the `Large Movie Review Dataset
|
||||
<http://ai.stanford.edu/~amaas/data/sentiment/>`_ webpage.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
|
||||
tar -xf aclImdb_v1.tar.gz
|
||||
|
||||
This data is organized into ``pos`` and ``neg`` folders with one text file per example. Let's write a function that can
|
||||
read this in.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
def read_imdb_split(split_dir):
|
||||
split_dir = Path(split_dir)
|
||||
texts = []
|
||||
labels = []
|
||||
for label_dir in ["pos", "neg"]:
|
||||
for text_file in (split_dir/label_dir).iterdir():
|
||||
texts.append(text_file.read_text())
|
||||
labels.append(0 if label_dir is "neg" else 1)
|
||||
|
||||
return texts, labels
|
||||
|
||||
train_texts, train_labels = read_imdb_split('aclImdb/train')
|
||||
test_texts, test_labels = read_imdb_split('aclImdb/test')
|
||||
|
||||
We now have a train and test dataset, but let's also create a validation set which we can use for evaluation and
tuning without tainting our test set results. Scikit-learn has a convenient utility for creating such splits:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from sklearn.model_selection import train_test_split
|
||||
train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=.2)
|
||||
|
||||
Alright, we've read in our dataset. Now let's tackle tokenization. We'll eventually train a classifier using
|
||||
pre-trained DistilBert, so let's use the DistilBert tokenizer.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from transformers import DistilBertTokenizerFast
|
||||
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
|
||||
|
||||
Now we can simply pass our texts to the tokenizer. We'll pass ``truncation=True`` and ``padding=True``, which will
|
||||
ensure that all of our sequences are padded to the same length and are truncated to be no longer than the model's maximum input
|
||||
length. This will allow us to feed batches of sequences into the model at the same time.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
train_encodings = tokenizer(train_texts, truncation=True, padding=True)
|
||||
val_encodings = tokenizer(val_texts, truncation=True, padding=True)
|
||||
test_encodings = tokenizer(test_texts, truncation=True, padding=True)
|
||||
|
||||
Now, let's turn our labels and encodings into a Dataset object. In PyTorch, this is done by subclassing a
|
||||
``torch.utils.data.Dataset`` object and implementing ``__len__`` and ``__getitem__``. In TensorFlow, we pass our input
|
||||
encodings and labels to the ``from_tensor_slices`` constructor method. We put the data in this format so that the data
|
||||
can be easily batched such that each key in the batch encoding corresponds to a named parameter of the
|
||||
:meth:`~transformers.DistilBertForSequenceClassification.forward` method of the model we will train.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
import torch
|
||||
|
||||
class IMDbDataset(torch.utils.data.Dataset):
|
||||
def __init__(self, encodings, labels):
|
||||
self.encodings = encodings
|
||||
self.labels = labels
|
||||
|
||||
def __getitem__(self, idx):
|
||||
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
|
||||
item['labels'] = torch.tensor(self.labels[idx])
|
||||
return item
|
||||
|
||||
def __len__(self):
|
||||
return len(self.labels)
|
||||
|
||||
train_dataset = IMDbDataset(train_encodings, train_labels)
|
||||
val_dataset = IMDbDataset(val_encodings, val_labels)
|
||||
test_dataset = IMDbDataset(test_encodings, test_labels)
|
||||
## TENSORFLOW CODE
|
||||
import tensorflow as tf
|
||||
|
||||
train_dataset = tf.data.Dataset.from_tensor_slices((
|
||||
dict(train_encodings),
|
||||
train_labels
|
||||
))
|
||||
val_dataset = tf.data.Dataset.from_tensor_slices((
|
||||
dict(val_encodings),
|
||||
val_labels
|
||||
))
|
||||
test_dataset = tf.data.Dataset.from_tensor_slices((
|
||||
dict(test_encodings),
|
||||
test_labels
|
||||
))
|
||||
|
||||
Now that our datasets are ready, we can fine-tune a model either with the 🤗
|
||||
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow. See :doc:`training
|
||||
<training>`.
|
||||
|
||||
.. _ft_trainer:
|
||||
|
||||
Fine-tuning with Trainer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The steps above prepared the datasets in the way that the trainer expects. Now all we need to do is create a model
|
||||
to fine-tune, define the :class:`~transformers.TrainingArguments`/:class:`~transformers.TFTrainingArguments` and
|
||||
instantiate a :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer`.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
from transformers import DistilBertForSequenceClassification, Trainer, TrainingArguments
|
||||
|
||||
training_args = TrainingArguments(
|
||||
output_dir='./results', # output directory
|
||||
num_train_epochs=3, # total number of training epochs
|
||||
per_device_train_batch_size=16, # batch size per device during training
|
||||
per_device_eval_batch_size=64, # batch size for evaluation
|
||||
warmup_steps=500, # number of warmup steps for learning rate scheduler
|
||||
weight_decay=0.01, # strength of weight decay
|
||||
logging_dir='./logs', # directory for storing logs
|
||||
logging_steps=10,
|
||||
)
|
||||
|
||||
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
|
||||
|
||||
trainer = Trainer(
|
||||
model=model, # the instantiated 🤗 Transformers model to be trained
|
||||
args=training_args, # training arguments, defined above
|
||||
train_dataset=train_dataset, # training dataset
|
||||
eval_dataset=val_dataset # evaluation dataset
|
||||
)
|
||||
|
||||
trainer.train()
|
||||
## TENSORFLOW CODE
|
||||
from transformers import TFDistilBertForSequenceClassification, TFTrainer, TFTrainingArguments
|
||||
|
||||
training_args = TFTrainingArguments(
|
||||
output_dir='./results', # output directory
|
||||
num_train_epochs=3, # total number of training epochs
|
||||
per_device_train_batch_size=16, # batch size per device during training
|
||||
per_device_eval_batch_size=64, # batch size for evaluation
|
||||
warmup_steps=500, # number of warmup steps for learning rate scheduler
|
||||
weight_decay=0.01, # strength of weight decay
|
||||
logging_dir='./logs', # directory for storing logs
|
||||
logging_steps=10,
|
||||
)
|
||||
|
||||
with training_args.strategy.scope():
|
||||
model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
|
||||
|
||||
trainer = TFTrainer(
|
||||
model=model, # the instantiated 🤗 Transformers model to be trained
|
||||
args=training_args, # training arguments, defined above
|
||||
train_dataset=train_dataset, # training dataset
|
||||
eval_dataset=val_dataset # evaluation dataset
|
||||
)
|
||||
|
||||
trainer.train()
|
||||
|
||||
.. _ft_native:
|
||||
|
||||
Fine-tuning with native PyTorch/TensorFlow
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
We can also train with native PyTorch or TensorFlow:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
from torch.utils.data import DataLoader
|
||||
from transformers import DistilBertForSequenceClassification, AdamW
|
||||
|
||||
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
|
||||
|
||||
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
|
||||
model.to(device)
|
||||
model.train()
|
||||
|
||||
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
|
||||
|
||||
optim = AdamW(model.parameters(), lr=5e-5)
|
||||
|
||||
for epoch in range(3):
|
||||
for batch in train_loader:
|
||||
optim.zero_grad()
|
||||
input_ids = batch['input_ids'].to(device)
|
||||
attention_mask = batch['attention_mask'].to(device)
|
||||
labels = batch['labels'].to(device)
|
||||
outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
|
||||
loss = outputs[0]
|
||||
loss.backward()
|
||||
optim.step()
|
||||
|
||||
model.eval()
|
||||
## TENSORFLOW CODE
|
||||
from transformers import TFDistilBertForSequenceClassification
|
||||
|
||||
model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased')
|
||||
|
||||
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
|
||||
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
|
||||
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3, batch_size=16)
|
||||
|
||||
.. _tok_ner:
|
||||
|
||||
Token Classification with W-NUT Emerging Entities
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
.. note::
|
||||
|
||||
This dataset can be explored in the Hugging Face model hub (`WNUT-17 <https://huggingface.co/datasets/wnut_17>`_),
|
||||
and can be alternatively downloaded with the 🤗 Datasets library with ``load_dataset("wnut_17")``.
|
||||
|
||||
Next we will look at token classification. Rather than classifying an entire sequence, this task classifies token by
|
||||
token. We'll demonstrate how to do this with `Named Entity Recognition
|
||||
<http://nlpprogress.com/english/named_entity_recognition.html>`_, which involves identifying tokens which correspond to
|
||||
a predefined set of "entities". Specifically, we'll use the `W-NUT Emerging and Rare entities
|
||||
<http://noisy-text.github.io/2017/emerging-rare-entities.html>`_ corpus. The data is given as a collection of
|
||||
pre-tokenized documents where each token is assigned a tag.
|
||||
|
||||
Let's start by downloading the data.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
wget http://noisy-text.github.io/2017/files/wnut17train.conll
|
||||
|
||||
In this case, we'll just download the train set, which is a single text file. Each line of the file contains either (1)
|
||||
a word and tag separated by a tab, or (2) a blank line indicating the end of a document. Let's write a function to read
|
||||
this in. We'll take in the file path and return ``token_docs`` which is a list of lists of token strings, and
|
||||
``token_tags`` which is a list of lists of tag strings.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from pathlib import Path
|
||||
import re
|
||||
|
||||
def read_wnut(file_path):
|
||||
file_path = Path(file_path)
|
||||
|
||||
raw_text = file_path.read_text().strip()
|
||||
raw_docs = re.split(r'\n\t?\n', raw_text)
|
||||
token_docs = []
|
||||
tag_docs = []
|
||||
for doc in raw_docs:
|
||||
tokens = []
|
||||
tags = []
|
||||
for line in doc.split('\n'):
|
||||
token, tag = line.split('\t')
|
||||
tokens.append(token)
|
||||
tags.append(tag)
|
||||
token_docs.append(tokens)
|
||||
tag_docs.append(tags)
|
||||
|
||||
return token_docs, tag_docs
|
||||
|
||||
texts, tags = read_wnut('wnut17train.conll')
|
||||
|
||||
Just to see what this data looks like, let's take a look at a segment of the first document.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
>>> print(texts[0][10:17], tags[0][10:17], sep='\n')
|
||||
['for', 'two', 'weeks', '.', 'Empire', 'State', 'Building']
|
||||
['O', 'O', 'O', 'O', 'B-location', 'I-location', 'I-location']
|
||||
|
||||
``location`` is an entity type, ``B-`` indicates the beginning of an entity, and ``I-`` indicates consecutive positions
|
||||
of the same entity ("Empire State Building" is considered one entity). ``O`` indicates the token does not correspond to
|
||||
any entity.
|
||||
|
||||
Now that we've read the data in, let's create a train/validation split:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from sklearn.model_selection import train_test_split
|
||||
train_texts, val_texts, train_tags, val_tags = train_test_split(texts, tags, test_size=.2)
|
||||
|
||||
Next, let's create encodings for our tokens and tags. For the tags, we can start by just creating a simple mapping which
|
||||
we'll use in a moment:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
unique_tags = set(tag for doc in tags for tag in doc)
|
||||
tag2id = {tag: id for id, tag in enumerate(unique_tags)}
|
||||
id2tag = {id: tag for tag, id in tag2id.items()}
|
||||
|
||||
To encode the tokens, we'll use a pre-trained DistilBert tokenizer. We can tell the tokenizer that we're dealing with
|
||||
ready-split tokens rather than full sentence strings by passing ``is_split_into_words=True``. We'll also pass
|
||||
``padding=True`` and ``truncation=True`` to pad the sequences to be the same length. Lastly, we can tell the model to
|
||||
return information about the tokens which are split by the wordpiece tokenization process, which we will need in a
|
||||
moment.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from transformers import DistilBertTokenizerFast
|
||||
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-cased')
|
||||
train_encodings = tokenizer(train_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
|
||||
val_encodings = tokenizer(val_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
|
||||
|
||||
Great, so now our tokens are nicely encoded in the format that they need to be in to feed them into our DistilBert
|
||||
model below.
|
||||
|
||||
Now we arrive at a common obstacle with using pre-trained models for token-level classification: many of the tokens in
|
||||
the W-NUT corpus are not in DistilBert's vocabulary. Bert and many models like it use a method called WordPiece
|
||||
Tokenization, meaning that single words are split into multiple tokens such that each token is likely to be in the
|
||||
vocabulary. For example, DistilBert's tokenizer would split the Twitter handle ``@huggingface`` into the tokens ``['@',
|
||||
'hugging', '##face']``. This is a problem for us because we have exactly one tag per token. If the tokenizer splits a
|
||||
token into multiple sub-tokens, then we will end up with a mismatch between our tokens and our labels.
|
||||
|
||||
One way to handle this is to only train on the tag labels for the first subtoken of a split token. We can do this in 🤗
|
||||
Transformers by setting the labels we wish to ignore to ``-100``. In the example above, if the label for
|
||||
``@huggingface`` is ``3`` (indexing ``B-corporation``), we would set the labels of ``['@', 'hugging', '##face']`` to
|
||||
``[3, -100, -100]``.
|
||||
|
||||
Let's write a function to do this. This is where we will use the ``offset_mapping`` from the tokenizer as mentioned
|
||||
above. For each sub-token returned by the tokenizer, the offset mapping gives us a tuple indicating the sub-token's
|
||||
start position and end position relative to the original token it was split from. That means that if the first position
|
||||
in the tuple is anything other than ``0``, we will set its corresponding label to ``-100``. While we're at it, we can
|
||||
also set labels to ``-100`` if the second position of the offset mapping is ``0``, since this means it must be a
|
||||
special token like ``[PAD]`` or ``[CLS]``.
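
As a small, hedged illustration (reusing the tokenizer loaded above; the exact sub-tokens may differ slightly), the
offset mapping for the ``@huggingface`` example shows that only the first sub-token starts at position ``0`` and that
special tokens are ``(0, 0)``:

.. code-block:: python

    >>> encoding = tokenizer('@huggingface', return_offsets_mapping=True)
    >>> print(tokenizer.tokenize('@huggingface'))
    ['@', 'hugging', '##face']
    >>> print(encoding.offset_mapping)
    [(0, 0), (0, 1), (1, 8), (8, 12), (0, 0)]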
|
||||
|
||||
.. note::
|
||||
|
||||
Due to a recently fixed bug, -1 must be used instead of -100 when using TensorFlow in 🤗 Transformers <= 3.0.2.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import numpy as np
|
||||
|
||||
def encode_tags(tags, encodings):
|
||||
labels = [[tag2id[tag] for tag in doc] for doc in tags]
|
||||
encoded_labels = []
|
||||
for doc_labels, doc_offset in zip(labels, encodings.offset_mapping):
|
||||
# create an empty array of -100
|
||||
doc_enc_labels = np.ones(len(doc_offset),dtype=int) * -100
|
||||
arr_offset = np.array(doc_offset)
|
||||
|
||||
# set labels whose first offset position is 0 and the second is not 0
|
||||
doc_enc_labels[(arr_offset[:,0] == 0) & (arr_offset[:,1] != 0)] = doc_labels
|
||||
encoded_labels.append(doc_enc_labels.tolist())
|
||||
|
||||
return encoded_labels
|
||||
|
||||
train_labels = encode_tags(train_tags, train_encodings)
|
||||
val_labels = encode_tags(val_tags, val_encodings)
|
||||
|
||||
The hard part is now done. Just as in the sequence classification example above, we can create a dataset object:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
import torch
|
||||
|
||||
class WNUTDataset(torch.utils.data.Dataset):
|
||||
def __init__(self, encodings, labels):
|
||||
self.encodings = encodings
|
||||
self.labels = labels
|
||||
|
||||
def __getitem__(self, idx):
|
||||
item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
|
||||
item['labels'] = torch.tensor(self.labels[idx])
|
||||
return item
|
||||
|
||||
def __len__(self):
|
||||
return len(self.labels)
|
||||
|
||||
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
|
||||
val_encodings.pop("offset_mapping")
|
||||
train_dataset = WNUTDataset(train_encodings, train_labels)
|
||||
val_dataset = WNUTDataset(val_encodings, val_labels)
|
||||
## TENSORFLOW CODE
|
||||
import tensorflow as tf
|
||||
|
||||
train_encodings.pop("offset_mapping") # we don't want to pass this to the model
|
||||
val_encodings.pop("offset_mapping")
|
||||
|
||||
train_dataset = tf.data.Dataset.from_tensor_slices((
|
||||
dict(train_encodings),
|
||||
train_labels
|
||||
))
|
||||
val_dataset = tf.data.Dataset.from_tensor_slices((
|
||||
dict(val_encodings),
|
||||
val_labels
|
||||
))
|
||||
|
||||
Now load in a token classification model and specify the number of labels:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
from transformers import DistilBertForTokenClassification
|
||||
model = DistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
|
||||
## TENSORFLOW CODE
|
||||
from transformers import TFDistilBertForTokenClassification
|
||||
model = TFDistilBertForTokenClassification.from_pretrained('distilbert-base-cased', num_labels=len(unique_tags))
|
||||
|
||||
The data and model are both ready to go. You can train the model either with
|
||||
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` or with native PyTorch/TensorFlow, exactly as in the
|
||||
sequence classification example above.
|
||||
|
||||
- :ref:`ft_trainer`
|
||||
- :ref:`ft_native`
|
||||
|
||||
.. _qa_squad:
|
||||
|
||||
Question Answering with SQuAD 2.0
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
.. note::
|
||||
|
||||
This dataset can be explored in the Hugging Face model hub (`SQuAD V2
|
||||
<https://huggingface.co/datasets/squad_v2>`_), and can be alternatively downloaded with the 🤗 Datasets library with
|
||||
``load_dataset("squad_v2")``.
|
||||
|
||||
Question answering comes in many forms. In this example, we'll look at the particular type of extractive QA that
|
||||
involves answering a question about a passage by highlighting the segment of the passage that answers the question.
|
||||
This involves fine-tuning a model which predicts a start position and an end position in the passage. We will use the
|
||||
`Stanford Question Answering Dataset (SQuAD) 2.0 <https://rajpurkar.github.io/SQuAD-explorer/>`_.
|
||||
|
||||
We will start by downloading the data:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
mkdir squad
|
||||
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json -O squad/train-v2.0.json
|
||||
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json -O squad/dev-v2.0.json
|
||||
|
||||
Each split is in a structured json file with a number of questions and answers for each passage (or context). We'll
|
||||
take this apart into parallel lists of contexts, questions, and answers (note that the contexts here are repeated since
|
||||
there are multiple questions per context):
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
def read_squad(path):
|
||||
path = Path(path)
|
||||
with open(path, 'rb') as f:
|
||||
squad_dict = json.load(f)
|
||||
|
||||
contexts = []
|
||||
questions = []
|
||||
answers = []
|
||||
for group in squad_dict['data']:
|
||||
for passage in group['paragraphs']:
|
||||
context = passage['context']
|
||||
for qa in passage['qas']:
|
||||
question = qa['question']
|
||||
for answer in qa['answers']:
|
||||
contexts.append(context)
|
||||
questions.append(question)
|
||||
answers.append(answer)
|
||||
|
||||
return contexts, questions, answers
|
||||
|
||||
train_contexts, train_questions, train_answers = read_squad('squad/train-v2.0.json')
|
||||
val_contexts, val_questions, val_answers = read_squad('squad/dev-v2.0.json')
|
||||
|
||||
The contexts and questions are just strings. The answers are dicts containing the subsequence of the passage with the
|
||||
correct answer as well as an integer indicating the character at which the answer begins. In order to train a model on
|
||||
this data we need (1) the tokenized context/question pairs, and (2) integers indicating at which *token* positions the
|
||||
answer begins and ends.
|
||||
|
||||
First, let's get the *character* position at which the answer ends in the passage (we are given the starting position).
|
||||
Sometimes SQuAD answers are off by one or two characters, so we will also adjust for that.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def add_end_idx(answers, contexts):
|
||||
for answer, context in zip(answers, contexts):
|
||||
gold_text = answer['text']
|
||||
start_idx = answer['answer_start']
|
||||
end_idx = start_idx + len(gold_text)
|
||||
|
||||
# sometimes squad answers are off by a character or two – fix this
|
||||
if context[start_idx:end_idx] == gold_text:
|
||||
answer['answer_end'] = end_idx
|
||||
elif context[start_idx-1:end_idx-1] == gold_text:
|
||||
answer['answer_start'] = start_idx - 1
|
||||
answer['answer_end'] = end_idx - 1 # When the gold label is off by one character
|
||||
elif context[start_idx-2:end_idx-2] == gold_text:
|
||||
answer['answer_start'] = start_idx - 2
|
||||
answer['answer_end'] = end_idx - 2 # When the gold label is off by two characters
|
||||
|
||||
add_end_idx(train_answers, train_contexts)
|
||||
add_end_idx(val_answers, val_contexts)
|
||||
|
||||
Now ``train_answers`` and ``val_answers`` include the character end positions and the corrected start positions. Next,
|
||||
let's tokenize our context/question pairs. 🤗 Tokenizers can accept parallel lists of sequences and encode them together
|
||||
as sequence pairs.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from transformers import DistilBertTokenizerFast
|
||||
tokenizer = DistilBertTokenizerFast.from_pretrained('distilbert-base-uncased')
|
||||
|
||||
train_encodings = tokenizer(train_contexts, train_questions, truncation=True, padding=True)
|
||||
val_encodings = tokenizer(val_contexts, val_questions, truncation=True, padding=True)
|
||||
|
||||
Next we need to convert our character start/end positions to token start/end positions. When using 🤗 Fast Tokenizers,
|
||||
we can use the built-in :func:`~transformers.BatchEncoding.char_to_token` method.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def add_token_positions(encodings, answers):
|
||||
start_positions = []
|
||||
end_positions = []
|
||||
for i in range(len(answers)):
|
||||
start_positions.append(encodings.char_to_token(i, answers[i]['answer_start']))
|
||||
end_positions.append(encodings.char_to_token(i, answers[i]['answer_end'] - 1))
|
||||
|
||||
# if start position is None, the answer passage has been truncated
|
||||
if start_positions[-1] is None:
|
||||
start_positions[-1] = tokenizer.model_max_length
|
||||
if end_positions[-1] is None:
|
||||
end_positions[-1] = tokenizer.model_max_length
|
||||
|
||||
encodings.update({'start_positions': start_positions, 'end_positions': end_positions})
|
||||
|
||||
add_token_positions(train_encodings, train_answers)
|
||||
add_token_positions(val_encodings, val_answers)
|
||||
|
||||
Our data is ready. Let's just put it in a PyTorch/TensorFlow dataset so that we can easily use it for training. In
|
||||
PyTorch, we define a custom ``Dataset`` class. In TensorFlow, we pass a tuple of ``(inputs_dict, labels_dict)`` to the
|
||||
``from_tensor_slices`` method.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
import torch
|
||||
|
||||
class SquadDataset(torch.utils.data.Dataset):
|
||||
def __init__(self, encodings):
|
||||
self.encodings = encodings
|
||||
|
||||
def __getitem__(self, idx):
|
||||
return {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
|
||||
|
||||
def __len__(self):
|
||||
return len(self.encodings.input_ids)
|
||||
|
||||
train_dataset = SquadDataset(train_encodings)
|
||||
val_dataset = SquadDataset(val_encodings)
|
||||
## TENSORFLOW CODE
|
||||
import tensorflow as tf
|
||||
|
||||
train_dataset = tf.data.Dataset.from_tensor_slices((
|
||||
{key: train_encodings[key] for key in ['input_ids', 'attention_mask']},
|
||||
{key: train_encodings[key] for key in ['start_positions', 'end_positions']}
|
||||
))
|
||||
val_dataset = tf.data.Dataset.from_tensor_slices((
|
||||
{key: val_encodings[key] for key in ['input_ids', 'attention_mask']},
|
||||
{key: val_encodings[key] for key in ['start_positions', 'end_positions']}
|
||||
))
|
||||
|
||||
Now we can use a DistilBert model with a QA head for training:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
from transformers import DistilBertForQuestionAnswering
|
||||
model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
|
||||
## TENSORFLOW CODE
|
||||
from transformers import TFDistilBertForQuestionAnswering
|
||||
model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
|
||||
|
||||
|
||||
The data and model are both ready to go. You can train the model with
|
||||
:class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` exactly as in the sequence classification example
|
||||
above. If using native PyTorch, replace ``labels`` with ``start_positions`` and ``end_positions`` in the training
|
||||
example. If using Keras's ``fit``, we need to make a minor modification to handle this example since it involves
|
||||
multiple model outputs.
|
||||
|
||||
- :ref:`ft_trainer`
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
from torch.utils.data import DataLoader
|
||||
from transformers import AdamW
|
||||
|
||||
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
|
||||
|
||||
model.to(device)
|
||||
model.train()
|
||||
|
||||
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
|
||||
|
||||
optim = AdamW(model.parameters(), lr=5e-5)
|
||||
|
||||
for epoch in range(3):
|
||||
for batch in train_loader:
|
||||
optim.zero_grad()
|
||||
input_ids = batch['input_ids'].to(device)
|
||||
attention_mask = batch['attention_mask'].to(device)
|
||||
start_positions = batch['start_positions'].to(device)
|
||||
end_positions = batch['end_positions'].to(device)
|
||||
outputs = model(input_ids, attention_mask=attention_mask, start_positions=start_positions, end_positions=end_positions)
|
||||
loss = outputs[0]
|
||||
loss.backward()
|
||||
optim.step()
|
||||
|
||||
model.eval()
|
||||
## TENSORFLOW CODE
|
||||
# Keras will expect a tuple when dealing with labels
|
||||
train_dataset = train_dataset.map(lambda x, y: (x, (y['start_positions'], y['end_positions'])))
|
||||
|
||||
# Keras will assign a separate loss for each output and add them together. So we'll just use the standard CE loss
|
||||
# instead of using the built-in model.compute_loss, which expects a dict of outputs and averages the two terms.
|
||||
# Note that this means the loss will be 2x what it would be with TFTrainer, since we're adding the two terms instead of averaging them.
|
||||
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
|
||||
model.distilbert.return_dict = False # if using 🤗 Transformers >3.0.2, make sure outputs are tuples
|
||||
|
||||
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
|
||||
model.compile(optimizer=optimizer, loss=loss) # can also use any keras loss fn
|
||||
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)  # the dataset is already batched, so no batch_size here
|
||||
|
||||
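
Once fine-tuned, the model can be sanity-checked by decoding its most likely answer span for a single example. The
snippet below is only a rough sketch using the PyTorch model, tokenizer and ``device`` from above: it takes the argmax
of the start and end logits and does not handle unanswerable questions.

.. code-block:: python

    ## PYTORCH CODE
    question, context = val_questions[0], val_contexts[0]
    inputs = tokenizer(context, question, truncation=True, padding=True, return_tensors='pt').to(device)

    with torch.no_grad():
        outputs = model(**inputs)

    # most likely start and end token positions
    start_idx = int(outputs.start_logits.argmax())
    end_idx = int(outputs.end_logits.argmax())

    # decode the predicted span back into a string
    answer_ids = inputs['input_ids'][0][start_idx:end_idx + 1]
    print(tokenizer.decode(answer_ids, skip_special_tokens=True))
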
.. _resources:
|
||||
|
||||
Additional Resources
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
- `How to train a new language model from scratch using Transformers and Tokenizers
|
||||
<https://huggingface.co/blog/how-to-train>`_. Blog post showing the steps to load in Esperanto data and train a
|
||||
masked language model from scratch.
|
||||
- :doc:`Preprocessing <preprocessing>`. Docs page on data preprocessing.
|
||||
- :doc:`Training <training>`. Docs page on training and fine-tuning.
|
||||
|
||||
.. _datasetslib:
|
||||
|
||||
Using the 🤗 Datasets & Metrics library
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This tutorial demonstrates how to read in datasets from various raw text formats and prepare them for training with 🤗
|
||||
Transformers so that you can do the same thing with your own custom datasets. However, we recommend users use the `🤗
|
||||
Datasets library <https://github.com/huggingface/datasets>`_ for working with the 150+ datasets included in the `hub
|
||||
<https://huggingface.co/datasets>`_, including the three datasets used in this tutorial. As a very brief overview, we
|
||||
will show how to use the Datasets library to download and prepare the IMDb dataset from the first example,
|
||||
:ref:`seq_imdb`.
|
||||
|
||||
Start by downloading the dataset:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from datasets import load_dataset
|
||||
train = load_dataset("imdb", split="train")
|
||||
|
||||
Each dataset has multiple columns corresponding to different features. Let's see what our columns are.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
>>> print(train.column_names)
|
||||
['label', 'text']
|
||||
|
||||
Great. Now let's tokenize the text. We can do this using the ``map`` method. We'll also rename the ``label`` column to
|
||||
``labels`` to match the model's input arguments.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
train = train.map(lambda batch: tokenizer(batch["text"], truncation=True, padding=True), batched=True)
|
||||
train.rename_column_("label", "labels")
|
||||
|
||||
Lastly, we can use the ``set_format`` method to determine which columns and in what data format we want to access
|
||||
dataset elements.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
## PYTORCH CODE
|
||||
>>> train.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
|
||||
>>> {key: val.shape for key, val in train[0].items()}
|
||||
{'labels': torch.Size([]), 'input_ids': torch.Size([512]), 'attention_mask': torch.Size([512])}
|
||||
## TENSORFLOW CODE
|
||||
>>> train.set_format("tensorflow", columns=["input_ids", "attention_mask", "labels"])
|
||||
>>> {key: val.shape for key, val in train[0].items()}
|
||||
{'labels': TensorShape([]), 'input_ids': TensorShape([512]), 'attention_mask': TensorShape([512])}
|
||||
|
||||
We now have a fully-prepared dataset. Check out `the 🤗 Datasets docs
|
||||
<https://huggingface.co/docs/datasets/processing.html>`_ for a more thorough introduction.
|
||||
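
Because ``set_format`` makes the dataset return framework tensors, the formatted dataset can be fed directly to the
training utilities used earlier. Here is a rough PyTorch sketch, assuming a sequence classification ``model`` is
already loaded as in the first example, and that the tokenized examples all share the same length (512 tokens here) so
that the default collation can stack them.

.. code-block:: python

    ## PYTORCH CODE
    import torch
    from torch.utils.data import DataLoader

    train_loader = DataLoader(train, batch_size=8, shuffle=True)

    batch = next(iter(train_loader))
    with torch.no_grad():
        outputs = model(**batch)  # `labels` is in the batch, so the output also contains a loss
    print(outputs.loss, outputs.logits.shape)
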
@ -154,7 +154,7 @@ input elements was ``6.27e+04`` and same for the output was ``inf``.
|
||||
You can see here that ``T5DenseGatedGeluDense.forward`` resulted in output activations whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have ``Dropout``, which renormalizes
the weights after zeroing out some of the elements, pushing the absolute max value above 64K, and we get an overflow
(``inf``).
|
||||
|
||||
As you can see, it's the previous frames that we need to look into when the numbers start getting very large for fp16.
|
||||
|
||||
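
To make this concrete, here is a small generic sketch (not the library's built-in debugging tool) that registers
forward hooks on a hypothetical ``model`` and reports any module whose output approaches the fp16 limit:

.. code-block:: python

    import torch

    FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

    def report_near_overflow(module, inputs, output):
        # flag modules whose output's absolute max gets close to (or beyond) the fp16 limit
        if isinstance(output, torch.Tensor):
            abs_max = output.detach().float().abs().max().item()
            if abs_max > 0.9 * FP16_MAX or not torch.isfinite(output).all():
                print(f"{module.__class__.__name__}: abs max {abs_max:.3e}")

    # `model` is a placeholder for whichever module you are debugging
    hooks = [m.register_forward_hook(report_near_overflow) for m in model.modules()]
    # ... run a forward pass here, read the reports, then remove the hooks:
    # for hook in hooks:
    #     hook.remove()
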
Binary image files not shown (one with a before size of 47 KiB, one changed from 4.4 KiB to 126 KiB); new images added:
docs/source/imgs/perf-moe-transformer.png (71 KiB) and docs/source/imgs/tf32-bf16-fp16-fp32.png (32 KiB).
docs/source/index.mdx (new file, 271 lines added)
@ -0,0 +1,271 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# 🤗 Transformers
|
||||
|
||||
State-of-the-art Machine Learning for Jax, PyTorch and TensorFlow
|
||||
|
||||
🤗 Transformers (formerly known as _pytorch-transformers_ and _pytorch-pretrained-bert_) provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
|
||||
|
||||
These models can be applied on:
|
||||
|
||||
* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
|
||||
* 🖼️ Images, for tasks like image classification, object detection, and segmentation.
|
||||
* 🗣️ Audio, for tasks like speech recognition and audio classification.
|
||||
|
||||
Transformer models can also perform tasks on **several modalities combined**, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
|
||||
|
||||
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
|
||||
|
||||
🤗 Transformers is backed by the three most popular deep learning libraries — [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.
|
||||
|
||||
This is the documentation of our repository [transformers](https://github.com/huggingface/transformers). You can
|
||||
also follow our [online course](https://huggingface.co/course) that teaches how to use this library, as well as the
|
||||
other libraries developed by Hugging Face and the Hub.
|
||||
|
||||
## If you are looking for custom support from the Hugging Face team
|
||||
|
||||
<a target="_blank" href="https://huggingface.co/support">
|
||||
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
|
||||
</a><br>
|
||||
|
||||
## Features
|
||||
|
||||
1. Easy-to-use state-of-the-art models:
|
||||
- High performance on natural language understanding & generation, computer vision, and audio tasks.
|
||||
- Low barrier to entry for educators and practitioners.
|
||||
- Few user-facing abstractions with just three classes to learn.
|
||||
- A unified API for using all our pretrained models.
|
||||
|
||||
1. Lower compute costs, smaller carbon footprint:
|
||||
- Researchers can share trained models instead of always retraining.
|
||||
- Practitioners can reduce compute time and production costs.
|
||||
- Dozens of architectures with over 20,000 pretrained models, some in more than 100 languages.
|
||||
|
||||
1. Choose the right framework for every part of a model's lifetime:
|
||||
- Train state-of-the-art models in 3 lines of code.
|
||||
- Move a single model between TF2.0/PyTorch/JAX frameworks at will.
|
||||
- Seamlessly pick the right framework for training, evaluation and production.
|
||||
|
||||
1. Easily customize a model or an example to your needs:
|
||||
- We provide examples for each architecture to reproduce the results published by its original authors.
|
||||
- Model internals are exposed as consistently as possible.
|
||||
- Model files can be used independently of the library for quick experiments.
|
||||
|
||||
[All the model checkpoints](https://huggingface.co/models) are seamlessly integrated from the huggingface.co [model
|
||||
hub](https://huggingface.co) where they are uploaded directly by [users](https://huggingface.co/users) and
|
||||
[organizations](https://huggingface.co/organizations).
|
||||
|
||||
Current number of checkpoints: <img src="https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen">
|
||||
|
||||
## Contents
|
||||
|
||||
The documentation is organized in five parts:
|
||||
|
||||
- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
|
||||
and a glossary.
|
||||
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
|
||||
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
|
||||
- **RESEARCH** focuses on tutorials that are less about how to use the library and more about general research on
  transformer models.
|
||||
- **API** contains the documentation of each public class and function, grouped in:
|
||||
|
||||
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
|
||||
- **MODELS** for the classes and functions related to each model implemented in the library.
|
||||
- **INTERNAL HELPERS** for the classes and functions we use internally.
|
||||
|
||||
The library currently contains Jax, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and
|
||||
conversion utilities for the following models.
|
||||
|
||||
### Supported models
|
||||
|
||||
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! -->
|
||||
|
||||
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
|
||||
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
|
||||
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
|
||||
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
|
||||
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
|
||||
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
|
||||
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
|
||||
1. **[BERT For Sequence Generation](model_doc/bertgeneration)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[BigBird-RoBERTa](model_doc/bigbird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
|
||||
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
|
||||
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
|
||||
1. **[BlenderbotSmall](model_doc/blenderbot_small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
|
||||
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
|
||||
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
|
||||
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
|
||||
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
|
||||
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
|
||||
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
|
||||
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
|
||||
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
|
||||
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
|
||||
1. **[DeBERTa-v2](model_doc/deberta_v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
|
||||
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
|
||||
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
|
||||
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
|
||||
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation) and a German version of DistilBERT.
|
||||
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
|
||||
1. **[EncoderDecoder](model_doc/encoderdecoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
|
||||
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
|
||||
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
|
||||
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
|
||||
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
|
||||
1. **[GPT](model_doc/gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
|
||||
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
|
||||
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
|
||||
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
|
||||
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
|
||||
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
|
||||
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
|
||||
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
|
||||
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
|
||||
1. **[LayoutXLM](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
|
||||
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
|
||||
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
|
||||
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
|
||||
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
|
||||
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
|
||||
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
|
||||
1. **[MBart](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
|
||||
1. **[MBart-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
|
||||
1. **[Megatron-BERT](model_doc/megatron_bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
|
||||
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
|
||||
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
|
||||
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
|
||||
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
|
||||
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
|
||||
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
|
||||
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
|
||||
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
|
||||
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
|
||||
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
|
||||
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
|
||||
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
|
||||
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
|
||||
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
|
||||
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
|
||||
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
|
||||
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
|
||||
1. **[SqueezeBert](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
|
||||
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
|
||||
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
|
||||
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
|
||||
1. **[Transformer-XL](model_doc/transformerxl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
|
||||
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
|
||||
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
|
||||
1. **[UniSpeechSat](model_doc/unispeech_sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
|
||||
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
|
||||
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
|
||||
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
|
||||
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
||||
1. **[XLM-ProphetNet](model_doc/xlmprophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
1. **[XLM-RoBERTa](model_doc/xlmroberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
|
||||
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
|
||||
|
||||
|
||||
### Supported frameworks
|
||||
|
||||
The table below represents the current support in the library for each of those models: whether they have a Python
tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they have support in Jax
(via Flax), PyTorch, and/or TensorFlow.
|
||||
|
||||
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!-->
|
||||
|
||||
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
|
||||
|-----------------------------|----------------|----------------|-----------------|--------------------|--------------|
|
||||
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ |
|
||||
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
|
||||
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| CLIP | ✅ | ✅ | ✅ | ❌ | ✅ |
|
||||
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| DeBERTa-v2 | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
|
||||
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
|
||||
| GPT-J | ❌ | ❌ | ✅ | ❌ | ✅ |
|
||||
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
|
||||
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
|
||||
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| RoFormer | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| SegFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| Speech2Text | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
|
||||
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
|
||||
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| Vision Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
|
||||
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ |
|
||||
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
|
||||
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
|
||||
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
|
||||
<!-- End table-->
|
||||
@ -1,584 +0,0 @@
|
||||
Transformers
|
||||
=======================================================================================================================
|
||||
|
||||
State-of-the-art Natural Language Processing for Jax, Pytorch and TensorFlow
|
||||
|
||||
🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
|
||||
architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
|
||||
Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between Jax,
|
||||
PyTorch and TensorFlow.
|
||||
|
||||
This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`__. You can
|
||||
also follow our `online course <https://huggingface.co/course>`__ that teaches how to use this library, as well as the
|
||||
other libraries developed by Hugging Face and the Hub.
|
||||
|
||||
If you are looking for custom support from the Hugging Face team
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
.. raw:: html
|
||||
|
||||
<a target="_blank" href="https://huggingface.co/support">
|
||||
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
|
||||
</a><br>
|
||||
|
||||
Features
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
- High performance on NLU and NLG tasks
|
||||
- Low barrier to entry for educators and practitioners
|
||||
|
||||
State-of-the-art NLP for everyone:
|
||||
|
||||
- Deep learning researchers
|
||||
- Hands-on practitioners
|
||||
- AI/ML/NLP teachers and educators
|
||||
|
||||
Lower compute costs, smaller carbon footprint:
|
||||
|
||||
- Researchers can share trained models instead of always retraining
|
||||
- Practitioners can reduce compute time and production costs
|
||||
- 8 architectures with over 30 pretrained models, some in more than 100 languages
|
||||
|
||||
Choose the right framework for every part of a model's lifetime:
|
||||
|
||||
- Train state-of-the-art models in 3 lines of code
|
||||
- Deep interoperability between Jax, Pytorch and TensorFlow models
|
||||
- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
|
||||
- Seamlessly pick the right framework for training, evaluation, production
|
||||
|
||||
The support for Jax is still experimental (with a few models right now), expect to see it grow in the coming months!
|
||||
|
||||
`All the model checkpoints <https://huggingface.co/models>`__ are seamlessly integrated from the huggingface.co `model
|
||||
hub <https://huggingface.co>`__ where they are uploaded directly by `users <https://huggingface.co/users>`__ and
|
||||
`organizations <https://huggingface.co/organizations>`__.
|
||||
|
||||
Current number of checkpoints: |checkpoints|
|
||||
|
||||
.. |checkpoints| image:: https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen
|
||||
|
||||
Contents
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
The documentation is organized in five parts:
|
||||
|
||||
- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
|
||||
and a glossary.
|
||||
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
|
||||
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
|
||||
- **RESEARCH** focuses on tutorials that are less about how to use the library and more about general research on
  transformer models.
- The last three sections contain the documentation of each public class and function, grouped in:
|
||||
|
||||
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
|
||||
- **MODELS** for the classes and functions related to each model implemented in the library.
|
||||
- **INTERNAL HELPERS** for the classes and functions we use internally.
|
||||
|
||||
The library currently contains Jax, PyTorch and Tensorflow implementations, pretrained model weights, usage scripts and
|
||||
conversion utilities for the following models.
|
||||
|
||||
Supported models
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
..
|
||||
This list is updated automatically from the README with `make fix-copies`. Do not update manually!
|
||||
|
||||
1. :doc:`ALBERT <model_doc/albert>` (from Google Research and the Toyota Technological Institute at Chicago) released
   with the paper `ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
   <https://arxiv.org/abs/1909.11942>`__, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush
   Sharma, Radu Soricut.
2. :doc:`BART <model_doc/bart>` (from Facebook) released with the paper `BART: Denoising Sequence-to-Sequence
   Pre-training for Natural Language Generation, Translation, and Comprehension
   <https://arxiv.org/pdf/1910.13461.pdf>`__ by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman
   Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
3. :doc:`BARThez <model_doc/barthez>` (from École polytechnique) released with the paper `BARThez: a Skilled Pretrained
   French Sequence-to-Sequence Model <https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P.
   Tixier, Michalis Vazirgiannis.
4. :doc:`BERT <model_doc/bert>` (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional
   Transformers for Language Understanding <https://arxiv.org/abs/1810.04805>`__ by Jacob Devlin, Ming-Wei Chang,
   Kenton Lee and Kristina Toutanova.
5. :doc:`BERT For Sequence Generation <model_doc/bertgeneration>` (from Google) released with the paper `Leveraging
   Pre-trained Checkpoints for Sequence Generation Tasks <https://arxiv.org/abs/1907.12461>`__ by Sascha Rothe, Shashi
   Narayan, Aliaksei Severyn.
6. :doc:`BigBird-RoBERTa <model_doc/bigbird>` (from Google Research) released with the paper `Big Bird: Transformers
   for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua
   Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
7. :doc:`BigBird-Pegasus <model_doc/bigbird_pegasus>` (from Google Research) released with the paper `Big Bird:
   Transformers for Longer Sequences <https://arxiv.org/abs/2007.14062>`__ by Manzil Zaheer, Guru Guruganesh, Avinava
   Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
8. :doc:`Blenderbot <model_doc/blenderbot>` (from Facebook) released with the paper `Recipes for building an
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
9. :doc:`BlenderbotSmall <model_doc/blenderbot_small>` (from Facebook) released with the paper `Recipes for building an
   open-domain chatbot <https://arxiv.org/abs/2004.13637>`__ by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary
   Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
10. :doc:`BORT <model_doc/bort>` (from Alexa) released with the paper `Optimal Subarchitecture Extraction For BERT
    <https://arxiv.org/abs/2010.10499>`__ by Adrian de Wynter and Daniel J. Perry.
11. :doc:`ByT5 <model_doc/byt5>` (from Google Research) released with the paper `ByT5: Towards a token-free future with
    pre-trained byte-to-byte models <https://arxiv.org/abs/2105.13626>`__ by Linting Xue, Aditya Barua, Noah Constant,
    Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
12. :doc:`CamemBERT <model_doc/camembert>` (from Inria/Facebook/Sorbonne) released with the paper `CamemBERT: a Tasty
    French Language Model <https://arxiv.org/abs/1911.03894>`__ by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz
    Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
13. :doc:`CANINE <model_doc/canine>` (from Google Research) released with the paper `CANINE: Pre-training an Efficient
    Tokenization-Free Encoder for Language Representation <https://arxiv.org/abs/2103.06874>`__ by Jonathan H. Clark,
    Dan Garrette, Iulia Turc, John Wieting.
14. :doc:`CLIP <model_doc/clip>` (from OpenAI) released with the paper `Learning Transferable Visual Models From
    Natural Language Supervision <https://arxiv.org/abs/2103.00020>`__ by Alec Radford, Jong Wook Kim, Chris Hallacy,
    Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen
    Krueger, Ilya Sutskever.
15. :doc:`ConvBERT <model_doc/convbert>` (from YituTech) released with the paper `ConvBERT: Improving BERT with
    Span-based Dynamic Convolution <https://arxiv.org/abs/2008.02496>`__ by Zihang Jiang, Weihao Yu, Daquan Zhou,
    Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
16. :doc:`CPM <model_doc/cpm>` (from Tsinghua University) released with the paper `CPM: A Large-scale Generative
    Chinese Pre-trained Language Model <https://arxiv.org/abs/2012.00413>`__ by Zhengyan Zhang, Xu Han, Hao Zhou, Pei
    Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng,
    Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang,
    Juanzi Li, Xiaoyan Zhu, Maosong Sun.
17. :doc:`CTRL <model_doc/ctrl>` (from Salesforce) released with the paper `CTRL: A Conditional Transformer Language
    Model for Controllable Generation <https://arxiv.org/abs/1909.05858>`__ by Nitish Shirish Keskar*, Bryan McCann*,
    Lav R. Varshney, Caiming Xiong and Richard Socher.
18. :doc:`DeBERTa <model_doc/deberta>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT with
    Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu
    Chen.
19. :doc:`DeBERTa-v2 <model_doc/deberta_v2>` (from Microsoft) released with the paper `DeBERTa: Decoding-enhanced BERT
    with Disentangled Attention <https://arxiv.org/abs/2006.03654>`__ by Pengcheng He, Xiaodong Liu, Jianfeng Gao,
    Weizhu Chen.
20. :doc:`DeiT <model_doc/deit>` (from Facebook) released with the paper `Training data-efficient image transformers &
    distillation through attention <https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs
    Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
21. :doc:`DETR <model_doc/detr>` (from Facebook) released with the paper `End-to-End Object Detection with Transformers
    <https://arxiv.org/abs/2005.12872>`__ by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier,
    Alexander Kirillov, Sergey Zagoruyko.
22. :doc:`DialoGPT <model_doc/dialogpt>` (from Microsoft Research) released with the paper `DialoGPT: Large-Scale
    Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`__ by Yizhe
    Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
23. :doc:`DistilBERT <model_doc/distilbert>` (from HuggingFace), released together with the paper `DistilBERT, a
    distilled version of BERT: smaller, faster, cheaper and lighter <https://arxiv.org/abs/1910.01108>`__ by Victor
    Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into `DistilGPT2
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, RoBERTa into `DistilRoBERTa
    <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__, Multilingual BERT into
    `DistilmBERT <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__ and a German
    version of DistilBERT.
24. :doc:`DPR <model_doc/dpr>` (from Facebook) released with the paper `Dense Passage Retrieval for Open-Domain
    Question Answering <https://arxiv.org/abs/2004.04906>`__ by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick
    Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
25. :doc:`ELECTRA <model_doc/electra>` (from Google Research/Stanford University) released with the paper `ELECTRA:
    Pre-training text encoders as discriminators rather than generators <https://arxiv.org/abs/2003.10555>`__ by Kevin
    Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
26. :doc:`FlauBERT <model_doc/flaubert>` (from CNRS) released with the paper `FlauBERT: Unsupervised Language Model
    Pre-training for French <https://arxiv.org/abs/1912.05372>`__ by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne,
    Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
27. :doc:`Funnel Transformer <model_doc/funnel>` (from CMU/Google Brain) released with the paper `Funnel-Transformer:
    Filtering out Sequential Redundancy for Efficient Language Processing <https://arxiv.org/abs/2006.03236>`__ by
    Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
28. :doc:`GPT <model_doc/gpt>` (from OpenAI) released with the paper `Improving Language Understanding by Generative
    Pre-Training <https://blog.openai.com/language-unsupervised/>`__ by Alec Radford, Karthik Narasimhan, Tim Salimans
    and Ilya Sutskever.
29. :doc:`GPT-2 <model_doc/gpt2>` (from OpenAI) released with the paper `Language Models are Unsupervised Multitask
    Learners <https://blog.openai.com/better-language-models/>`__ by Alec Radford*, Jeffrey Wu*, Rewon Child, David
    Luan, Dario Amodei** and Ilya Sutskever**.
30. :doc:`GPT Neo <model_doc/gpt_neo>` (from EleutherAI) released in the repository `EleutherAI/gpt-neo
    <https://github.com/EleutherAI/gpt-neo>`__ by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
31. :doc:`Hubert <model_doc/hubert>` (from Facebook) released with the paper `HuBERT: Self-Supervised Speech
    Representation Learning by Masked Prediction of Hidden Units <https://arxiv.org/abs/2106.07447>`__ by Wei-Ning Hsu,
    Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
32. :doc:`I-BERT <model_doc/ibert>` (from Berkeley) released with the paper `I-BERT: Integer-only BERT Quantization
    <https://arxiv.org/abs/2101.01321>`__ by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
33. :doc:`LayoutLM <model_doc/layoutlm>` (from Microsoft Research Asia) released with the paper `LayoutLM: Pre-training
    of Text and Layout for Document Image Understanding <https://arxiv.org/abs/1912.13318>`__ by Yiheng Xu, Minghao Li,
    Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
34. :doc:`LED <model_doc/led>` (from AllenAI) released with the paper `Longformer: The Long-Document Transformer
    <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
35. :doc:`Longformer <model_doc/longformer>` (from AllenAI) released with the paper `Longformer: The Long-Document
    Transformer <https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, Arman Cohan.
36. :doc:`LUKE <model_doc/luke>` (from Studio Ousia) released with the paper `LUKE: Deep Contextualized Entity
    Representations with Entity-aware Self-attention <https://arxiv.org/abs/2010.01057>`__ by Ikuya Yamada, Akari Asai,
    Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
37. :doc:`LXMERT <model_doc/lxmert>` (from UNC Chapel Hill) released with the paper `LXMERT: Learning Cross-Modality
    Encoder Representations from Transformers for Open-Domain Question Answering <https://arxiv.org/abs/1908.07490>`__
    by Hao Tan and Mohit Bansal.
38. :doc:`M2M100 <model_doc/m2m_100>` (from Facebook) released with the paper `Beyond English-Centric Multilingual
    Machine Translation <https://arxiv.org/abs/2010.11125>`__ by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi
    Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman
    Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
39. :doc:`MarianMT <model_doc/marian>` Machine translation models trained using `OPUS <http://opus.nlpl.eu/>`__ data by
    Jörg Tiedemann. The `Marian Framework <https://marian-nmt.github.io/>`__ is being developed by the Microsoft
    Translator Team.
40. :doc:`MBart <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Denoising Pre-training for
    Neural Machine Translation <https://arxiv.org/abs/2001.08210>`__ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li,
    Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
41. :doc:`MBart-50 <model_doc/mbart>` (from Facebook) released with the paper `Multilingual Translation with Extensible
    Multilingual Pretraining and Finetuning <https://arxiv.org/abs/2008.00401>`__ by Yuqing Tang, Chau Tran, Xian Li,
    Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
42. :doc:`Megatron-BERT <model_doc/megatron_bert>` (from NVIDIA) released with the paper `Megatron-LM: Training
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
43. :doc:`Megatron-GPT2 <model_doc/megatron_gpt2>` (from NVIDIA) released with the paper `Megatron-LM: Training
    Multi-Billion Parameter Language Models Using Model Parallelism <https://arxiv.org/abs/1909.08053>`__ by Mohammad
    Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
44. :doc:`MPNet <model_doc/mpnet>` (from Microsoft Research) released with the paper `MPNet: Masked and Permuted
    Pre-training for Language Understanding <https://arxiv.org/abs/2004.09297>`__ by Kaitao Song, Xu Tan, Tao Qin,
    Jianfeng Lu, Tie-Yan Liu.
45. :doc:`MT5 <model_doc/mt5>` (from Google AI) released with the paper `mT5: A massively multilingual pre-trained
    text-to-text transformer <https://arxiv.org/abs/2010.11934>`__ by Linting Xue, Noah Constant, Adam Roberts, Mihir
    Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
46. :doc:`Pegasus <model_doc/pegasus>` (from Google) released with the paper `PEGASUS: Pre-training with Extracted
    Gap-sentences for Abstractive Summarization <https://arxiv.org/abs/1912.08777>`__ by Jingqing Zhang, Yao Zhao,
    Mohammad Saleh and Peter J. Liu.
47. :doc:`ProphetNet <model_doc/prophetnet>` (from Microsoft Research) released with the paper `ProphetNet: Predicting
    Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan, Weizhen Qi,
    Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
48. :doc:`Reformer <model_doc/reformer>` (from Google Research) released with the paper `Reformer: The Efficient
    Transformer <https://arxiv.org/abs/2001.04451>`__ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
49. :doc:`RoBERTa <model_doc/roberta>` (from Facebook), released together with the paper `RoBERTa: A Robustly Optimized
    BERT Pretraining Approach <https://arxiv.org/abs/1907.11692>`__ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du,
    Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
50. :doc:`RoFormer <model_doc/roformer>` (from ZhuiyiTechnology), released together with the paper `RoFormer: Enhanced
    Transformer with Rotary Position Embedding <https://arxiv.org/pdf/2104.09864v1.pdf>`__ by Jianlin Su and Yu Lu and
    Shengfeng Pan and Bo Wen and Yunfeng Liu.
51. :doc:`SpeechToTextTransformer <model_doc/speech_to_text>` (from Facebook), released together with the paper
    `fairseq S2T: Fast Speech-to-Text Modeling with fairseq <https://arxiv.org/abs/2010.05171>`__ by Changhan Wang, Yun
    Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
52. :doc:`SqueezeBert <model_doc/squeezebert>` released with the paper `SqueezeBERT: What can computer vision teach NLP
    about efficient neural networks? <https://arxiv.org/abs/2006.11316>`__ by Forrest N. Iandola, Albert E. Shaw, Ravi
    Krishna, and Kurt W. Keutzer.
53. :doc:`T5 <model_doc/t5>` (from Google AI) released with the paper `Exploring the Limits of Transfer Learning with a
    Unified Text-to-Text Transformer <https://arxiv.org/abs/1910.10683>`__ by Colin Raffel and Noam Shazeer and Adam
    Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
54. :doc:`TAPAS <model_doc/tapas>` (from Google AI) released with the paper `TAPAS: Weakly Supervised Table Parsing via
    Pre-training <https://arxiv.org/abs/2004.02349>`__ by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller,
    Francesco Piccinno and Julian Martin Eisenschlos.
55. :doc:`Transformer-XL <model_doc/transformerxl>` (from Google/CMU) released with the paper `Transformer-XL:
    Attentive Language Models Beyond a Fixed-Length Context <https://arxiv.org/abs/1901.02860>`__ by Zihang Dai*,
    Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
56. :doc:`Vision Transformer (ViT) <model_doc/vit>` (from Google AI) released with the paper `An Image is Worth 16x16
    Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`__ by Alexey Dosovitskiy,
    Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias
    Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
57. :doc:`VisualBERT <model_doc/visual_bert>` (from UCLA NLP) released with the paper `VisualBERT: A Simple and
    Performant Baseline for Vision and Language <https://arxiv.org/pdf/1908.03557>`__ by Liunian Harold Li, Mark
    Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
58. :doc:`Wav2Vec2 <model_doc/wav2vec2>` (from Facebook AI) released with the paper `wav2vec 2.0: A Framework for
    Self-Supervised Learning of Speech Representations <https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry
    Zhou, Abdelrahman Mohamed, Michael Auli.
59. :doc:`XLM <model_doc/xlm>` (from Facebook) released together with the paper `Cross-lingual Language Model
    Pretraining <https://arxiv.org/abs/1901.07291>`__ by Guillaume Lample and Alexis Conneau.
60. :doc:`XLM-ProphetNet <model_doc/xlmprophetnet>` (from Microsoft Research) released with the paper `ProphetNet:
    Predicting Future N-gram for Sequence-to-Sequence Pre-training <https://arxiv.org/abs/2001.04063>`__ by Yu Yan,
    Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
61. :doc:`XLM-RoBERTa <model_doc/xlmroberta>` (from Facebook AI), released together with the paper `Unsupervised
    Cross-lingual Representation Learning at Scale <https://arxiv.org/abs/1911.02116>`__ by Alexis Conneau*, Kartikay
    Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke
    Zettlemoyer and Veselin Stoyanov.
62. :doc:`XLNet <model_doc/xlnet>` (from Google/CMU) released with the paper `XLNet: Generalized Autoregressive
    Pretraining for Language Understanding <https://arxiv.org/abs/1906.08237>`__ by Zhilin Yang*, Zihang Dai*, Yiming
    Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
63. :doc:`XLSR-Wav2Vec2 <model_doc/xlsr_wav2vec2>` (from Facebook AI) released with the paper `Unsupervised
    Cross-Lingual Representation Learning For Speech Recognition <https://arxiv.org/abs/2006.13979>`__ by Alexis
    Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.


Supported frameworks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The table below represents the current support in the library for each of those models: whether they have a Python
tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they have support in Jax
(via Flax), PyTorch, and/or TensorFlow.

..
    This table is updated automatically from the auto modules with `make fix-copies`. Do not update manually!

.. rst-class:: center-aligned-table

+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
+=============================+================+================+=================+====================+==============+
| ALBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Blenderbot | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| BlenderbotSmall | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CLIP | ✅ | ✅ | ✅ | ❌ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeBERTa | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeBERTa-v2 | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Encoder decoder | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Pegasus | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| RoFormer | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Speech2Text | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| TAPAS | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| ViT | ❌ | ❌ | ✅ | ❌ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
| mT5 | ✅ | ✅ | ✅ | ✅ | ❌ |
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+

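As a quick illustration of what the columns mean in practice (an example added for this overview, not part of the
auto-generated table), a checkpoint with full support can be loaded with either tokenizer backend and in any of the
three frameworks:

.. code-block:: python

    from transformers import AutoTokenizer, AutoModel, TFAutoModel, FlaxAutoModel

    # BERT has ✅ in every column: both tokenizer backends and all three frameworks are available.
    slow_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)  # Python ("slow") tokenizer
    fast_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)   # 🤗 Tokenizers ("fast") backend

    pt_model = AutoModel.from_pretrained("bert-base-uncased")        # PyTorch
    tf_model = TFAutoModel.from_pretrained("bert-base-uncased")      # TensorFlow
    flax_model = FlaxAutoModel.from_pretrained("bert-base-uncased")  # Flax/JAX
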
.. toctree::
    :maxdepth: 2
    :caption: Get started

    quicktour
    installation
    philosophy
    glossary

.. toctree::
    :maxdepth: 2
    :caption: Using 🤗 Transformers

    task_summary
    model_summary
    preprocessing
    training
    model_sharing
    tokenizer_summary
    multilingual

.. toctree::
    :maxdepth: 2
    :caption: Advanced guides

    pretrained_models
    examples
    troubleshooting
    custom_datasets
    notebooks
    sagemaker
    community
    converting_tensorflow_models
    migration
    contributing
    add_new_model
    fast_tokenizers
    performance
    parallelism
    testing
    debugging
    serialization

.. toctree::
    :maxdepth: 2
    :caption: Research

    bertology
    perplexity
    benchmarks

.. toctree::
    :maxdepth: 2
    :caption: Main Classes

    main_classes/callback
    main_classes/configuration
    main_classes/data_collator
    main_classes/logging
    main_classes/model
    main_classes/optimizer_schedules
    main_classes/output
    main_classes/pipelines
    main_classes/processors
    main_classes/tokenizer
    main_classes/trainer
    main_classes/deepspeed
    main_classes/feature_extractor

.. toctree::
    :maxdepth: 2
    :caption: Models

    model_doc/albert
    model_doc/auto
    model_doc/bart
    model_doc/barthez
    model_doc/bert
    model_doc/bertweet
    model_doc/bertgeneration
    model_doc/bert_japanese
    model_doc/bigbird
    model_doc/bigbird_pegasus
    model_doc/blenderbot
    model_doc/blenderbot_small
    model_doc/bort
    model_doc/byt5
    model_doc/camembert
    model_doc/canine
    model_doc/clip
    model_doc/convbert
    model_doc/cpm
    model_doc/ctrl
    model_doc/deberta
    model_doc/deberta_v2
    model_doc/deit
    model_doc/detr
    model_doc/dialogpt
    model_doc/distilbert
    model_doc/dpr
    model_doc/electra
    model_doc/encoderdecoder
    model_doc/flaubert
    model_doc/fsmt
    model_doc/funnel
    model_doc/herbert
    model_doc/ibert
    model_doc/layoutlm
    model_doc/led
    model_doc/longformer
    model_doc/luke
    model_doc/lxmert
    model_doc/marian
    model_doc/m2m_100
    model_doc/mbart
    model_doc/megatron_bert
    model_doc/megatron_gpt2
    model_doc/mobilebert
    model_doc/mpnet
    model_doc/mt5
    model_doc/gpt
    model_doc/gpt2
    model_doc/gpt_neo
    model_doc/hubert
    model_doc/pegasus
    model_doc/phobert
    model_doc/prophetnet
    model_doc/rag
    model_doc/reformer
    model_doc/retribert
    model_doc/roberta
    model_doc/roformer
    model_doc/speech_to_text
    model_doc/squeezebert
    model_doc/t5
    model_doc/tapas
    model_doc/transformerxl
    model_doc/vit
    model_doc/visual_bert
    model_doc/wav2vec2
    model_doc/xlm
    model_doc/xlmprophetnet
    model_doc/xlmroberta
    model_doc/xlnet
    model_doc/xlsr_wav2vec2

.. toctree::
    :maxdepth: 2
    :caption: Internal Helpers

    internal/modeling_utils
    internal/pipelines_utils
    internal/tokenization_utils
    internal/trainer_utils
    internal/generation_utils
    internal/file_utils

@@ -79,9 +79,9 @@ Here is how to quickly install `transformers` from source:
pip install git+https://github.com/huggingface/transformers
```

Note that this will install not the latest released version, but the bleeding edge `master` version, which you may want to use in case a bug has been fixed since the last official release and a new release hasn't yet been rolled out.

While we strive to keep `master` operational at all times, if you notice some issues, they usually get fixed within a few hours or a day and you're more than welcome to help us detect any problems by opening an [Issue](https://github.com/huggingface/transformers/issues) and this way, things will get fixed even sooner.

Again, you can run:


@@ -63,7 +63,6 @@ TensorFlow custom layers
    :members: call

.. autoclass:: transformers.modeling_tf_utils.TFSequenceSummary
    :members: call


TensorFlow loss functions

@@ -17,6 +17,11 @@ The base class :class:`~transformers.PretrainedConfig` implements the common met
either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded
from HuggingFace's AWS S3 repository).

Each derived config class implements model specific attributes. Common attributes present in all config classes are:
:obj:`hidden_size`, :obj:`num_attention_heads`, and :obj:`num_hidden_layers`. Text models further implement:
:obj:`vocab_size`.
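
For instance (an illustrative snippet added here, not part of the original file), these common attributes can be read
off any loaded configuration:

.. code-block:: python

    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("bert-base-uncased")
    # Common attributes shared by all configs, plus the text-model specific vocab_size.
    print(config.hidden_size, config.num_attention_heads, config.num_hidden_layers, config.vocab_size)
    # 768 12 12 30522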


PretrainedConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -18,7 +18,7 @@ the same type as the elements of :obj:`train_dataset` or :obj:`eval_dataset`.

To be able to build batches, data collators may apply some processing (like padding). Some of them (like
:class:`~transformers.DataCollatorForLanguageModeling`) also apply some random data augmentation (like random masking)
on the formed batch.

Examples of use can be found in the :doc:`example scripts <../examples>` or :doc:`example notebooks <../notebooks>`.

@@ -29,6 +29,13 @@ Default data collator
.. autofunction:: transformers.data.data_collator.default_data_collator


DefaultDataCollator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DefaultDataCollator
    :members:


DataCollatorWithPadding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -54,18 +61,18 @@ DataCollatorForLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForLanguageModeling
    :members: numpy_mask_tokens, tf_mask_tokens, torch_mask_tokens


DataCollatorForWholeWordMask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForWholeWordMask
    :members: numpy_mask_tokens, tf_mask_tokens, torch_mask_tokens


DataCollatorForPermutationLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.data.data_collator.DataCollatorForPermutationLanguageModeling
    :members: numpy_mask_tokens, tf_mask_tokens, torch_mask_tokens
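
As a quick illustration of the framework-specific masking methods listed above (a sketch added here, not part of the
original file; it assumes the ``return_tensors`` argument available in recent releases), the language-modeling
collator pads a list of tokenized examples and applies random masking:

.. code-block:: python

    from transformers import AutoTokenizer, DataCollatorForLanguageModeling

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15, return_tensors="pt")

    # Two tokenized sentences of different lengths; the collator pads them and masks 15% of the tokens.
    examples = [tokenizer("Hello world"), tokenizer("Data collators build batches")]
    batch = collator(examples)
    print(batch["input_ids"].shape, batch["labels"].shape)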

@@ -46,6 +46,20 @@ won't be possible on a single GPU.
parts of DeepSpeed like ``zero.Init`` for ZeRO stage 3 and higher. To tap into this feature read the docs on
:ref:`deepspeed-non-trainer-integration`.

What is integrated:

Training:

1. DeepSpeed ZeRO training supports the full ZeRO stages 1, 2 and 3 with ZeRO-Infinity (CPU and NVME offload).

Inference:

1. DeepSpeed ZeRO Inference supports ZeRO stage 3 with ZeRO-Infinity. It uses the same ZeRO protocol as training, but
   it doesn't use an optimizer and a lr scheduler and only stage 3 is relevant. For more details see:
   :ref:`deepspeed-zero-inference`.

There is also DeepSpeed Inference - this is a totally different technology which uses Tensor Parallelism instead of
ZeRO (coming soon).


@@ -1628,6 +1642,47 @@ larger multi-dimensional shape, this means that the parameter is partitioned and


.. _deepspeed-zero-inference:


ZeRO Inference
=======================================================================================================================

ZeRO Inference uses the same config as ZeRO-3 Training. You just don't need the optimizer and scheduler sections. In
fact you can leave these in the config file if you want to share the same one with the training. They will just be
ignored.

Otherwise you just need to pass the usual :class:`~transformers.TrainingArguments` arguments. For example:

.. code-block:: bash

    deepspeed --num_gpus=2 your_program.py <normal cl args> --do_eval --deepspeed ds_config.json

The only important thing is that you need to use a ZeRO-3 configuration, since ZeRO-2 provides no benefit whatsoever
for the inference as only ZeRO-3 performs sharding of parameters, whereas ZeRO-1 shards gradients and optimizer states.

Here is an example of running ``run_translation.py`` under DeepSpeed deploying all available GPUs:

.. code-block:: bash

    deepspeed examples/pytorch/translation/run_translation.py \
    --deepspeed tests/deepspeed/ds_config_zero3.json \
    --model_name_or_path t5-small --output_dir output_dir \
    --do_eval --max_eval_samples 50 --warmup_steps 50 \
    --max_source_length 128 --val_max_target_length 128 \
    --overwrite_output_dir --per_device_eval_batch_size 4 \
    --predict_with_generate --dataset_config "ro-en" --fp16 \
    --source_lang en --target_lang ro --dataset_name wmt16 \
    --source_prefix "translate English to Romanian: "

Since for inference there is no need for additional large memory used by the optimizer states and the gradients you
should be able to fit much larger batches and/or sequence length onto the same hardware.
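
For orientation, here is a minimal sketch of what such a ZeRO-3 inference configuration might contain (an illustrative
example, not the ``ds_config_zero3.json`` shipped with the repository; refer to the DeepSpeed documentation for the
authoritative keys):

.. code-block:: python

    # Illustrative ZeRO-3 config for inference: no optimizer or scheduler sections are required.
    ds_config = {
        "fp16": {"enabled": True},
        "zero_optimization": {
            "stage": 3,
            "offload_param": {"device": "cpu", "pin_memory": True},
        },
        "train_micro_batch_size_per_gpu": 1,
    }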


Additionally DeepSpeed is currently developing a related product called Deepspeed-Inference which has no relationship
to the ZeRO technology, but instead uses tensor parallelism to scale models that can't fit onto a single GPU. This is a
work in progress and we will provide the integration once that product is complete.


Filing Issues
=======================================================================================================================

@@ -1728,7 +1783,7 @@ For example for a pretrained model:

.. code-block:: python

    from transformers.deepspeed import HfDeepSpeedConfig
    from transformers import AutoModel, deepspeed

    ds_config = { ... }  # deepspeed config object or path to the file
    # must run before instantiating the model

@@ -1741,7 +1796,7 @@ or for non-pretrained model:

.. code-block:: python

    from transformers.deepspeed import HfDeepSpeedConfig
    from transformers import AutoModel, AutoConfig, deepspeed

    ds_config = { ... }  # deepspeed config object or path to the file
    # must run before instantiating the model
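
The excerpts above are cut off by the diff; as a rough sketch (an editor's illustration assuming a ZeRO-3 config
dictionary, not the exact continuation of the file), the non-Trainer integration typically proceeds along these lines:

.. code-block:: python

    from transformers.deepspeed import HfDeepSpeedConfig
    from transformers import AutoModel
    import deepspeed  # the DeepSpeed library itself

    ds_config = {"zero_optimization": {"stage": 3}, "train_micro_batch_size_per_gpu": 1}
    # must run before instantiating the model so that from_pretrained is ZeRO-3 aware
    dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
    model = AutoModel.from_pretrained("gpt2")
    # the model can then be handed to the usual DeepSpeed engine
    engine, *_ = deepspeed.initialize(model=model, config_params=ds_config)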

docs/source/main_classes/keras_callbacks.rst (new file)
@@ -0,0 +1,22 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Keras callbacks
=======================================================================================================================

When training a Transformers model with Keras, there are some library-specific callbacks available to automate common
tasks:

PushToHubCallback
-----------------------------------------------------------------------------------------------------------------------

.. autoclass:: transformers.keras_callbacks.PushToHubCallback
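
A minimal usage sketch (added for illustration; ``hub_model_id`` and ``tf_train_dataset`` are placeholders, and pushing
requires being logged in to the Hub):

.. code-block:: python

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
    from transformers.keras_callbacks import PushToHubCallback

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
    model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

    callback = PushToHubCallback(
        output_dir="./model_checkpoints",   # local directory used for checkpoints
        tokenizer=tokenizer,                # uploaded alongside the model
        hub_model_id="my-user/my-model",    # hypothetical repository name
    )

    # `tf_train_dataset` stands in for your own tf.data.Dataset of tokenized examples.
    model.fit(tf_train_dataset, epochs=3, callbacks=[callback])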

@@ -210,6 +210,13 @@ TFBaseModelOutputWithPooling
    :members:


TFBaseModelOutputWithPoolingAndCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions
    :members:


TFBaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -217,6 +224,13 @@ TFBaseModelOutputWithPast
    :members:


TFBaseModelOutputWithPastAndCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_tf_outputs.TFBaseModelOutputWithPastAndCrossAttentions
    :members:


TFSeq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -231,6 +245,13 @@ TFCausalLMOutput
    :members:


TFCausalLMOutputWithCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_tf_outputs.TFCausalLMOutputWithCrossAttentions
    :members:


TFCausalLMOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -299,3 +320,93 @@ TFSeq2SeqQuestionAnsweringModelOutput

.. autoclass:: transformers.modeling_tf_outputs.TFSeq2SeqQuestionAnsweringModelOutput
    :members:


FlaxBaseModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxBaseModelOutput


FlaxBaseModelOutputWithPast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPast


FlaxBaseModelOutputWithPooling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPooling


FlaxBaseModelOutputWithPastAndCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxBaseModelOutputWithPastAndCrossAttentions


FlaxSeq2SeqModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxSeq2SeqModelOutput


FlaxCausalLMOutputWithCrossAttentions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxCausalLMOutputWithCrossAttentions


FlaxMaskedLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxMaskedLMOutput


FlaxSeq2SeqLMOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxSeq2SeqLMOutput


FlaxNextSentencePredictorOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxNextSentencePredictorOutput


FlaxSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxSequenceClassifierOutput


FlaxSeq2SeqSequenceClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxSeq2SeqSequenceClassifierOutput


FlaxMultipleChoiceModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxMultipleChoiceModelOutput


FlaxTokenClassifierOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxTokenClassifierOutput


FlaxQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxQuestionAnsweringModelOutput


FlaxSeq2SeqQuestionAnsweringModelOutput
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.modeling_flax_outputs.FlaxSeq2SeqQuestionAnsweringModelOutput

@@ -23,33 +23,262 @@ There are two categories of pipeline abstractions to be aware about:

- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines.
- The other task-specific pipelines:

    - :class:`~transformers.AudioClassificationPipeline`
    - :class:`~transformers.AutomaticSpeechRecognitionPipeline`
    - :class:`~transformers.ConversationalPipeline`
    - :class:`~transformers.FeatureExtractionPipeline`
    - :class:`~transformers.FillMaskPipeline`
    - :class:`~transformers.ImageClassificationPipeline`
    - :class:`~transformers.ImageSegmentationPipeline`
    - :class:`~transformers.ObjectDetectionPipeline`
    - :class:`~transformers.QuestionAnsweringPipeline`
    - :class:`~transformers.SummarizationPipeline`
    - :class:`~transformers.TableQuestionAnsweringPipeline`
    - :class:`~transformers.TextClassificationPipeline`
    - :class:`~transformers.TextGenerationPipeline`
    - :class:`~transformers.Text2TextGenerationPipeline`
    - :class:`~transformers.TokenClassificationPipeline`
    - :class:`~transformers.TranslationPipeline`
    - :class:`~transformers.ZeroShotClassificationPipeline`

The pipeline abstraction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any other
pipeline but can provide additional quality of life.

Simple call on one item:

.. code-block::

    >>> pipe = pipeline("text-classification")
    >>> pipe("This restaurant is awesome")
    [{'label': 'POSITIVE', 'score': 0.9998743534088135}]

If you want to use a specific model from the `hub <https://huggingface.co>`__ you can ignore the task if the model on
the hub already defines it:

.. code-block::

    >>> pipe = pipeline(model="roberta-large-mnli")
    >>> pipe("This restaurant is awesome")
    [{'label': 'POSITIVE', 'score': 0.9998743534088135}]

To call a pipeline on many items, you can call it with a `list`:

.. code-block::

    >>> pipe = pipeline("text-classification")
    >>> pipe(["This restaurant is awesome", "This restaurant is awful"])
    [{'label': 'POSITIVE', 'score': 0.9998743534088135},
     {'label': 'NEGATIVE', 'score': 0.9996669292449951}]


To iterate over full datasets it is recommended to use a :obj:`dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.

.. code-block::

    import datasets
    from transformers import pipeline
    from transformers.pipelines.base import KeyDataset
    import tqdm

    pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
    dataset = datasets.load_dataset("superb", name="asr", split="test")

    # KeyDataset (only `pt`) will simply return the item in the dict returned by the dataset item
    # as we're not interested in the `target` part of the dataset.
    for out in tqdm.tqdm(pipe(KeyDataset(dataset, "file"))):
        print(out)
        # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
        # {"text": ....}
        # ....


.. autofunction:: transformers.pipeline

Pipeline batching
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All pipelines (except `zero-shot-classification` and `question-answering` currently) can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists or :obj:`Dataset`).

.. code-block::

    from transformers import pipeline
    from transformers.pipelines.base import KeyDataset
    import datasets
    import tqdm

    dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
    pipe = pipeline("text-classification", device=0)
    for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
        print(out)
        # [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
        # Exactly the same output as before, but the contents are passed
        # as batches to the model


.. warning::

    However, this is not automatically a win for performance. It can be either a 10x speedup or 5x slowdown depending
    on hardware, data and the actual model being used.

Example where it's mostly a speedup:

.. code-block::

    from transformers import pipeline
    from torch.utils.data import Dataset
    import tqdm


    pipe = pipeline("text-classification", device=0)


    class MyDataset(Dataset):
        def __len__(self):
            return 5000

        def __getitem__(self, i):
            return "This is a test"


    dataset = MyDataset()

    for batch_size in [1, 8, 64, 256]:
        print("-" * 30)
        print(f"Streaming batch_size={batch_size}")
        for out in tqdm.tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
            pass


.. code-block::

    # On GTX 970
    ------------------------------
    Streaming no batching
    100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
    ------------------------------
    Streaming batch_size=8
    100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
    ------------------------------
    Streaming batch_size=64
    100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
    ------------------------------
    Streaming batch_size=256
    100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
    (diminishing returns, saturated the GPU)


Example where it's mostly a slowdown:

.. code-block::

    class MyDataset(Dataset):
        def __len__(self):
            return 5000

        def __getitem__(self, i):
            if i % 64 == 0:
                n = 100
            else:
                n = 1
            return "This is a test" * n

This is an occasional very long sentence compared to the others. In that case, the **whole** batch will need to be 400
tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to the high slowdown. Even worse, on
bigger batches, the program simply crashes.


.. code-block::

    ------------------------------
    Streaming no batching
    100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
    ------------------------------
    Streaming batch_size=8
    100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
    ------------------------------
    Streaming batch_size=64
    100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
    ------------------------------
    Streaming batch_size=256
      0%|                                                                                 | 0/1000 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "/home/nicolas/src/transformers/test.py", line 42, in <module>
        for out in tqdm.tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
      ....
        q = q / math.sqrt(dim_per_head)  # (bs, n_heads, q_length, dim_per_head)
    RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)


There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. For
users, a rule of thumb is:

- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data), on GPU, then:

  - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch, measure and
    try tentatively to add it, and add OOM checks to recover when it fails (and it will, at some point, if you don't
    control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely (one way to do so is sketched below).
|
||||
|
||||
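If you do enable batching on data with unpredictable lengths, a small OOM guard around the pipeline call is a cheap
safety net. The sketch below is only one possible policy (halve the batch size and restart the run), not an API of
:obj:`transformers`; the ``pipe`` and ``dataset`` names are the ones used in the examples above.

.. code-block::

    import torch

    def run_with_oom_recovery(pipe, dataset, batch_size=64):
        # Retry the whole run with a smaller batch size whenever CUDA runs out of memory.
        while True:
            try:
                return list(pipe(dataset, batch_size=batch_size))
            except RuntimeError as e:
                if "out of memory" not in str(e) or batch_size == 1:
                    raise
                torch.cuda.empty_cache()
                batch_size = max(1, batch_size // 2)
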
Pipeline custom code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This section is about overriding the behavior of a specific pipeline.

Don't hesitate to create an issue for your task at hand; the goal of the pipelines is to be easy to use and support
most cases, so :obj:`transformers` could maybe support your use case directly.

If you simply want to try it out, you can:

- Subclass your pipeline of choice

.. code-block::

    class MyPipeline(TextClassificationPipeline):
        def postprocess(...):
            # Your custom post-processing goes here, e.g. rescaling the scores.
            ...
            scores = scores * 100
            ...

    my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
    # or if you use the `pipeline` function, then:
    my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)

That should enable you to do all the custom code you want.

Implementing a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:doc:`Implementing a new pipeline <../add_new_pipeline>`

The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


AudioClassificationPipeline
=======================================================================================================================

.. autoclass:: transformers.AudioClassificationPipeline
    :special-members: __call__
    :members:

AutomaticSpeechRecognitionPipeline
=======================================================================================================================

@@ -87,6 +316,13 @@ ImageClassificationPipeline
    :special-members: __call__
    :members:

ImageSegmentationPipeline
=======================================================================================================================

.. autoclass:: transformers.ImageSegmentationPipeline
    :special-members: __call__
    :members:

NerPipeline
=======================================================================================================================

@@ -94,6 +330,13 @@ NerPipeline

See :class:`~transformers.TokenClassificationPipeline` for all details.

ObjectDetectionPipeline
=======================================================================================================================

.. autoclass:: transformers.ObjectDetectionPipeline
    :special-members: __call__
    :members:

QuestionAnsweringPipeline
=======================================================================================================================

@@ -20,7 +20,7 @@ Rust library `tokenizers <https://github.com/huggingface/tokenizers>`__. The "Fa
 1. a significant speed-up in particular when doing batched tokenization and
 2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
    index of the token comprising a given character or the span of characters corresponding to a given token). Currently
-   no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLMRoBERTa
+   no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa
    and XLNet models).

 The base classes :class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast`
@@ -39,7 +39,8 @@ methods for using all the tokenizers:
 - Managing special tokens (like mask, beginning-of-sentence, etc.): adding them, assigning them to attributes in the
   tokenizer for easy access and making sure they are not split during tokenization.

-:class:`~transformers.BatchEncoding` holds the output of the tokenizer's encoding methods (``__call__``,
+:class:`~transformers.BatchEncoding` holds the output of the
+:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase`'s encoding methods (``__call__``,
 ``encode_plus`` and ``batch_encode_plus``) and is derived from a Python dictionary. When the tokenizer is a pure python
 tokenizer, this class behaves just like a standard python dictionary and holds the various model inputs computed by
 these methods (``input_ids``, ``attention_mask``...). When the tokenizer is a "Fast" tokenizer (i.e., backed by

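As a quick, hedged illustration of those mapping methods on a "Fast" tokenizer (the checkpoint name is just an
example):

.. code-block:: python

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")  # loads a "Fast" tokenizer
    encoding = tokenizer("Transformers are great!")

    # Index of the token that covers the first character of the input string...
    token_index = encoding.char_to_token(0)
    # ...and the span of characters that token corresponds to in the original string.
    print(token_index, encoding.token_to_chars(token_index))
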
docs/source/main_classes/trainer.mdx (new file, +550 lines)
@@ -0,0 +1,550 @@

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Trainer

The [`Trainer`] class provides an API for feature-complete training in PyTorch for most standard use cases. It's used in most of the [example scripts](../examples).

Before instantiating your [`Trainer`], create a [`TrainingArguments`] to access all the points of customization during training.

The API supports distributed training on multiple GPUs/TPUs, mixed precision through [NVIDIA Apex](https://github.com/NVIDIA/apex) and Native AMP for PyTorch.

The [`Trainer`] contains the basic training loop which supports the above features. To inject custom behavior you can subclass it and override the following methods:

- **get_train_dataloader** -- Creates the training DataLoader.
- **get_eval_dataloader** -- Creates the evaluation DataLoader.
- **get_test_dataloader** -- Creates the test DataLoader.
- **log** -- Logs information on the various objects watching training.
- **create_optimizer_and_scheduler** -- Sets up the optimizer and learning rate scheduler if they were not passed at
  init. Note that you can also subclass or override the `create_optimizer` and `create_scheduler` methods
  separately.
- **create_optimizer** -- Sets up the optimizer if it wasn't passed at init.
- **create_scheduler** -- Sets up the learning rate scheduler if it wasn't passed at init.
- **compute_loss** -- Computes the loss on a batch of training inputs.
- **training_step** -- Performs a training step.
- **prediction_step** -- Performs an evaluation/test step.
- **evaluate** -- Runs an evaluation loop and returns metrics.
- **predict** -- Returns predictions (with metrics if labels are available) on a test set.

<Tip warning={true}>

The [`Trainer`] class is optimized for 🤗 Transformers models and can have surprising behaviors
when you use it on other models. When using it on your own model, make sure:

- your model always returns tuples or subclasses of [`~file_utils.ModelOutput`].
- your model can compute the loss if a `labels` argument is provided and that loss is returned as the first
  element of the tuple (if your model returns tuples)
- your model can accept multiple label arguments (use the `label_names` in your [`TrainingArguments`] to indicate their name to the [`Trainer`]) but none of them should be named `"label"`.

</Tip>

Here is an example of how to customize [`Trainer`] using a custom loss function for multi-label classification:

```python
from torch import nn
from transformers import Trainer


class MultilabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        loss_fct = nn.BCEWithLogitsLoss()
        loss = loss_fct(logits.view(-1, self.model.config.num_labels),
                        labels.float().view(-1, self.model.config.num_labels))
        return (loss, outputs) if return_outputs else loss
```

Another way to customize the training loop behavior for the PyTorch [`Trainer`] is to use [callbacks](callback) that can inspect the training loop state (for progress reporting, logging on TensorBoard or other ML platforms...) and take decisions (like early stopping). A minimal sketch is shown below.

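For instance, here is a minimal sketch of early stopping with the built-in [`EarlyStoppingCallback`]; the `model`,
`train_dataset` and `eval_dataset` objects and the metric name are placeholders you would replace with your own:

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="output_dir",
    evaluation_strategy="steps",   # evaluate periodically so the callback has metrics to watch
    load_best_model_at_end=True,   # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```
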
## Trainer

[[autodoc]] Trainer
    - all

## Seq2SeqTrainer

[[autodoc]] Seq2SeqTrainer
    - evaluate
    - predict

## TrainingArguments

[[autodoc]] TrainingArguments
    - all

## Seq2SeqTrainingArguments

[[autodoc]] Seq2SeqTrainingArguments
    - all

## Checkpoints

By default, [`Trainer`] will save all checkpoints in the `output_dir` you set in the
[`TrainingArguments`] you are using. Those will go in a subfolder named `checkpoint-xxx`, with xxx
being the step at which the checkpoint was saved.

Resuming training from a checkpoint can be done when calling [`Trainer.train`] with either:

- `resume_from_checkpoint=True` which will resume training from the latest checkpoint
- `resume_from_checkpoint=checkpoint_dir` which will resume training from the specific checkpoint in the directory
  passed.

In addition, you can easily save your checkpoints on the Model Hub when using `push_to_hub=True`. By default, all
the models saved in intermediate checkpoints are saved in different commits, but not the optimizer state. You can adapt
the `hub_strategy` value of your [`TrainingArguments`] to either:

- `"checkpoint"`: the latest checkpoint is also pushed in a subfolder named last-checkpoint, allowing you to
  resume training easily with `trainer.train(resume_from_checkpoint="output_dir/last-checkpoint")`.
- `"all_checkpoints"`: all checkpoints are pushed like they appear in the output folder (so you will get one
  checkpoint folder per folder in your final repository)

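As a quick, hedged sketch of combining these options (the `model` and `train_dataset` objects and directory names are
placeholders):

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="output_dir",
    push_to_hub=True,            # save checkpoints to the Model Hub as well
    hub_strategy="checkpoint",   # also push the latest checkpoint under last-checkpoint
)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)

# Resume from the latest checkpoint found in output_dir.
trainer.train(resume_from_checkpoint=True)
```
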
## Logging

By default [`Trainer`] will use `logging.INFO` for the main process and `logging.WARNING` for the replicas, if any.

These defaults can be overridden to use any of the 5 `logging` levels with [`TrainingArguments`]'s
arguments:

- `log_level` - for the main process
- `log_level_replica` - for the replicas

Further, if [`TrainingArguments`]'s `log_on_each_node` is set to `False` only the main node will
use the log level settings for its main process, all other nodes will use the log level settings for replicas.

Note that [`Trainer`] is going to set `transformers`'s log level separately for each node in its
[`Trainer.__init__`]. So you may want to set this sooner (see the next example) if you tap into other
`transformers` functionality before creating the [`Trainer`] object.

Here is an example of how this can be used in an application:

```python
[...]
logger = logging.getLogger(__name__)

# Setup logging
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    handlers=[logging.StreamHandler(sys.stdout)],
)

# set the main code and the modules it uses to the same log-level according to the node
log_level = training_args.get_process_log_level()
logger.setLevel(log_level)
datasets.utils.logging.set_verbosity(log_level)
transformers.utils.logging.set_verbosity(log_level)

trainer = Trainer(...)
```

And then if you only want to see warnings on the main node and all other nodes to not print any most likely duplicated
warnings, you could run it as:

```bash
my_app.py ... --log_level warning --log_level_replica error
```

In the multi-node environment, if you also don't want the logs to repeat for each node's main process, you will want to
change the above to:

```bash
my_app.py ... --log_level warning --log_level_replica error --log_on_each_node 0
```

and then only the main process of the first node will log at the "warning" level, and all other processes on the main
node and all processes on other nodes will log at the "error" level.

If you need your application to be as quiet as possible you could do:

```bash
my_app.py ... --log_level error --log_level_replica error --log_on_each_node 0
```

(add `--log_on_each_node 0` if on a multi-node environment)

## Randomness

When resuming from a checkpoint generated by [`Trainer`] all efforts are made to restore the
_python_, _numpy_ and _pytorch_ RNG states to the same states as they were at the moment of saving that checkpoint,
which should make the "stop and resume" style of training as close as possible to non-stop training.

However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
determinism please refer to [Controlling sources of randomness](https://pytorch.org/docs/stable/notes/randomness). As explained in that document, some of the settings
that make things deterministic (e.g., `torch.backends.cudnn.deterministic`) may slow things down, therefore this
can't be done by default, but you can enable those yourself if needed.

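As a rough sketch of what opting into full determinism can look like (these are standard PyTorch and `transformers`
switches; whether they are sufficient depends on your model and hardware):

```python
import torch
from transformers import set_seed

set_seed(42)  # seeds python, numpy and torch (including CUDA) in one call

# Trade speed for reproducibility in the cuDNN backend.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```
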
## Trainer Integrations

The [`Trainer`] has been extended to support libraries that may dramatically improve your training
time and fit much bigger models.

Currently it supports third party solutions, [DeepSpeed](https://github.com/microsoft/DeepSpeed) and [FairScale](https://github.com/facebookresearch/fairscale/), which implement parts of the paper [ZeRO: Memory Optimizations
Toward Training Trillion Parameter Models, by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He](https://arxiv.org/abs/1910.02054).

This provided support is new and experimental as of this writing.

<a id='zero-install-notes'></a>

### CUDA Extension Installation Notes

As of this writing, both FairScale and Deepspeed require compilation of CUDA C++ code before they can be used.

While all installation issues should be dealt with through the corresponding GitHub Issues of [FairScale](https://github.com/facebookresearch/fairscale/issues) and [Deepspeed](https://github.com/microsoft/DeepSpeed/issues), there are a few common issues that one may encounter while building
any PyTorch extension that needs to build CUDA extensions.

Therefore, if you encounter a CUDA-related build issue while doing one of the following or both:

```bash
pip install fairscale
pip install deepspeed
```

please read the following notes first.

In these notes we give examples for what to do when `pytorch` has been built with CUDA `10.2`. If your situation is
different, remember to adjust the version number to the one you are after.

#### Possible problem #1

While PyTorch comes with its own CUDA toolkit, to build these two projects you must have an identical version of CUDA
installed system-wide.

For example, if you installed `pytorch` with `cudatoolkit==10.2` in the Python environment, you also need to have
CUDA `10.2` installed system-wide.

The exact location may vary from system to system, but `/usr/local/cuda-10.2` is the most common location on many
Unix systems. When CUDA is correctly set up and added to the `PATH` environment variable, one can find the
installation location by doing:

```bash
which nvcc
```

If you don't have CUDA installed system-wide, install it first. You will find the instructions by using your favorite
search engine. For example, if you're on Ubuntu you may want to search for: [ubuntu cuda 10.2 install](https://www.google.com/search?q=ubuntu+cuda+10.2+install).

#### Possible problem #2

Another possible common problem is that you may have more than one CUDA toolkit installed system-wide. For example you
may have:

```bash
/usr/local/cuda-10.2
/usr/local/cuda-11.0
```

Now, in this situation you need to make sure that your `PATH` and `LD_LIBRARY_PATH` environment variables contain
the correct paths to the desired CUDA version. Typically, package installers will set these to contain whatever the
last installed version was. If you encounter a problem where the package build fails because it can't find the right
CUDA version despite having it installed system-wide, it means that you need to adjust the 2 aforementioned
environment variables.

First, you may look at their contents:

```bash
echo $PATH
echo $LD_LIBRARY_PATH
```

so you get an idea of what is inside.

It's possible that `LD_LIBRARY_PATH` is empty.

`PATH` lists the locations where executables can be found and `LD_LIBRARY_PATH` is where shared libraries
are to be looked for. In both cases, earlier entries have priority over later ones. `:` is used to separate multiple
entries.

Now, to tell the build program where to find the specific CUDA toolkit, insert the desired paths to be listed first by
doing:

```bash
export PATH=/usr/local/cuda-10.2/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
```

Note that we aren't overwriting the existing values, but prepending instead.

Of course, adjust the version number and the full path if need be. Check that the directories you assign actually do
exist. The `lib64` sub-directory is where the various CUDA `.so` objects (like `libcudart.so`) reside; it's unlikely
that your system will have it named differently, but if it does, adjust it to reflect your reality.

#### Possible problem #3

Some older CUDA versions may refuse to build with newer compilers. For example, you may have `gcc-9` but CUDA wants
`gcc-7`.

There are various ways to go about it.

If you can install the latest CUDA toolkit it typically should support the newer compiler.

Alternatively, you could install the lower version of the compiler in addition to the one you already have, or you may
already have it but it's not the default one, so the build system can't see it. If you have `gcc-7` installed but the
build system complains it can't find it, the following might do the trick:

```bash
sudo ln -s /usr/bin/gcc-7 /usr/local/cuda-10.2/bin/gcc
sudo ln -s /usr/bin/g++-7 /usr/local/cuda-10.2/bin/g++
```

Here, we are making a symlink to `gcc-7` from `/usr/local/cuda-10.2/bin/gcc` and since
`/usr/local/cuda-10.2/bin/` should be in the `PATH` environment variable (see the previous problem's solution), it
should find `gcc-7` (and `g++-7`) and then the build will succeed.

As always, make sure to edit the paths in the example to match your situation.

### FairScale

By integrating [FairScale](https://github.com/facebookresearch/fairscale/) the [`Trainer`]
provides support for the following features from [the ZeRO paper](https://arxiv.org/abs/1910.02054):

1. Optimizer State Sharding
2. Gradient Sharding
3. Model Parameters Sharding (new and very experimental)
4. CPU offload (new and very experimental)

You will need at least two GPUs to use this feature.

**Installation**:

Install the library via pypi:

```bash
pip install fairscale
```

or via `transformers`' `extras`:

```bash
pip install transformers[fairscale]
```

(available starting from `transformers==4.6.0`) or find more details on [the FairScale's GitHub page](https://github.com/facebookresearch/fairscale/#installation).

If you're still struggling with the build, first make sure to read [CUDA Extension Installation Notes](#zero-install-notes).

If that still hasn't resolved the build issue, here are a few more ideas.

`fairscale` seems to have an issue with the build isolation feature recently introduced by pip. If you have a problem
with it, you may want to try one of:

```bash
pip install fairscale --no-build-isolation .
```

or:

```bash
git clone https://github.com/facebookresearch/fairscale/
cd fairscale
rm -r dist build
python setup.py bdist_wheel
pip uninstall -y fairscale
pip install dist/fairscale-*.whl
```

`fairscale` also has issues with building against pytorch-nightly, so if you use it you may have to try one of:

```bash
pip uninstall -y fairscale; pip install fairscale --pre \
    -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly \
    --no-cache --no-build-isolation
```

or:

```bash
pip install -v --disable-pip-version-check . \
    -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly --pre
```

Of course, adjust the URLs to match the CUDA version you use.

If after trying everything suggested you still encounter build issues, please proceed with the GitHub Issue of
[FairScale](https://github.com/facebookresearch/fairscale/issues).

**Usage**:

To use the first version of Sharded data-parallelism, add `--sharded_ddp simple` to the command line arguments, and
make sure you have added the distributed launcher `-m torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE` if you haven't been using it already.

For example here is how you could use it for `run_translation.py` with 2 GPUs:

```bash
python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
    --model_name_or_path t5-small --per_device_train_batch_size 1 \
    --output_dir output_dir --overwrite_output_dir \
    --do_train --max_train_samples 500 --num_train_epochs 1 \
    --dataset_name wmt16 --dataset_config "ro-en" \
    --source_lang en --target_lang ro \
    --fp16 --sharded_ddp simple
```

Notes:

- This feature requires distributed training (so multiple GPUs).
- It is not implemented for TPUs.
- It works with `--fp16` too, to make things even faster.
- One of the main benefits of enabling `--sharded_ddp simple` is that it uses a lot less GPU memory, so you should be
  able to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
  significantly shorter training time.

To use the second version of Sharded data-parallelism, add `--sharded_ddp zero_dp_2` or `--sharded_ddp zero_dp_3` to the command line arguments, and make sure you have added the distributed launcher `-m torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE` if you haven't been using it already.

For example here is how you could use it for `run_translation.py` with 2 GPUs:

```bash
python -m torch.distributed.launch --nproc_per_node=2 examples/pytorch/translation/run_translation.py \
    --model_name_or_path t5-small --per_device_train_batch_size 1 \
    --output_dir output_dir --overwrite_output_dir \
    --do_train --max_train_samples 500 --num_train_epochs 1 \
    --dataset_name wmt16 --dataset_config "ro-en" \
    --source_lang en --target_lang ro \
    --fp16 --sharded_ddp zero_dp_2
```

`zero_dp_2` is an optimized version of the simple wrapper, while `zero_dp_3` fully shards model weights,
gradients and optimizer states.

Both are compatible with adding `cpu_offload` to enable ZeRO-offload (activate it like this: `--sharded_ddp "zero_dp_2 cpu_offload"`).

Notes:

- This feature requires distributed training (so multiple GPUs).
- It is not implemented for TPUs.
- It works with `--fp16` too, to make things even faster.
- The `cpu_offload` additional option requires `--fp16`.
- This is an area of active development, so make sure you have a source install of fairscale to use this feature as
  some bugs you encounter may have been fixed there already.

Known caveats:

- This feature is incompatible with `--predict_with_generate` in the _run_translation.py_ script.
- Using `--sharded_ddp zero_dp_3` requires wrapping each layer of the model in the special container
  `FullyShardedDataParallelism` of fairscale. It should be used with the option `auto_wrap` if you are not
  doing this yourself: `--sharded_ddp "zero_dp_3 auto_wrap"`.

### DeepSpeed

Moved to [Trainer DeepSpeed integration](deepspeed#trainer-deepspeed-integration).

#### Installation

Moved to [Installation](deepspeed#deepspeed-installation).

#### Deployment with multiple GPUs

Moved to [Deployment with multiple GPUs](deepspeed#deepspeed-multi-gpu).

#### Deployment with one GPU

Moved to [Deployment with one GPU](deepspeed#deepspeed-one-gpu).

#### Deployment in Notebooks

Moved to [Deployment in Notebooks](deepspeed#deepspeed-notebook).

#### Configuration

Moved to [Configuration](deepspeed#deepspeed-config).

#### Passing Configuration

Moved to [Passing Configuration](deepspeed#deepspeed-config-passing).

#### Shared Configuration

Moved to [Shared Configuration](deepspeed#deepspeed-config-shared).

#### ZeRO

Moved to [ZeRO](deepspeed#deepspeed-zero).

##### ZeRO-2 Config

Moved to [ZeRO-2 Config](deepspeed#deepspeed-zero2-config).

##### ZeRO-3 Config

Moved to [ZeRO-3 Config](deepspeed#deepspeed-zero3-config).

#### NVMe Support

Moved to [NVMe Support](deepspeed#deepspeed-nvme).

##### ZeRO-2 vs ZeRO-3 Performance

Moved to [ZeRO-2 vs ZeRO-3 Performance](deepspeed#deepspeed-zero2-zero3-performance).

##### ZeRO-2 Example

Moved to [ZeRO-2 Example](deepspeed#deepspeed-zero2-example).

##### ZeRO-3 Example

Moved to [ZeRO-3 Example](deepspeed#deepspeed-zero3-example).

#### Optimizer and Scheduler

##### Optimizer

Moved to [Optimizer](deepspeed#deepspeed-optimizer).

##### Scheduler

Moved to [Scheduler](deepspeed#deepspeed-scheduler).

#### fp32 Precision

Moved to [fp32 Precision](deepspeed#deepspeed-fp32).

#### Automatic Mixed Precision

Moved to [Automatic Mixed Precision](deepspeed#deepspeed-amp).

#### Batch Size

Moved to [Batch Size](deepspeed#deepspeed-bs).

#### Gradient Accumulation

Moved to [Gradient Accumulation](deepspeed#deepspeed-grad-acc).

#### Gradient Clipping

Moved to [Gradient Clipping](deepspeed#deepspeed-grad-clip).

#### Getting The Model Weights Out

Moved to [Getting The Model Weights Out](deepspeed#deepspeed-weight-extraction).

@@ -1,609 +0,0 @@
(deleted file: the old reStructuredText version of this Trainer document; its content is superseded by the new `docs/source/main_classes/trainer.mdx` shown above, which drops the TensorFlow-specific `TFTrainer`/`TFTrainingArguments` sections.)

@@ -31,7 +31,7 @@ This introduces two breaking changes:

##### How to obtain the same behavior as v3.x in v4.x

- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](https://huggingface.co/transformers/main_classes/pipelines.html?highlight=textclassification#tokenclassificationpipeline).
- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](main_classes/pipelines#transformers.TokenClassificationPipeline).
- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the `use_fast` flag by setting it to `False`:

In version `v3.x`:
@@ -98,7 +98,7 @@ from transformers.models.bert.modeling_bert import BertLayer

#### 4. Switching the `return_dict` argument to `True` by default

The [`return_dict` argument](https://huggingface.co/transformers/main_classes/output.html) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice.
The [`return_dict` argument](main_classes/output) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice.

This is a breaking change as the limitation of that tuple is that it cannot be unpacked: `value0, value1 = outputs` will not work.
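As a quick illustration of the `return_dict` change (a minimal sketch, not part of the migration guide's own snippets, assuming a standard BERT checkpoint):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)            # return_dict=True is the default in v4.x

print(outputs.last_hidden_state.shape)   # access by key/attribute
print(outputs[0].shape)                  # indexing and slicing still work
# value0, value1 = outputs               # unpacking no longer works

# To recover the v3.x tuple behavior, pass return_dict=False:
hidden_states, pooled = model(**inputs, return_dict=False)
```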
@@ -43,7 +43,8 @@ Tips:
  similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
  number of (repeating) layers.

This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The Jax version of this model was
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__. The original code can be found `here
<https://github.com/google-research/ALBERT>`__.

AlbertConfig
@@ -174,3 +175,52 @@ TFAlbertForQuestionAnswering

.. autoclass:: transformers.TFAlbertForQuestionAnswering
    :members: call


FlaxAlbertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAlbertModel
    :members: __call__


FlaxAlbertForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAlbertForPreTraining
    :members: __call__


FlaxAlbertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAlbertForMaskedLM
    :members: __call__


FlaxAlbertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAlbertForSequenceClassification
    :members: __call__


FlaxAlbertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAlbertForMultipleChoice
    :members: __call__


FlaxAlbertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAlbertForTokenClassification
    :members: __call__


FlaxAlbertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAlbertForQuestionAnswering
    :members: __call__
@@ -27,7 +27,32 @@ Instantiating one of :class:`~transformers.AutoConfig`, :class:`~transformers.Au
will create a model that is an instance of :class:`~transformers.BertModel`.

There is one class of :obj:`AutoModel` for each task, and for each backend (PyTorch or TensorFlow).
There is one class of :obj:`AutoModel` for each task, and for each backend (PyTorch, TensorFlow, or Flax).

Extending the Auto Classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each of the auto classes has a method to be extended with your custom classes. For instance, if you have defined a
custom class of model :obj:`NewModel`, make sure you have a :obj:`NewModelConfig` then you can add those to the auto
classes like this:

.. code-block::

    from transformers import AutoConfig, AutoModel

    AutoConfig.register("new-model", NewModelConfig)
    AutoModel.register(NewModelConfig, NewModel)

You will then be able to use the auto classes like you would usually do!

.. warning::

    If your :obj:`NewModelConfig` is a subclass of :class:`~transformers.PretrainedConfig`, make sure its
    :obj:`model_type` attribute is set to the same key you use when registering the config (here :obj:`"new-model"`).

    Likewise, if your :obj:`NewModel` is a subclass of :class:`~transformers.PreTrainedModel`, make sure its
    :obj:`config_class` attribute is set to the same class you use when registering the model (here
    :obj:`NewModelConfig`).
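A slightly more complete sketch of the registration flow described above; the bodies of :obj:`NewModelConfig` and :obj:`NewModel` are illustrative assumptions, not library classes, and only the registration calls themselves are real API:

.. code-block::

    # Minimal, hypothetical definitions for the NewModel/NewModelConfig names used above.
    import torch
    from transformers import AutoConfig, AutoModel, PretrainedConfig, PreTrainedModel

    class NewModelConfig(PretrainedConfig):
        model_type = "new-model"  # must match the key passed to AutoConfig.register

        def __init__(self, hidden_size=64, **kwargs):
            super().__init__(**kwargs)
            self.hidden_size = hidden_size

    class NewModel(PreTrainedModel):
        config_class = NewModelConfig  # must match the config passed to AutoModel.register

        def __init__(self, config):
            super().__init__(config)
            self.layer = torch.nn.Linear(config.hidden_size, config.hidden_size)

        def forward(self, hidden_states):
            return self.layer(hidden_states)

    AutoConfig.register("new-model", NewModelConfig)
    AutoModel.register(NewModelConfig, NewModel)

    # The auto classes now resolve the custom type like any built-in model.
    config = AutoConfig.for_model("new-model")
    model = AutoModel.from_config(config)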
AutoConfig
@@ -51,6 +76,13 @@ AutoFeatureExtractor
    :members:


AutoProcessor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AutoProcessor
    :members:


AutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -135,6 +167,48 @@ AutoModelForImageClassification
    :members:


AutoModelForVision2Seq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AutoModelForVision2Seq
    :members:


AutoModelForAudioClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AutoModelForAudioClassification
    :members:


AutoModelForCTC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AutoModelForCTC
    :members:


AutoModelForSpeechSeq2Seq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AutoModelForSpeechSeq2Seq
    :members:


AutoModelForObjectDetection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AutoModelForObjectDetection
    :members:


AutoModelForImageSegmentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.AutoModelForImageSegmentation
    :members:


TFAutoModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -156,6 +230,13 @@ TFAutoModelForCausalLM
    :members:


TFAutoModelForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAutoModelForImageClassification
    :members:


TFAutoModelForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -184,6 +265,13 @@ TFAutoModelForMultipleChoice
    :members:


TFAutoModelForTableQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFAutoModelForTableQuestionAnswering
    :members:


TFAutoModelForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -273,3 +361,10 @@ FlaxAutoModelForImageClassification

.. autoclass:: transformers.FlaxAutoModelForImageClassification
    :members:


FlaxAutoModelForVision2Seq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxAutoModelForVision2Seq
    :members:
@@ -74,7 +74,7 @@ The :obj:`facebook/bart-base` and :obj:`facebook/bart-large` checkpoints can be
.. code-block::

    from transformers import BartForConditionalGeneration, BartTokenizer
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", force_bos_token_to_be_generated=True)
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
    tok = BartTokenizer.from_pretrained("facebook/bart-large")
    example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
    batch = tok(example_english_phrase, return_tensors='pt')
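A hedged sketch of how this snippet is usually continued (mask filling via ``generate``); the decoding lines are illustrative and not part of the hunk above:

.. code-block::

    # forced_bos_token_id replaces the removed force_bos_token_to_be_generated flag.
    generated_ids = model.generate(batch["input_ids"])
    print(tok.batch_decode(generated_ids, skip_special_tokens=True))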
docs/source/model_doc/bartpho.rst  (new file, 86 lines)
@@ -0,0 +1,86 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

BARTpho
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The BARTpho model was proposed in `BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese
<https://arxiv.org/abs/2109.09701>`__ by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.

The abstract from the paper is the following:

*We present BARTpho with two versions -- BARTpho_word and BARTpho_syllable -- the first public large-scale monolingual
sequence-to-sequence models pre-trained for Vietnamese. Our BARTpho uses the "large" architecture and pre-training
scheme of the sequence-to-sequence denoising model BART, thus especially suitable for generative NLP tasks. Experiments
on a downstream task of Vietnamese text summarization show that in both automatic and human evaluations, our BARTpho
outperforms the strong baseline mBART and improves the state-of-the-art. We release BARTpho to facilitate future
research and applications of generative Vietnamese NLP tasks.*

Example of use:

.. code-block::

    >>> import torch
    >>> from transformers import AutoModel, AutoTokenizer

    >>> bartpho = AutoModel.from_pretrained("vinai/bartpho-syllable")

    >>> tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")

    >>> line = "Chúng tôi là những nghiên cứu viên."

    >>> input_ids = tokenizer(line, return_tensors="pt")

    >>> with torch.no_grad():
    ...     features = bartpho(**input_ids)  # Models outputs are now tuples

    >>> # With TensorFlow 2.0+:
    >>> from transformers import TFAutoModel
    >>> bartpho = TFAutoModel.from_pretrained("vinai/bartpho-syllable")
    >>> input_ids = tokenizer(line, return_tensors="tf")
    >>> features = bartpho(**input_ids)

Tips:

- Following mBART, BARTpho uses the "large" architecture of BART with an additional layer-normalization layer on top of
  both the encoder and decoder. Thus, usage examples in the :doc:`documentation of BART <bart>`, when adapting to use
  with BARTpho, should be adjusted by replacing the BART-specialized classes with the mBART-specialized counterparts.
  For example:

.. code-block::

    >>> from transformers import MBartForConditionalGeneration
    >>> bartpho = MBartForConditionalGeneration.from_pretrained("vinai/bartpho-syllable")
    >>> TXT = 'Chúng tôi là <mask> nghiên cứu viên.'
    >>> input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
    >>> logits = bartpho(input_ids).logits
    >>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    >>> probs = logits[0, masked_index].softmax(dim=0)
    >>> values, predictions = probs.topk(5)
    >>> print(tokenizer.decode(predictions).split())

- This implementation is only for tokenization: "monolingual_vocab_file" consists of Vietnamese-specialized types
  extracted from the pre-trained SentencePiece model "vocab_file" that is available from the multilingual XLM-RoBERTa.
  Other languages, if employing this pre-trained multilingual SentencePiece model "vocab_file" for subword
  segmentation, can reuse BartphoTokenizer with their own language-specialized "monolingual_vocab_file".

This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here
<https://github.com/VinAIResearch/BARTpho>`__.

BartphoTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BartphoTokenizer
    :members:
docs/source/model_doc/beit.rst  (new file, 144 lines)
@@ -0,0 +1,144 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

BEiT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The BEiT model was proposed in `BEiT: BERT Pre-Training of Image Transformers <https://arxiv.org/abs/2106.08254>`__ by
Hangbo Bao, Li Dong and Furu Wei. Inspired by BERT, BEiT is the first paper that makes self-supervised pre-training of
Vision Transformers (ViTs) outperform supervised pre-training. Rather than pre-training the model to predict the class
of an image (as done in the `original ViT paper <https://arxiv.org/abs/2010.11929>`__), BEiT models are pre-trained to
predict visual tokens from the codebook of OpenAI's `DALL-E model <https://arxiv.org/abs/2102.12092>`__ given masked
patches.

The abstract from the paper is the following:

*We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation
from Image Transformers. Following BERT developed in the natural language processing area, we propose a masked image
modeling task to pretrain vision Transformers. Specifically, each image has two views in our pre-training, i.e, image
patches (such as 16x16 pixels), and visual tokens (i.e., discrete tokens). We first "tokenize" the original image into
visual tokens. Then we randomly mask some image patches and fed them into the backbone Transformer. The pre-training
objective is to recover the original visual tokens based on the corrupted image patches. After pre-training BEiT, we
directly fine-tune the model parameters on downstream tasks by appending task layers upon the pretrained encoder.
Experimental results on image classification and semantic segmentation show that our model achieves competitive results
with previous pre-training methods. For example, base-size BEiT achieves 83.2% top-1 accuracy on ImageNet-1K,
significantly outperforming from-scratch DeiT training (81.8%) with the same setup. Moreover, large-size BEiT obtains
86.3% only using ImageNet-1K, even outperforming ViT-L with supervised pre-training on ImageNet-22K (85.2%).*

Tips:

- BEiT models are regular Vision Transformers, but pre-trained in a self-supervised way rather than supervised. They
  outperform both the :doc:`original model (ViT) <vit>` as well as :doc:`Data-efficient Image Transformers (DeiT)
  <deit>` when fine-tuned on ImageNet-1K and CIFAR-100. You can check out demo notebooks regarding inference as well as
  fine-tuning on custom data `here
  <https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer>`__ (you can just replace
  :class:`~transformers.ViTFeatureExtractor` by :class:`~transformers.BeitFeatureExtractor` and
  :class:`~transformers.ViTForImageClassification` by :class:`~transformers.BeitForImageClassification`).
- There's also a demo notebook available which showcases how to combine DALL-E's image tokenizer with BEiT for
  performing masked image modeling. You can find it `here
  <https://github.com/NielsRogge/Transformers-Tutorials/tree/master/BEiT>`__.
- As the BEiT models expect each image to be of the same size (resolution), one can use
  :class:`~transformers.BeitFeatureExtractor` to resize (or rescale) and normalize images for the model (see the usage
  sketch below).
- Both the patch resolution and image resolution used during pre-training or fine-tuning are reflected in the name of
  each checkpoint. For example, :obj:`microsoft/beit-base-patch16-224` refers to a base-sized architecture with patch
  resolution of 16x16 and fine-tuning resolution of 224x224. All checkpoints can be found on the `hub
  <https://huggingface.co/models?search=microsoft/beit>`__.
- The available checkpoints are either (1) pre-trained on `ImageNet-22k <http://www.image-net.org/>`__ (a collection of
  14 million images and 22k classes) only, (2) also fine-tuned on ImageNet-22k or (3) also fine-tuned on `ImageNet-1k
  <http://www.image-net.org/challenges/LSVRC/2012/>`__ (also referred to as ILSVRC 2012, a collection of 1.3 million
  images and 1,000 classes).
- BEiT uses relative position embeddings, inspired by the T5 model. During pre-training, the authors shared the
  relative position bias among the several self-attention layers. During fine-tuning, each layer's relative position
  bias is initialized with the shared relative position bias obtained after pre-training. Note that, if one wants to
  pre-train a model from scratch, one needs to either set the :obj:`use_relative_position_bias` or the
  :obj:`use_shared_relative_position_bias` attribute of :class:`~transformers.BeitConfig` to :obj:`True` in order to
  add position embeddings.

This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The JAX/FLAX version of this model was
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__. The original code can be found `here
<https://github.com/microsoft/unilm/tree/master/beit>`__.
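A minimal usage sketch for the feature-extractor tip above (assuming the :obj:`microsoft/beit-base-patch16-224` checkpoint and the standard COCO test image; not part of the new file itself):

.. code-block::

    # Classify an image with a fine-tuned BEiT checkpoint; BeitFeatureExtractor
    # resizes and normalizes the image as described in the tips above.
    import torch
    import requests
    from PIL import Image
    from transformers import BeitFeatureExtractor, BeitForImageClassification

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    feature_extractor = BeitFeatureExtractor.from_pretrained("microsoft/beit-base-patch16-224")
    model = BeitForImageClassification.from_pretrained("microsoft/beit-base-patch16-224")

    inputs = feature_extractor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    print(model.config.id2label[logits.argmax(-1).item()])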
BEiT specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.beit.modeling_beit.BeitModelOutputWithPooling
    :members:

.. autoclass:: transformers.models.beit.modeling_flax_beit.FlaxBeitModelOutputWithPooling
    :members:


BeitConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BeitConfig
    :members:


BeitFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BeitFeatureExtractor
    :members: __call__


BeitModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BeitModel
    :members: forward


BeitForMaskedImageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BeitForMaskedImageModeling
    :members: forward


BeitForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BeitForImageClassification
    :members: forward


BeitForSemanticSegmentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BeitForSemanticSegmentation
    :members: forward


FlaxBeitModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxBeitModel
    :members: __call__


FlaxBeitForMaskedImageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxBeitForMaskedImageModeling
    :members: __call__


FlaxBeitForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxBeitForImageClassification
    :members: __call__
@@ -76,6 +76,9 @@ Bert specific outputs
.. autoclass:: transformers.models.bert.modeling_tf_bert.TFBertForPreTrainingOutput
    :members:

.. autoclass:: transformers.models.bert.modeling_flax_bert.FlaxBertForPreTrainingOutput
    :members:


BertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -10,7 +10,7 @@
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Bertweet
BERTweet
-----------------------------------------------------------------------------------------------------------------------

Overview
@@ -47,7 +47,7 @@ Implementation Notes
- Available checkpoints can be found in the `model hub <https://huggingface.co/models?search=blenderbot>`__.
- This is the `default` Blenderbot model class. However, some smaller checkpoints, such as
  ``facebook/blenderbot_small_90M``, have a different architecture and consequently should be used with
  `BlenderbotSmall <https://huggingface.co/transformers/master/model_doc/blenderbot_small.html>`__.
  `BlenderbotSmall <blenderbot_small>`__.


Usage
@@ -81,6 +81,13 @@ BlenderbotTokenizer
    :members: build_inputs_with_special_tokens


BlenderbotTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BlenderbotTokenizerFast
    :members: build_inputs_with_special_tokens


BlenderbotModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -118,3 +125,17 @@ TFBlenderbotForConditionalGeneration

.. autoclass:: transformers.TFBlenderbotForConditionalGeneration
    :members: call


FlaxBlenderbotModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxBlenderbotModel
    :members: __call__, encode, decode


FlaxBlenderbotForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxBlenderbotForConditionalGeneration
    :members: __call__, encode, decode
@@ -57,6 +57,13 @@ BlenderbotSmallTokenizer
        create_token_type_ids_from_sequences, save_vocabulary


BlenderbotSmallTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.BlenderbotSmallTokenizerFast
    :members:


BlenderbotSmallModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -90,3 +97,17 @@ TFBlenderbotSmallForConditionalGeneration

.. autoclass:: transformers.TFBlenderbotSmallForConditionalGeneration
    :members: call


FlaxBlenderbotSmallModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxBlenderbotSmallModel
    :members: __call__, encode, decode


FlaxBlenderbotSmallForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxBlenderbotSmallForConditionalGeneration
    :members: __call__, encode, decode
@@ -39,8 +39,11 @@ experiments.*
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
found `here <https://github.com/google-research/byt5>`__.

ByT5's architecture is based on the T5v1.1 model, so one can refer to :doc:`T5v1.1's documentation page <t5v1.1>`. They
only differ in how inputs should be prepared for the model, see the code examples below.
ByT5's architecture is based on the T5 model, so one can refer to :doc:`T5's documentation page <t5>`.
Since ByT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task
fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.


Example
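The file's own example is cut off by the hunk boundary above, so here is a hedged sketch of byte-level input preparation for ByT5 (the offset of 3 for the special tokens and the :obj:`google/byt5-small` checkpoint are assumptions based on typical ByT5 usage, not text from this diff):

.. code-block::

    # ByT5 works directly on UTF-8 bytes; token ids are the byte values shifted by
    # the number of special tokens, so no learned subword vocabulary is needed.
    import torch
    from transformers import T5ForConditionalGeneration

    model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

    input_ids = torch.tensor([list("Life is like a box of chocolates.".encode("utf-8"))]) + 3
    labels = torch.tensor([list("La vie est comme une boîte de chocolat.".encode("utf-8"))]) + 3

    loss = model(input_ids, labels=labels).loss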
@@ -38,7 +38,8 @@ the training data performs consistently better on a wide range of NLP tasks, ach
pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*


This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The TF 2.0 implementation of this model was
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__. The original code can be found `here
<https://github.com/microsoft/DeBERTa>`__.


@@ -103,3 +104,45 @@ DebertaForQuestionAnswering

.. autoclass:: transformers.DebertaForQuestionAnswering
    :members: forward


TFDebertaModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaModel
    :members: call


TFDebertaPreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaPreTrainedModel
    :members: call


TFDebertaForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaForMaskedLM
    :members: call


TFDebertaForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaForSequenceClassification
    :members: call


TFDebertaForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaForTokenClassification
    :members: call


TFDebertaForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaForQuestionAnswering
    :members: call
@@ -53,12 +53,13 @@ New in v2:
  transformer layer to better learn the local dependency of input tokens.
- **Sharing position projection matrix with content projection matrix in attention layer** Based on previous
  experiments, this can save parameters without affecting the performance.
- **Apply bucket to encode relative postions** The DeBERTa-v2 model uses log bucket to encode relative positions
- **Apply bucket to encode relative positions** The DeBERTa-v2 model uses log bucket to encode relative positions
  similar to T5.
- **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
  performance of downstream tasks.

This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The TF 2.0 implementation of this model was
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__. The original code can be found `here
<https://github.com/microsoft/DeBERTa>`__.


@@ -117,3 +118,45 @@ DebertaV2ForQuestionAnswering

.. autoclass:: transformers.DebertaV2ForQuestionAnswering
    :members: forward


TFDebertaV2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaV2Model
    :members: call


TFDebertaV2PreTrainedModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaV2PreTrainedModel
    :members: call


TFDebertaV2ForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaV2ForMaskedLM
    :members: call


TFDebertaV2ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaV2ForSequenceClassification
    :members: call


TFDebertaV2ForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaV2ForTokenClassification
    :members: call


TFDebertaV2ForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFDebertaV2ForQuestionAnswering
    :members: call
@@ -25,12 +25,12 @@ Overview

The DeiT model was proposed in `Training data-efficient image transformers & distillation through attention
<https://arxiv.org/abs/2012.12877>`__ by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre
Sablayrolles, Hervé Jégou. The `Vision Transformer (ViT) <https://huggingface.co/transformers/model_doc/vit.html>`__
introduced in `Dosovitskiy et al., 2020 <https://arxiv.org/abs/2010.11929>`__ has shown that one can match or even
outperform existing convolutional neural networks using a Transformer encoder (BERT-like). However, the ViT models
introduced in that paper required training on expensive infrastructure for multiple weeks, using external data. DeiT
(data-efficient image transformers) are more efficiently trained transformers for image classification, requiring far
less data and far less computing resources compared to the original ViT models.
Sablayrolles, Hervé Jégou. The `Vision Transformer (ViT) <vit>`__ introduced in `Dosovitskiy et al., 2020
<https://arxiv.org/abs/2010.11929>`__ has shown that one can match or even outperform existing convolutional neural
networks using a Transformer encoder (BERT-like). However, the ViT models introduced in that paper required training on
expensive infrastructure for multiple weeks, using external data. DeiT (data-efficient image transformers) are more
efficiently trained transformers for image classification, requiring far less data and far less computing resources
compared to the original ViT models.

The abstract from the paper is the following:
docs/source/model_doc/detr.mdx  (new file, 169 lines)
@@ -0,0 +1,169 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# DETR

## Overview

The DETR model was proposed in [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov and Sergey Zagoruyko. DETR
consists of a convolutional backbone followed by an encoder-decoder Transformer which can be trained end-to-end for
object detection. It greatly simplifies a lot of the complexity of models like Faster-R-CNN and Mask-R-CNN, which use
things like region proposals, non-maximum suppression procedure and anchor generation. Moreover, DETR can also be
naturally extended to perform panoptic segmentation, by simply adding a mask head on top of the decoder outputs.

The abstract from the paper is the following:

*We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the
detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression
procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the
new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via
bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries,
DETR reasons about the relations of the objects and the global image context to directly output the final set of
predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many
other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and
highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily
generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive
baselines.*

This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/facebookresearch/detr).

The quickest way to get started with DETR is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR) (which showcase both inference and
fine-tuning on custom data).

Here's a TLDR explaining how [`~transformers.DetrForObjectDetection`] works:

First, an image is sent through a pre-trained convolutional backbone (in the paper, the authors use
ResNet-50/ResNet-101). Let's assume we also add a batch dimension. This means that the input to the backbone is a
tensor of shape `(batch_size, 3, height, width)`, assuming the image has 3 color channels (RGB). The CNN backbone
outputs a new lower-resolution feature map, typically of shape `(batch_size, 2048, height/32, width/32)`. This is
then projected to match the hidden dimension of the Transformer of DETR, which is `256` by default, using a
`nn.Conv2D` layer. So now, we have a tensor of shape `(batch_size, 256, height/32, width/32)`. Next, the
feature map is flattened and transposed to obtain a tensor of shape `(batch_size, seq_len, d_model)` =
`(batch_size, width/32*height/32, 256)`. So a difference with NLP models is that the sequence length is actually
longer than usual, but with a smaller `d_model` (which in NLP is typically 768 or higher).

Next, this is sent through the encoder, outputting `encoder_hidden_states` of the same shape (you can consider
these as image features). Next, so-called **object queries** are sent through the decoder. This is a tensor of shape
`(batch_size, num_queries, d_model)`, with `num_queries` typically set to 100 and initialized with zeros.
These input embeddings are learnt positional encodings that the authors refer to as object queries, and similarly to
the encoder, they are added to the input of each attention layer. Each object query will look for a particular object
in the image. The decoder updates these embeddings through multiple self-attention and encoder-decoder attention layers
to output `decoder_hidden_states` of the same shape: `(batch_size, num_queries, d_model)`. Next, two heads
are added on top for object detection: a linear layer for classifying each object query into one of the objects or "no
object", and an MLP to predict bounding boxes for each query.
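To make the shapes above concrete, here is a hedged inference sketch (the `facebook/detr-resnet-50` checkpoint, the test image URL and the printed shapes are illustrative assumptions, not part of the new file):

```python
# Run a pre-trained DETR detector and inspect the outputs of the two prediction heads.
import torch
import requests
from PIL import Image
from transformers import DetrFeatureExtractor, DetrForObjectDetection

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.logits.shape)      # (batch_size, num_queries, num_labels + 1) - classification head
print(outputs.pred_boxes.shape)  # (batch_size, num_queries, 4) - box head, normalized (cx, cy, w, h)
```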
The model is trained using a **bipartite matching loss**: so what we actually do is compare the predicted classes +
bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N
(so if an image only contains 4 objects, 96 annotations will just have a "no object" as class and "no bounding box" as
bounding box). The [Hungarian matching algorithm](https://en.wikipedia.org/wiki/Hungarian_algorithm) is used to find
an optimal one-to-one mapping of each of the N queries to each of the N annotations. Next, standard cross-entropy (for
the classes) and a linear combination of the L1 and [generalized IoU loss](https://giou.stanford.edu/) (for the
bounding boxes) are used to optimize the parameters of the model.

DETR can be naturally extended to perform panoptic segmentation (which unifies semantic segmentation and instance
segmentation). [`~transformers.DetrForSegmentation`] adds a segmentation mask head on top of
[`~transformers.DetrForObjectDetection`]. The mask head can be trained either jointly, or in a two steps process,
where one first trains a [`~transformers.DetrForObjectDetection`] model to detect bounding boxes around both
"things" (instances) and "stuff" (background things like trees, roads, sky), then freeze all the weights and train only
the mask head for 25 epochs. Experimentally, these two approaches give similar results. Note that predicting boxes is
required for the training to be possible, since the Hungarian matching is computed using distances between boxes.

Tips:

- DETR uses so-called **object queries** to detect objects in an image. The number of queries determines the maximum
  number of objects that can be detected in a single image, and is set to 100 by default (see parameter
  `num_queries` of [`~transformers.DetrConfig`]). Note that it's good to have some slack (in COCO, the
  authors used 100, while the maximum number of objects in a COCO image is ~70).
- The decoder of DETR updates the query embeddings in parallel. This is different from language models like GPT-2,
  which use autoregressive decoding instead of parallel. Hence, no causal attention mask is used.
- DETR adds position embeddings to the hidden states at each self-attention and cross-attention layer before projecting
  to queries and keys. For the position embeddings of the image, one can choose between fixed sinusoidal or learned
  absolute position embeddings. By default, the parameter `position_embedding_type` of
  [`~transformers.DetrConfig`] is set to `"sine"`.
- During training, the authors of DETR did find it helpful to use auxiliary losses in the decoder, especially to help
  the model output the correct number of objects of each class. If you set the parameter `auxiliary_loss` of
  [`~transformers.DetrConfig`] to `True`, then prediction feedforward neural networks and Hungarian losses
  are added after each decoder layer (with the FFNs sharing parameters).
- If you want to train the model in a distributed environment across multiple nodes, then one should update the
  _num_boxes_ variable in the _DetrLoss_ class of _modeling_detr.py_. When training on multiple nodes, this should be
  set to the average number of target boxes across all nodes, as can be seen in the original implementation [here](https://github.com/facebookresearch/detr/blob/a54b77800eb8e64e3ad0d8237789fcbf2f8350c5/models/detr.py#L227-L232).
- [`~transformers.DetrForObjectDetection`] and [`~transformers.DetrForSegmentation`] can be initialized with
  any convolutional backbone available in the [timm library](https://github.com/rwightman/pytorch-image-models).
  Initializing with a MobileNet backbone for example can be done by setting the `backbone` attribute of
  [`~transformers.DetrConfig`] to `"tf_mobilenetv3_small_075"`, and then initializing the model with that
  config.
- DETR resizes the input images such that the shortest side is at least a certain amount of pixels while the longest is
  at most 1333 pixels. At training time, scale augmentation is used such that the shortest side is randomly set to at
  least 480 and at most 800 pixels. At inference time, the shortest side is set to 800. One can use
  [`~transformers.DetrFeatureExtractor`] to prepare images (and optional annotations in COCO format) for the
  model. Due to this resizing, images in a batch can have different sizes. DETR solves this by padding images up to the
  largest size in a batch, and by creating a pixel mask that indicates which pixels are real/which are padding.
  Alternatively, one can also define a custom `collate_fn` in order to batch images together, using
  [`~transformers.DetrFeatureExtractor.pad_and_create_pixel_mask`].
- The size of the images will determine the amount of memory being used, and will thus determine the `batch_size`.
  It is advised to use a batch size of 2 per GPU. See [this Github thread](https://github.com/facebookresearch/detr/issues/150) for more info.

As a summary, consider the following table:

| Task | Object detection | Instance segmentation | Panoptic segmentation |
|------|------------------|-----------------------|-----------------------|
| **Description** | Predicting bounding boxes and class labels around objects in an image | Predicting masks around objects (i.e. instances) in an image | Predicting masks around both objects (i.e. instances) as well as "stuff" (i.e. background things like trees and roads) in an image |
| **Model** | [`~transformers.DetrForObjectDetection`] | [`~transformers.DetrForSegmentation`] | [`~transformers.DetrForSegmentation`] |
| **Example dataset** | COCO detection | COCO detection, COCO panoptic | COCO panoptic |
| **Format of annotations to provide to** [`~transformers.DetrFeatureExtractor`] | {'image_id': `int`, 'annotations': `List[Dict]`} each Dict being a COCO object annotation | {'image_id': `int`, 'annotations': `List[Dict]`} (in case of COCO detection) or {'file_name': `str`, 'image_id': `int`, 'segments_info': `List[Dict]`} (in case of COCO panoptic) | {'file_name': `str`, 'image_id': `int`, 'segments_info': `List[Dict]`} and masks_path (path to directory containing PNG files of the masks) |
| **Postprocessing** (i.e. converting the output of the model to COCO API) | [`~transformers.DetrFeatureExtractor.post_process`] | [`~transformers.DetrFeatureExtractor.post_process_segmentation`] | [`~transformers.DetrFeatureExtractor.post_process_segmentation`], [`~transformers.DetrFeatureExtractor.post_process_panoptic`] |
| **evaluators** | `CocoEvaluator` with `iou_types="bbox"` | `CocoEvaluator` with `iou_types="bbox"` or `"segm"` | `CocoEvaluator` with `iou_types="bbox"` or `"segm"`, `PanopticEvaluator` |

In short, one should prepare the data either in COCO detection or COCO panoptic format, then use
[`~transformers.DetrFeatureExtractor`] to create `pixel_values`, `pixel_mask` and optional
`labels`, which can then be used to train (or fine-tune) a model. For evaluation, one should first convert the
outputs of the model using one of the postprocessing methods of [`~transformers.DetrFeatureExtractor`]. These can
be provided to either `CocoEvaluator` or `PanopticEvaluator`, which allow you to calculate metrics like
mean Average Precision (mAP) and Panoptic Quality (PQ). The latter objects are implemented in the [original repository](https://github.com/facebookresearch/detr). See the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR) for more info regarding evaluation.
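As a hedged illustration of the postprocessing step described above (building on the inference sketch earlier; the confidence threshold is an arbitrary assumption):

```python
# Convert DETR's normalized predictions to COCO-style absolute boxes with post_process.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width) for each image
results = feature_extractor.post_process(outputs, target_sizes)[0]

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    if score > 0.9:  # keep confident predictions only
        print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```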
## DETR specific outputs

[[autodoc]] models.detr.modeling_detr.DetrModelOutput

[[autodoc]] models.detr.modeling_detr.DetrObjectDetectionOutput

[[autodoc]] models.detr.modeling_detr.DetrSegmentationOutput

## DetrConfig

[[autodoc]] DetrConfig

## DetrFeatureExtractor

[[autodoc]] DetrFeatureExtractor
    - __call__
    - pad_and_create_pixel_mask
    - post_process
    - post_process_segmentation
    - post_process_panoptic

## DetrModel

[[autodoc]] DetrModel
    - forward

## DetrForObjectDetection

[[autodoc]] DetrForObjectDetection
    - forward

## DetrForSegmentation

[[autodoc]] DetrForSegmentation
    - forward
@ -1,207 +0,0 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
DETR
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The DETR model was proposed in `End-to-End Object Detection with Transformers <https://arxiv.org/abs/2005.12872>`__ by
|
||||
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov and Sergey Zagoruyko. DETR
|
||||
consists of a convolutional backbone followed by an encoder-decoder Transformer which can be trained end-to-end for
|
||||
object detection. It greatly simplifies a lot of the complexity of models like Faster-R-CNN and Mask-R-CNN, which use
|
||||
things like region proposals, non-maximum suppression procedure and anchor generation. Moreover, DETR can also be
|
||||
naturally extended to perform panoptic segmentation, by simply adding a mask head on top of the decoder outputs.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*We present a new method that views object detection as a direct set prediction problem. Our approach streamlines the
|
||||
detection pipeline, effectively removing the need for many hand-designed components like a non-maximum suppression
|
||||
procedure or anchor generation that explicitly encode our prior knowledge about the task. The main ingredients of the
|
||||
new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via
|
||||
bipartite matching, and a transformer encoder-decoder architecture. Given a fixed small set of learned object queries,
|
||||
DETR reasons about the relations of the objects and the global image context to directly output the final set of
|
||||
predictions in parallel. The new model is conceptually simple and does not require a specialized library, unlike many
|
||||
other modern detectors. DETR demonstrates accuracy and run-time performance on par with the well-established and
|
||||
highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset. Moreover, DETR can be easily
|
||||
generalized to produce panoptic segmentation in a unified manner. We show that it significantly outperforms competitive
|
||||
baselines.*
|
||||
|
||||
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
|
||||
<https://github.com/facebookresearch/detr>`__.
|
||||
|
||||
The quickest way to get started with DETR is by checking the `example notebooks
|
||||
<https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR>`__ (which showcase both inference and
|
||||
fine-tuning on custom data).
|
||||
|
||||
Here's a TLDR explaining how :class:`~transformers.DetrForObjectDetection` works:
|
||||
|
||||
First, an image is sent through a pre-trained convolutional backbone (in the paper, the authors use
|
||||
ResNet-50/ResNet-101). Let's assume we also add a batch dimension. This means that the input to the backbone is a
|
||||
tensor of shape :obj:`(batch_size, 3, height, width)`, assuming the image has 3 color channels (RGB). The CNN backbone
|
||||
outputs a new lower-resolution feature map, typically of shape :obj:`(batch_size, 2048, height/32, width/32)`. This is
|
||||
then projected to match the hidden dimension of the Transformer of DETR, which is :obj:`256` by default, using a
|
||||
:obj:`nn.Conv2D` layer. So now, we have a tensor of shape :obj:`(batch_size, 256, height/32, width/32).` Next, the
|
||||
feature map is flattened and transposed to obtain a tensor of shape :obj:`(batch_size, seq_len, d_model)` =
|
||||
:obj:`(batch_size, width/32*height/32, 256)`. So a difference with NLP models is that the sequence length is actually
|
||||
longer than usual, but with a smaller :obj:`d_model` (which in NLP is typically 768 or higher).

Next, this is sent through the encoder, outputting :obj:`encoder_hidden_states` of the same shape (you can consider
these as image features). Next, so-called **object queries** are sent through the decoder. This is a tensor of shape
:obj:`(batch_size, num_queries, d_model)`, with :obj:`num_queries` typically set to 100 and initialized with zeros.
These input embeddings are learnt positional encodings that the authors refer to as object queries, and similarly to
the encoder, they are added to the input of each attention layer. Each object query will look for a particular object
in the image. The decoder updates these embeddings through multiple self-attention and encoder-decoder attention layers
to output :obj:`decoder_hidden_states` of the same shape: :obj:`(batch_size, num_queries, d_model)`. Next, two heads
are added on top for object detection: a linear layer for classifying each object query into one of the objects or "no
object", and an MLP to predict bounding boxes for each query.

The model is trained using a **bipartite matching loss**: so what we actually do is compare the predicted classes +
bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N
(so if an image only contains 4 objects, 96 annotations will just have "no object" as class and "no bounding box" as
bounding box). The `Hungarian matching algorithm <https://en.wikipedia.org/wiki/Hungarian_algorithm>`__ is used to find
an optimal one-to-one mapping of each of the N queries to each of the N annotations. Next, standard cross-entropy (for
the classes) and a linear combination of the L1 and `generalized IoU loss <https://giou.stanford.edu/>`__ (for the
bounding boxes) are used to optimize the parameters of the model.
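
To illustrate the matching step only (not the full DETR loss), here is a toy sketch that finds the optimal one-to-one
assignment with SciPy's implementation of the Hungarian algorithm; the cost values are made up and stand in for the
combined class + box costs:

.. code-block::

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # rows = predictions (object queries), columns = (padded) ground-truth annotations
    cost = np.array([[0.9, 0.2, 0.7],
                     [0.3, 0.8, 0.4],
                     [0.6, 0.5, 0.1]])
    pred_indices, target_indices = linear_sum_assignment(cost)
    print(list(zip(pred_indices, target_indices)))  # [(0, 1), (1, 0), (2, 2)]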

DETR can be naturally extended to perform panoptic segmentation (which unifies semantic segmentation and instance
segmentation). :class:`~transformers.DetrForSegmentation` adds a segmentation mask head on top of
:class:`~transformers.DetrForObjectDetection`. The mask head can be trained either jointly, or in a two-step process,
where one first trains a :class:`~transformers.DetrForObjectDetection` model to detect bounding boxes around both
"things" (instances) and "stuff" (background things like trees, roads, sky), then freezes all the weights and trains
only the mask head for 25 epochs. Experimentally, these two approaches give similar results. Note that predicting boxes
is required for the training to be possible, since the Hungarian matching is computed using distances between boxes.

Tips:

- DETR uses so-called **object queries** to detect objects in an image. The number of queries determines the maximum
  number of objects that can be detected in a single image, and is set to 100 by default (see parameter
  :obj:`num_queries` of :class:`~transformers.DetrConfig`). Note that it's good to have some slack (in COCO, the
  authors used 100, while the maximum number of objects in a COCO image is ~70).
- The decoder of DETR updates the query embeddings in parallel. This is different from language models like GPT-2,
  which use autoregressive decoding instead of parallel. Hence, no causal attention mask is used.
- DETR adds position embeddings to the hidden states at each self-attention and cross-attention layer before projecting
  to queries and keys. For the position embeddings of the image, one can choose between fixed sinusoidal or learned
  absolute position embeddings. By default, the parameter :obj:`position_embedding_type` of
  :class:`~transformers.DetrConfig` is set to :obj:`"sine"`.
- During training, the authors of DETR found it helpful to use auxiliary losses in the decoder, especially to help
  the model output the correct number of objects of each class. If you set the parameter :obj:`auxiliary_loss` of
  :class:`~transformers.DetrConfig` to :obj:`True`, then prediction feedforward neural networks and Hungarian losses
  are added after each decoder layer (with the FFNs sharing parameters).
- If you want to train the model in a distributed environment across multiple nodes, one should update the
  `num_boxes` variable in the `DetrLoss` class of `modeling_detr.py`. When training on multiple nodes, this should be
  set to the average number of target boxes across all nodes, as can be seen in the original implementation `here
  <https://github.com/facebookresearch/detr/blob/a54b77800eb8e64e3ad0d8237789fcbf2f8350c5/models/detr.py#L227-L232>`__.
- :class:`~transformers.DetrForObjectDetection` and :class:`~transformers.DetrForSegmentation` can be initialized with
  any convolutional backbone available in the `timm library <https://github.com/rwightman/pytorch-image-models>`__.
  Initializing with a MobileNet backbone for example can be done by setting the :obj:`backbone` attribute of
  :class:`~transformers.DetrConfig` to :obj:`"tf_mobilenetv3_small_075"`, and then initializing the model with that
  config (see the example after these tips).
- DETR resizes the input images such that the shortest side is at least a certain number of pixels while the longest is
  at most 1333 pixels. At training time, scale augmentation is used such that the shortest side is randomly set to at
  least 480 and at most 800 pixels. At inference time, the shortest side is set to 800. One can use
  :class:`~transformers.DetrFeatureExtractor` to prepare images (and optional annotations in COCO format) for the
  model. Due to this resizing, images in a batch can have different sizes. DETR solves this by padding images up to the
  largest size in a batch, and by creating a pixel mask that indicates which pixels are real/which are padding.
  Alternatively, one can also define a custom :obj:`collate_fn` in order to batch images together, using
  :meth:`~transformers.DetrFeatureExtractor.pad_and_create_pixel_mask`.
- The size of the images will determine the amount of memory being used, and will thus determine the :obj:`batch_size`.
  It is advised to use a batch size of 2 per GPU. See `this GitHub thread
  <https://github.com/facebookresearch/detr/issues/150>`__ for more info.
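
For example, the MobileNet initialization mentioned in the tips above boils down to the following sketch (the
resulting model is randomly initialized and still needs to be trained; ``timm`` must be installed):

.. code-block::

    from transformers import DetrConfig, DetrForObjectDetection

    # use any timm backbone by name; here the MobileNetV3 variant mentioned above
    config = DetrConfig(backbone="tf_mobilenetv3_small_075")
    model = DetrForObjectDetection(config)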

As a summary, consider the following table:

+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Task**                                    | **Object detection**                                    | **Instance segmentation**                                            | **Panoptic segmentation**                                              |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Description**                             | Predicting bounding boxes and class labels around       | Predicting masks around objects (i.e. instances) in an image         | Predicting masks around both objects (i.e. instances) as well as       |
|                                             | objects in an image                                     |                                                                      | "stuff" (i.e. background things like trees and roads) in an image      |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Model**                                   | :class:`~transformers.DetrForObjectDetection`           | :class:`~transformers.DetrForSegmentation`                           | :class:`~transformers.DetrForSegmentation`                             |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Example dataset**                         | COCO detection                                          | COCO detection,                                                      | COCO panoptic                                                          |
|                                             |                                                         | COCO panoptic                                                        |                                                                        |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Format of annotations to provide to**     | {‘image_id’: int,                                       | {‘image_id’: int,                                                    | {‘file_name’: str,                                                     |
| :class:`~transformers.DetrFeatureExtractor` | ‘annotations’: List[Dict]}, each Dict being a COCO      | ‘annotations’: List[Dict]} (in case of COCO detection)               | ‘image_id’: int,                                                       |
|                                             | object annotation                                       |                                                                      | ‘segments_info’: List[Dict]}                                           |
|                                             |                                                         | or                                                                   |                                                                        |
|                                             |                                                         |                                                                      | and masks_path (path to directory containing PNG files of the masks)   |
|                                             |                                                         | {‘file_name’: str,                                                   |                                                                        |
|                                             |                                                         | ‘image_id’: int,                                                     |                                                                        |
|                                             |                                                         | ‘segments_info’: List[Dict]} (in case of COCO panoptic)              |                                                                        |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Postprocessing** (i.e. converting the     | :meth:`~transformers.DetrFeatureExtractor.post_process` | :meth:`~transformers.DetrFeatureExtractor.post_process_segmentation` | :meth:`~transformers.DetrFeatureExtractor.post_process_segmentation`,  |
| output of the model to COCO API)            |                                                         |                                                                      | :meth:`~transformers.DetrFeatureExtractor.post_process_panoptic`       |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+
| **Evaluators**                              | :obj:`CocoEvaluator` with iou_types = “bbox”            | :obj:`CocoEvaluator` with iou_types = “bbox”, “segm”                 | :obj:`CocoEvaluator` with iou_types = “bbox”, “segm”                   |
|                                             |                                                         |                                                                      |                                                                        |
|                                             |                                                         |                                                                      | :obj:`PanopticEvaluator`                                               |
+---------------------------------------------+---------------------------------------------------------+----------------------------------------------------------------------+------------------------------------------------------------------------+

In short, one should prepare the data either in COCO detection or COCO panoptic format, then use
:class:`~transformers.DetrFeatureExtractor` to create :obj:`pixel_values`, :obj:`pixel_mask` and optional
:obj:`labels`, which can then be used to train (or fine-tune) a model. For evaluation, one should first convert the
outputs of the model using one of the postprocessing methods of :class:`~transformers.DetrFeatureExtractor`. These can
be provided to either :obj:`CocoEvaluator` or :obj:`PanopticEvaluator`, which allow you to calculate metrics like
mean Average Precision (mAP) and Panoptic Quality (PQ). The latter objects are implemented in the `original repository
<https://github.com/facebookresearch/detr>`__. See the `example notebooks
<https://github.com/NielsRogge/Transformers-Tutorials/tree/master/DETR>`__ for more info regarding evaluation.
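
As a minimal inference sketch (assuming the ``facebook/detr-resnet-50`` checkpoint on the Hub and a local image file),
preparation and post-processing look like this:

.. code-block::

    import torch
    from PIL import Image
    from transformers import DetrFeatureExtractor, DetrForObjectDetection

    feature_extractor = DetrFeatureExtractor.from_pretrained("facebook/detr-resnet-50")
    model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

    image = Image.open("path/to/your_image.jpg")
    inputs = feature_extractor(images=image, return_tensors="pt")  # pixel_values + pixel_mask

    with torch.no_grad():
        outputs = model(**inputs)

    # convert the outputs to COCO API format, rescaled to the original image size
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = feature_extractor.post_process(outputs, target_sizes=target_sizes)[0]
    print(results.keys())  # scores, labels, boxes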


DETR specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.models.detr.modeling_detr.DetrModelOutput
    :members:

.. autoclass:: transformers.models.detr.modeling_detr.DetrObjectDetectionOutput
    :members:

.. autoclass:: transformers.models.detr.modeling_detr.DetrSegmentationOutput
    :members:


DetrConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DetrConfig
    :members:


DetrFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DetrFeatureExtractor
    :members: __call__, pad_and_create_pixel_mask, post_process, post_process_segmentation, post_process_panoptic


DetrModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DetrModel
    :members: forward


DetrForObjectDetection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DetrForObjectDetection
    :members: forward


DetrForSegmentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.DetrForSegmentation
    :members: forward
@@ -44,8 +44,9 @@ Tips:
- DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
  necessary though, just let us know if you need this option.

This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found
:prefix_link:`here <examples/research-projects/distillation>`.
This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. This model's jax version was
contributed by `kamalkraj <https://huggingface.co/kamalkraj>`__. The original code can be found :prefix_link:`here
<examples/research_projects/distillation>`.


DistilBertConfig
@@ -152,3 +153,45 @@ TFDistilBertForQuestionAnswering

.. autoclass:: transformers.TFDistilBertForQuestionAnswering
    :members: call


FlaxDistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxDistilBertModel
    :members: __call__


FlaxDistilBertForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxDistilBertForMaskedLM
    :members: __call__


FlaxDistilBertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxDistilBertForSequenceClassification
    :members: __call__


FlaxDistilBertForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxDistilBertForMultipleChoice
    :members: __call__


FlaxDistilBertForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxDistilBertForTokenClassification
    :members: __call__


FlaxDistilBertForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxDistilBertForQuestionAnswering
    :members: __call__

@@ -27,6 +27,25 @@ An application of this architecture could be to leverage two pretrained :class:`
and decoder for a summarization model as was shown in: `Text Summarization with Pretrained Encoders
<https://arxiv.org/abs/1908.08345>`__ by Yang Liu and Mirella Lapata.

The :meth:`~transformers.TFEncoderDecoderModel.from_pretrained` method currently doesn't support initializing the model
from a pytorch checkpoint. Passing ``from_pt=True`` to this method will throw an exception. If there are only pytorch
checkpoints for a particular encoder-decoder model, a workaround is:

.. code-block::

    >>> # a workaround to load from pytorch checkpoint
    >>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
    >>> _model.encoder.save_pretrained("./encoder")
    >>> _model.decoder.save_pretrained("./decoder")
    >>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
    ...     "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
    ... )
    >>> # This is only for copying some specific attributes of this particular model.
    >>> model.config = _model.config

This model was contributed by `thomwolf <https://github.com/thomwolf>`__. This model's TensorFlow and Flax versions
were contributed by `ydshieh <https://github.com/ydshieh>`__.


EncoderDecoderConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -40,3 +59,17 @@ EncoderDecoderModel

.. autoclass:: transformers.EncoderDecoderModel
    :members: forward, from_encoder_decoder_pretrained


TFEncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.TFEncoderDecoderModel
    :members: call, from_encoder_decoder_pretrained


FlaxEncoderDecoderModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxEncoderDecoderModel
    :members: __call__, from_encoder_decoder_pretrained

docs/source/model_doc/fnet.rst (new file, 121 lines)
@@ -0,0 +1,121 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

FNet
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The FNet model was proposed in `FNet: Mixing Tokens with Fourier Transforms <https://arxiv.org/abs/2105.03824>`__ by
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon. The model replaces the self-attention layer in a BERT
model with a Fourier transform which returns only the real parts of the transform. The model is significantly faster
than the BERT model because it has fewer parameters and is more memory efficient. The model achieves about 92-97% of
the accuracy of its BERT counterparts on the GLUE benchmark, and trains much faster than the BERT model. The abstract
from the paper is the following:

*We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the
self-attention sublayers with simple linear transformations that "mix" input tokens. These linear mixers, along with
standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text
classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder
with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE
benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths,
our FNet model is significantly faster: when compared to the "efficient" Transformers on the Long Range Arena
benchmark, FNet matches the accuracy of the most accurate models, while outpacing the fastest models across all
sequence lengths on GPUs (and across relatively shorter lengths on TPUs). Finally, FNet has a light memory footprint
and is particularly efficient at smaller model sizes; for a fixed speed and accuracy budget, small FNet models
outperform Transformer counterparts.*

Tips on usage:

- The model was trained without an attention mask as it is based on the Fourier transform. The model was trained with
  a maximum sequence length of 512, which includes pad tokens. Hence, it is highly recommended to use the same maximum
  sequence length for fine-tuning and inference.

This model was contributed by `gchhablani <https://huggingface.co/gchhablani>`__. The original code can be found `here
<https://github.com/google-research/google-research/tree/master/f_net>`__.
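
A minimal usage sketch (assuming the ``google/fnet-base`` checkpoint on the Hub; note that FNet does not use an
attention mask):

.. code-block::

    from transformers import FNetTokenizer, FNetModel

    tokenizer = FNetTokenizer.from_pretrained("google/fnet-base")
    model = FNetModel.from_pretrained("google/fnet-base")

    inputs = tokenizer("FNet mixes tokens with Fourier transforms.", return_tensors="pt")
    outputs = model(**inputs)
    last_hidden_states = outputs.last_hidden_state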

FNetConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetConfig
    :members:


FNetTokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetTokenizer
    :members: build_inputs_with_special_tokens, get_special_tokens_mask,
        create_token_type_ids_from_sequences, save_vocabulary


FNetTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetTokenizerFast
    :members:


FNetModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetModel
    :members: forward


FNetForPreTraining
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetForPreTraining
    :members: forward


FNetForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetForMaskedLM
    :members: forward


FNetForNextSentencePrediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetForNextSentencePrediction
    :members: forward

FNetForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetForSequenceClassification
    :members: forward


FNetForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetForMultipleChoice
    :members: forward


FNetForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetForTokenClassification
    :members: forward


FNetForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FNetForQuestionAnswering
    :members: forward
@@ -36,10 +36,13 @@ Tips:
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
  token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
  observed in the `run_generation.py` example script.
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
  this `past` value prevents the model from re-computing pre-computed values in the context of text generation. See
  `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
  this argument.
- The model can take the `past_key_values` (for PyTorch) or `past` (for TF) as input, which is the previously computed
  key/value attention pairs. Using this (`past_key_values` or `past`) value prevents the model from re-computing
  pre-computed values in the context of text generation. For PyTorch, see the `past_key_values` argument of the
  :meth:`~transformers.GPT2Model.forward` method, or for TF the `past` argument of the
  :meth:`~transformers.TFGPT2Model.call` method for more information on its usage (a short sketch follows these tips).
- Enabling the `scale_attn_by_inverse_layer_idx` and `reorder_and_upcast_attn` flags will apply the training stability
  improvements from `Mistral <https://github.com/stanford-crfm/mistral/>`__ (for PyTorch only).
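
As a sketch of how the cache is reused during generation (PyTorch names; the TF equivalent uses ``past``):

.. code-block::

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    input_ids = tokenizer("Hugging Face is", return_tensors="pt").input_ids
    outputs = model(input_ids, use_cache=True)
    past_key_values = outputs.past_key_values  # cached key/value pairs for every layer

    # on the next step, only the newly generated token is fed together with the cache
    next_token = outputs.logits[:, -1:].argmax(dim=-1)
    outputs = model(next_token, past_key_values=past_key_values, use_cache=True)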

`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
@@ -108,6 +111,13 @@ GPT2ForSequenceClassification
    :members: forward


GPT2ForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPT2ForTokenClassification
    :members: forward


TFGPT2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


docs/source/model_doc/gptj.rst (new file, 142 lines)
@@ -0,0 +1,142 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

GPT-J
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The GPT-J model was released in the `kingoflolz/mesh-transformer-jax
<https://github.com/kingoflolz/mesh-transformer-jax>`__ repository by Ben Wang and Aran Komatsuzaki. It is a GPT-2-like
causal language model trained on `the Pile <https://pile.eleuther.ai/>`__ dataset.

This model was contributed by `Stella Biderman <https://huggingface.co/stellaathena>`__.

Tips:

- To load `GPT-J <https://huggingface.co/EleutherAI/gpt-j-6B>`__ in float32 one would need at least 2x model size CPU
  RAM: 1x for initial weights and another 1x to load the checkpoint. So for GPT-J it would take at least 48GB of CPU
  RAM to just load the model. To reduce the CPU RAM usage there are a few options. The ``torch_dtype`` argument can be
  used to initialize the model in half-precision. And the ``low_cpu_mem_usage`` argument can be used to keep the RAM
  usage to 1x. There is also a `fp16 branch <https://huggingface.co/EleutherAI/gpt-j-6B/tree/float16>`__ which stores
  the fp16 weights, which could be used to further minimize the RAM usage. Combining all this it should take roughly
  12.1GB of CPU RAM to load the model.

.. code-block::

    >>> from transformers import GPTJForCausalLM
    >>> import torch

    >>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16, low_cpu_mem_usage=True)


- The model should fit on a 16GB GPU for inference. For training/fine-tuning it would take much more GPU RAM. The Adam
  optimizer, for example, makes four copies of the model: model, gradients, average and squared average of the
  gradients. So it would need at least 4x model size GPU memory, even with mixed precision, as gradient updates are in
  fp32. This is not including the activations and data batches, which would again require some more GPU RAM. So one
  should explore solutions such as DeepSpeed to train/fine-tune the model. Another option is to use the original
  codebase to train/fine-tune the model on TPU and then convert the model to Transformers format for inference.
  Instructions for that can be found `here <https://github.com/kingoflolz/mesh-transformer-jax/blob/master/howto_finetune.md>`__.

- Although the embedding matrix has a size of 50400, only 50257 entries are used by the GPT-2 tokenizer. These extra
  tokens are added for the sake of efficiency on TPUs. To avoid the mismatch between embedding matrix size and vocab
  size, the tokenizer for `GPT-J <https://huggingface.co/EleutherAI/gpt-j-6B>`__ contains 143 extra tokens
  ``<|extratoken_1|>... <|extratoken_143|>``, so the ``vocab_size`` of the tokenizer also becomes 50400.

Generation
_______________________________________________________________________________________________________________________

The :meth:`~transformers.generation_utils.GenerationMixin.generate` method can be used to generate text using the GPT-J
model.

.. code-block::

    >>> from transformers import AutoModelForCausalLM, AutoTokenizer
    >>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
    >>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

    >>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
    ...     "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
    ...     "researchers was the fact that the unicorns spoke perfect English."

    >>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    >>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)
    >>> gen_text = tokenizer.batch_decode(gen_tokens)[0]

...or in float16 precision:

.. code-block::

    >>> from transformers import GPTJForCausalLM, AutoTokenizer
    >>> import torch

    >>> model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
    >>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

    >>> prompt = "In a shocking finding, scientists discovered a herd of unicorns living in a remote, " \
    ...     "previously unexplored valley, in the Andes Mountains. Even more surprising to the " \
    ...     "researchers was the fact that the unicorns spoke perfect English."

    >>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    >>> gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)
    >>> gen_text = tokenizer.batch_decode(gen_tokens)[0]


GPTJConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJConfig
    :members:

GPTJModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJModel
    :members: forward


GPTJForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJForCausalLM
    :members: forward


GPTJForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJForSequenceClassification
    :members: forward


GPTJForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.GPTJForQuestionAnswering
    :members: forward


FlaxGPTJModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxGPTJModel
    :members: __call__


FlaxGPTJForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.FlaxGPTJForCausalLM
    :members: __call__
@@ -10,13 +10,13 @@
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

herBERT
HerBERT
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The herBERT model was proposed in `KLEJ: Comprehensive Benchmark for Polish Language Understanding
The HerBERT model was proposed in `KLEJ: Comprehensive Benchmark for Polish Language Understanding
<https://www.aclweb.org/anthology/2020.acl-main.111.pdf>`__ by Piotr Rybak, Robert Mroczkowski, Janusz Tracz, and
Ireneusz Gawlik. It is a BERT-based Language Model trained on Polish Corpora using only MLM objective with dynamic
masking of whole words.

@@ -64,6 +64,14 @@ HubertForCTC
.. autoclass:: transformers.HubertForCTC
    :members: forward


HubertForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.HubertForSequenceClassification
    :members: forward


TFHubertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

docs/source/model_doc/imagegpt.mdx (new file, 100 lines)
@@ -0,0 +1,100 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# ImageGPT

## Overview

The ImageGPT model was proposed in [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt) by Mark
Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. ImageGPT (iGPT) is a GPT-2-like
model trained to predict the next pixel value, allowing for both unconditional and conditional image generation.

The abstract from the paper is the following:

*Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models
can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels,
without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels,
we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and
low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide
ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. We are also
competitive with self-supervised benchmarks on ImageNet when substituting pixels for a VQVAE encoding, achieving 69.0%
top-1 accuracy on a linear probe of our features.*

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/imagegpt_architecture.png"
alt="drawing" width="600"/>

<small> Summary of the approach. Taken from the [original paper](https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf). </small>

This model was contributed by [nielsr](https://huggingface.co/nielsr), based on [this issue](https://github.com/openai/image-gpt/issues/7). The original code can be found
[here](https://github.com/openai/image-gpt).

Tips:

- Demo notebooks for ImageGPT can be found
  [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/ImageGPT).
- ImageGPT is almost exactly the same as [GPT-2](./model_doc/gpt2), with the exception that a different activation
  function is used (namely "quick gelu"), and the layer normalization layers don't mean-center the inputs. ImageGPT
  also doesn't have tied input- and output embeddings.
- As the time- and memory requirements of the attention mechanism of Transformers scale quadratically in the sequence
  length, the authors pre-trained ImageGPT on smaller input resolutions, such as 32x32 and 64x64. However, feeding a
  sequence of 32x32x3=3072 tokens from 0..255 into a Transformer is still prohibitively large. Therefore, the authors
  applied k-means clustering to the (R,G,B) pixel values with k=512. This way, we only have a 32*32 = 1024-long
  sequence, but now of integers in the range 0..511. So we are shrinking the sequence length at the cost of a bigger
  embedding matrix. In other words, the vocabulary size of ImageGPT is 512, plus 1 for a special "start of sentence"
  (SOS) token, used at the beginning of every sequence. One can use [`ImageGPTFeatureExtractor`] to prepare
  images for the model.
- Despite being pre-trained entirely unsupervised (i.e. without the use of any labels), ImageGPT produces fairly
  performant image features useful for downstream tasks, such as image classification. The authors showed that the
  features in the middle of the network are the most performant, and can be used as-is to train a linear model (such as
  an sklearn logistic regression model, for example). This is also referred to as "linear probing". Features can be
  easily obtained by first forwarding the image through the model with `output_hidden_states=True`, and then
  average-pooling the hidden states at whatever layer you like (a short sketch follows the table below).
- Alternatively, one can further fine-tune the entire model on a downstream dataset, similar to BERT. For this, you can
  use [`ImageGPTForImageClassification`].
- ImageGPT comes in different sizes: there's ImageGPT-small, ImageGPT-medium and ImageGPT-large. The authors did also
  train an XL variant, which they didn't release. The differences in size are summarized in the following table:

| **Model variant** | **Depths** | **Hidden sizes** | **Decoder hidden size** | **Params (M)** | **ImageNet-1k Top 1** |
|---|---|---|---|---|---|
| MiT-b0 | [2, 2, 2, 2] | [32, 64, 160, 256] | 256 | 3.7 | 70.5 |
| MiT-b1 | [2, 2, 2, 2] | [64, 128, 320, 512] | 256 | 14.0 | 78.7 |
| MiT-b2 | [3, 4, 6, 3] | [64, 128, 320, 512] | 768 | 25.4 | 81.6 |
| MiT-b3 | [3, 4, 18, 3] | [64, 128, 320, 512] | 768 | 45.2 | 83.1 |
| MiT-b4 | [3, 8, 27, 3] | [64, 128, 320, 512] | 768 | 62.6 | 83.6 |
| MiT-b5 | [3, 6, 40, 3] | [64, 128, 320, 512] | 768 | 82.0 | 83.8 |
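
The snippet below is a minimal linear-probing sketch (the `openai/imagegpt-small` checkpoint name and the random toy
data are assumptions; substitute your own images and labels):

```python
import numpy as np
import torch
from PIL import Image
from sklearn.linear_model import LogisticRegression
from transformers import ImageGPTFeatureExtractor, ImageGPTModel

# toy stand-in data: 8 random 32x32 RGB images with binary labels
images = [Image.fromarray(np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8)) for _ in range(8)]
labels = np.array([0, 1] * 4)

feature_extractor = ImageGPTFeatureExtractor.from_pretrained("openai/imagegpt-small")
model = ImageGPTModel.from_pretrained("openai/imagegpt-small")

inputs = feature_extractor(images=images, return_tensors="pt")  # color-clusters the pixels into input_ids
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# average-pool the hidden states of a middle layer: one feature vector per image
middle = outputs.hidden_states[len(outputs.hidden_states) // 2]
features = middle.mean(dim=1).numpy()

classifier = LogisticRegression(max_iter=1000).fit(features, labels)
```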

## ImageGPTConfig

[[autodoc]] ImageGPTConfig

## ImageGPTFeatureExtractor

[[autodoc]] ImageGPTFeatureExtractor

    - __call__

## ImageGPTModel

[[autodoc]] ImageGPTModel

    - forward

## ImageGPTForCausalImageModeling

[[autodoc]] ImageGPTForCausalImageModeling

    - forward

## ImageGPTForImageClassification

[[autodoc]] ImageGPTForImageClassification

    - forward
docs/source/model_doc/layoutlmv2.rst (new file, 313 lines)
@@ -0,0 +1,313 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

LayoutLMV2
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LayoutLMV2 model was proposed in `LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
<https://arxiv.org/abs/2012.14740>`__ by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu,
Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou. LayoutLMV2 improves `LayoutLM <layoutlm>`__ to obtain
state-of-the-art results across several document image understanding benchmarks:

- information extraction from scanned documents: the `FUNSD <https://guillaumejaume.github.io/FUNSD/>`__ dataset (a
  collection of 199 annotated forms comprising more than 30,000 words), the `CORD <https://github.com/clovaai/cord>`__
  dataset (a collection of 800 receipts for training, 100 for validation and 100 for testing), the `SROIE
  <https://rrc.cvc.uab.es/?ch=13>`__ dataset (a collection of 626 receipts for training and 347 receipts for testing)
  and the `Kleister-NDA <https://github.com/applicaai/kleister-nda>`__ dataset (a collection of non-disclosure
  agreements from the EDGAR database, including 254 documents for training, 83 documents for validation, and 203
  documents for testing).
- document image classification: the `RVL-CDIP <https://www.cs.cmu.edu/~aharley/rvl-cdip/>`__ dataset (a collection of
  400,000 images belonging to one of 16 classes).
- document visual question answering: the `DocVQA <https://arxiv.org/abs/2007.00398>`__ dataset (a collection of 50,000
  questions defined on 12,000+ document images).

The abstract from the paper is the following:

*Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to
its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents. In this
paper, we present LayoutLMv2 by pre-training text, layout and image in a multi-modal framework, where new model
architectures and pre-training tasks are leveraged. Specifically, LayoutLMv2 not only uses the existing masked
visual-language modeling task but also the new text-image alignment and text-image matching tasks in the pre-training
stage, where cross-modality interaction is better learned. Meanwhile, it also integrates a spatial-aware self-attention
mechanism into the Transformer architecture, so that the model can fully understand the relative positional
relationship among different text blocks. Experiment results show that LayoutLMv2 outperforms strong baselines and
achieves new state-of-the-art results on a wide variety of downstream visually-rich document understanding tasks,
including FUNSD (0.7895 -> 0.8420), CORD (0.9493 -> 0.9601), SROIE (0.9524 -> 0.9781), Kleister-NDA (0.834 -> 0.852),
RVL-CDIP (0.9443 -> 0.9564), and DocVQA (0.7295 -> 0.8672). The pre-trained LayoutLMv2 model is publicly available at
this https URL.*

Tips:

- The main difference between LayoutLMv1 and LayoutLMv2 is that the latter incorporates visual embeddings during
  pre-training (while LayoutLMv1 only adds visual embeddings during fine-tuning).
- LayoutLMv2 adds both a relative 1D attention bias as well as a spatial 2D attention bias to the attention scores in
  the self-attention layers. Details can be found on page 5 of the `paper <https://arxiv.org/abs/2012.14740>`__.
- Demo notebooks on how to use the LayoutLMv2 model on RVL-CDIP, FUNSD, DocVQA, CORD can be found `here
  <https://github.com/NielsRogge/Transformers-Tutorials>`__.
- LayoutLMv2 uses Facebook AI's `Detectron2 <https://github.com/facebookresearch/detectron2/>`__ package for its visual
  backbone. See `this link <https://detectron2.readthedocs.io/en/latest/tutorials/install.html>`__ for installation
  instructions.
- In addition to :obj:`input_ids`, :meth:`~transformers.LayoutLMv2Model.forward` expects 2 additional inputs, namely
  :obj:`image` and :obj:`bbox`. The :obj:`image` input corresponds to the original document image in which the text
  tokens occur. The model expects each document image to be of size 224x224. This means that if you have a batch of
  document images, :obj:`image` should be a tensor of shape (batch_size, 3, 224, 224). This can be either a
  :obj:`torch.Tensor` or a :obj:`Detectron2.structures.ImageList`. You don't need to normalize the channels, as this is
  done by the model. Important to note is that the visual backbone expects BGR channels instead of RGB, as all models
  in Detectron2 are pre-trained using the BGR format. The :obj:`bbox` input are the bounding boxes (i.e. 2D-positions)
  of the input text tokens. This is identical to :class:`~transformers.LayoutLMModel`. These can be obtained using an
  external OCR engine such as Google's `Tesseract <https://github.com/tesseract-ocr/tesseract>`__ (there's a `Python
  wrapper <https://pypi.org/project/pytesseract/>`__ available). Each bounding box should be in (x0, y0, x1, y1)
  format, where (x0, y0) corresponds to the position of the upper left corner in the bounding box, and (x1, y1)
  represents the position of the lower right corner. Note that one first needs to normalize the bounding boxes to be on
  a 0-1000 scale. To normalize, you can use the following function:

.. code-block::

    def normalize_bbox(bbox, width, height):
        return [
            int(1000 * (bbox[0] / width)),
            int(1000 * (bbox[1] / height)),
            int(1000 * (bbox[2] / width)),
            int(1000 * (bbox[3] / height)),
        ]

Here, :obj:`width` and :obj:`height` correspond to the width and height of the original document in which the token
occurs (before resizing the image). Those can be obtained using the Python Imaging Library (PIL), for example, as
follows:

.. code-block::

    from PIL import Image

    image = Image.open("name_of_your_document - can be a png file, pdf, etc.")

    width, height = image.size

However, this model includes a brand new :class:`~transformers.LayoutLMv2Processor` which can be used to directly
prepare data for the model (including applying OCR under the hood). More information can be found in the "Usage"
section below.

- Internally, :class:`~transformers.LayoutLMv2Model` will send the :obj:`image` input through its visual backbone to
  obtain a lower-resolution feature map, whose shape is equal to the :obj:`image_feature_pool_shape` attribute of
  :class:`~transformers.LayoutLMv2Config`. This feature map is then flattened to obtain a sequence of image tokens. As
  the size of the feature map is 7x7 by default, one obtains 49 image tokens. These are then concatenated with the text
  tokens, and sent through the Transformer encoder. This means that the last hidden states of the model will have a
  length of 512 + 49 = 561, if you pad the text tokens up to the max length. More generally, the last hidden states
  will have a shape of :obj:`seq_length` + :obj:`image_feature_pool_shape[0]` *
  :obj:`config.image_feature_pool_shape[1]`.
- When calling :meth:`~transformers.LayoutLMv2Model.from_pretrained`, a warning will be printed with a long list of
  parameter names that are not initialized. This is not a problem, as these parameters are batch normalization
  statistics, which are going to have values when fine-tuning on a custom dataset.
- If you want to train the model in a distributed environment, make sure to call :meth:`synchronize_batch_norm` on the
  model in order to properly synchronize the batch normalization layers of the visual backbone.

In addition, there's LayoutXLM, which is a multilingual version of LayoutLMv2. More information can be found on
:doc:`LayoutXLM's documentation page <layoutxlm>`.

Usage: LayoutLMv2Processor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The easiest way to prepare data for the model is to use :class:`~transformers.LayoutLMv2Processor`, which internally
combines a feature extractor (:class:`~transformers.LayoutLMv2FeatureExtractor`) and a tokenizer
(:class:`~transformers.LayoutLMv2Tokenizer` or :class:`~transformers.LayoutLMv2TokenizerFast`). The feature extractor
handles the image modality, while the tokenizer handles the text modality. A processor combines both, which is ideal
for a multi-modal model like LayoutLMv2. Note that you can still use both separately, if you only want to handle one
modality.

.. code-block::

    from transformers import LayoutLMv2FeatureExtractor, LayoutLMv2TokenizerFast, LayoutLMv2Processor

    feature_extractor = LayoutLMv2FeatureExtractor()  # apply_ocr is set to True by default
    tokenizer = LayoutLMv2TokenizerFast.from_pretrained("microsoft/layoutlmv2-base-uncased")
    processor = LayoutLMv2Processor(feature_extractor, tokenizer)

In short, one can provide a document image (and possibly additional data) to :class:`~transformers.LayoutLMv2Processor`,
and it will create the inputs expected by the model. Internally, the processor first uses
:class:`~transformers.LayoutLMv2FeatureExtractor` to apply OCR on the image to get a list of words and normalized
bounding boxes, as well as to resize the image to a given size in order to get the :obj:`image` input. The words and
normalized bounding boxes are then provided to :class:`~transformers.LayoutLMv2Tokenizer` or
:class:`~transformers.LayoutLMv2TokenizerFast`, which converts them to token-level :obj:`input_ids`,
:obj:`attention_mask`, :obj:`token_type_ids`, :obj:`bbox`. Optionally, one can provide word labels to the processor,
which are turned into token-level :obj:`labels`.

:class:`~transformers.LayoutLMv2Processor` uses `PyTesseract <https://pypi.org/project/pytesseract/>`__, a Python
wrapper around Google's Tesseract OCR engine, under the hood. Note that you can still use your own OCR engine of
choice, and provide the words and normalized boxes yourself. This requires initializing
:class:`~transformers.LayoutLMv2FeatureExtractor` with :obj:`apply_ocr` set to :obj:`False`.

In total, there are 5 use cases that are supported by the processor. Below, we list them all. Note that each of these
use cases works for both batched and non-batched inputs (we illustrate them for non-batched inputs).

**Use case 1: document image classification (training, inference) + token classification (inference), apply_ocr =
True**

This is the simplest case, in which the processor (actually the feature extractor) will perform OCR on the image to get
the words and normalized bounding boxes.

.. code-block::

    from transformers import LayoutLMv2Processor
    from PIL import Image

    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")

    image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
    encoding = processor(image, return_tensors="pt")  # you can also add all tokenizer parameters here such as padding, truncation
    print(encoding.keys())
    # dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])

**Use case 2: document image classification (training, inference) + token classification (inference), apply_ocr=False**

In case one wants to do OCR themselves, one can initialize the feature extractor with :obj:`apply_ocr` set to
:obj:`False`. In that case, one should provide the words and corresponding (normalized) bounding boxes themselves to
the processor.

.. code-block::

    from transformers import LayoutLMv2Processor
    from PIL import Image

    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")

    image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
    words = ["hello", "world"]
    boxes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # make sure to normalize your bounding boxes
    encoding = processor(image, words, boxes=boxes, return_tensors="pt")
    print(encoding.keys())
    # dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])

**Use case 3: token classification (training), apply_ocr=False**

For token classification tasks (such as FUNSD, CORD, SROIE, Kleister-NDA), one can also provide the corresponding word
labels in order to train a model. The processor will then convert these into token-level :obj:`labels`. By default, it
will only label the first wordpiece of a word, and label the remaining wordpieces with -100, which is the
:obj:`ignore_index` of PyTorch's CrossEntropyLoss. In case you want all wordpieces of a word to be labeled, you can
initialize the tokenizer with :obj:`only_label_first_subword` set to :obj:`False`.

.. code-block::

    from transformers import LayoutLMv2Processor
    from PIL import Image

    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")

    image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
    words = ["hello", "world"]
    boxes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # make sure to normalize your bounding boxes
    word_labels = [1, 2]
    encoding = processor(image, words, boxes=boxes, word_labels=word_labels, return_tensors="pt")
    print(encoding.keys())
    # dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'labels', 'image'])

**Use case 4: visual question answering (inference), apply_ocr=True**

For visual question answering tasks (such as DocVQA), you can provide a question to the processor. By default, the
processor will apply OCR on the image, and create [CLS] question tokens [SEP] word tokens [SEP].

.. code-block::

    from transformers import LayoutLMv2Processor
    from PIL import Image

    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")

    image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
    question = "What's his name?"
    encoding = processor(image, question, return_tensors="pt")
    print(encoding.keys())
    # dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])

**Use case 5: visual question answering (inference), apply_ocr=False**

For visual question answering tasks (such as DocVQA), you can provide a question to the processor. If you want to
perform OCR yourself, you can provide your own words and (normalized) bounding boxes to the processor.

.. code-block::

    from transformers import LayoutLMv2Processor
    from PIL import Image

    processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased", revision="no_ocr")

    image = Image.open("name_of_your_document - can be a png file, pdf, etc.").convert("RGB")
    question = "What's his name?"
    words = ["hello", "world"]
    boxes = [[1, 2, 3, 4], [5, 6, 7, 8]]  # make sure to normalize your bounding boxes
    encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")
    print(encoding.keys())
    # dict_keys(['input_ids', 'token_type_ids', 'attention_mask', 'bbox', 'image'])

LayoutLMv2Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMv2Config
    :members:


LayoutLMv2FeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMv2FeatureExtractor
    :members: __call__


LayoutLMv2Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMv2Tokenizer
    :members: __call__, save_vocabulary


LayoutLMv2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMv2TokenizerFast
    :members: __call__


LayoutLMv2Processor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMv2Processor
    :members: __call__


LayoutLMv2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.LayoutLMv2Model
    :members: forward
|
||||
|
||||
|
||||
LayoutLMv2ForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2ForSequenceClassification
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMv2ForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2ForTokenClassification
|
||||
:members:
|
||||
|
||||
|
||||
LayoutLMv2ForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutLMv2ForQuestionAnswering
|
||||
:members:
|
||||
84
docs/source/model_doc/layoutxlm.rst
Normal file
84
docs/source/model_doc/layoutxlm.rst
Normal file
@ -0,0 +1,84 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
LayoutXLM
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
LayoutXLM was proposed in `LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding
|
||||
<https://arxiv.org/abs/2104.08836>`__ by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha
|
||||
Zhang, Furu Wei. It's a multilingual extension of the `LayoutLMv2 model <https://arxiv.org/abs/2012.14740>`__ trained
|
||||
on 53 languages.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually-rich document
|
||||
understanding tasks recently, which demonstrates the great potential for joint learning across different modalities. In
|
||||
this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to
|
||||
bridge the language barriers for visually-rich document understanding. To accurately evaluate LayoutXLM, we also
|
||||
introduce a multilingual form understanding benchmark dataset named XFUN, which includes form understanding samples in
|
||||
7 languages (Chinese, Japanese, Spanish, French, Italian, German, Portuguese), and key-value pairs are manually labeled
|
||||
for each language. Experiment results show that the LayoutXLM model has significantly outperformed the existing SOTA
|
||||
cross-lingual pre-trained models on the XFUN dataset.*
|
||||
|
||||
One can directly plug in the weights of LayoutXLM into a LayoutLMv2 model, like so:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutLMv2Model
|
||||
|
||||
model = LayoutLMv2Model.from_pretrained('microsoft/layoutxlm-base')
|
||||
|
||||
Note that LayoutXLM has its own tokenizer, based on
|
||||
:class:`~transformers.LayoutXLMTokenizer`/:class:`~transformers.LayoutXLMTokenizerFast`. You can initialize it as
|
||||
follows:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LayoutXLMTokenizer
|
||||
|
||||
tokenizer = LayoutXLMTokenizer.from_pretrained('microsoft/layoutxlm-base')
|
||||
|
||||
Similar to LayoutLMv2, you can use :class:`~transformers.LayoutXLMProcessor` (which internally applies
|
||||
:class:`~transformers.LayoutLMv2FeatureExtractor` and
|
||||
:class:`~transformers.LayoutXLMTokenizer`/:class:`~transformers.LayoutXLMTokenizerFast` in sequence) to prepare all
|
||||
data for the model.
|
||||
|
||||
As LayoutXLM's architecture is equivalent to that of LayoutLMv2, one can refer to :doc:`LayoutLMv2's documentation page
|
||||
<layoutlmv2>` for all tips, code examples and notebooks.
|
||||
|
||||
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
|
||||
<https://github.com/microsoft/unilm>`__.
|
||||
|
||||
|
||||
LayoutXLMTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutXLMTokenizer
|
||||
:members: __call__, build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
LayoutXLMTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutXLMTokenizerFast
|
||||
:members: __call__
|
||||
|
||||
|
||||
LayoutXLMProcessor
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LayoutXLMProcessor
|
||||
:members: __call__
|
||||
@ -46,8 +46,8 @@ Tips:
|
||||
- LED makes use of *global attention* by means of the ``global_attention_mask`` (see
|
||||
:class:`~transformers.LongformerModel`). For summarization, it is advised to put *global attention* only on the first
|
||||
``<s>`` token. For question answering, it is advised to put *global attention* on all tokens of the question.
|
||||
- To fine-tune LED on all 16384, it is necessary to enable *gradient checkpointing* by setting
|
||||
``config.gradient_checkpointing = True``.
|
||||
- To fine-tune LED on all 16384, it is necessary to enable *gradient checkpointing* by executing
|
||||
``model.gradient_checkpointing_enable()``.
|
||||
- A notebook showing how to evaluate LED, can be accessed `here
|
||||
<https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing>`__.
|
||||
- A notebook showing how to fine-tune LED, can be accessed `here
|
||||
|
||||
@ -74,6 +74,9 @@ Tips:
|
||||
head models by specifying ``task="entity_classification"``, ``task="entity_pair_classification"``, or
|
||||
``task="entity_span_classification"``. Please refer to the example code of each head models.
|
||||
|
||||
A demo notebook on how to fine-tune :class:`~transformers.LukeForEntityPairClassification` for relation
|
||||
classification can be found `here <https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LUKE>`__.
|
||||
|
||||
There are also 3 notebooks available, which showcase how you can reproduce the results as reported in the paper with
|
||||
the HuggingFace implementation of LUKE. They can be found `here
|
||||
<https://github.com/studio-ousia/luke/tree/master/notebooks>`__.
|
||||
@ -137,6 +140,12 @@ LukeModel
|
||||
.. autoclass:: transformers.LukeModel
|
||||
:members: forward
|
||||
|
||||
LukeForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.LukeForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
LukeForEntityClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@ -58,7 +58,7 @@ examples. To install :obj:`sentencepiece` run ``pip install sentencepiece``.
|
||||
tokenizer = M2M100Tokenizer.from_pretrained('facebook/m2m100_418M', src_lang="en", tgt_lang="fr")
|
||||
|
||||
src_text = "Life is like a box of chocolates."
|
||||
tgt_lang = "La vie est comme une boîte de chocolat."
|
||||
tgt_text = "La vie est comme une boîte de chocolat."
|
||||
|
||||
model_inputs = tokenizer(src_text, return_tensors="pt")
|
||||
with tokenizer.as_target_tokenizer():
|
||||
|
||||
@ -103,8 +103,8 @@ Here is the code to see all available pretrained models on the hub:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from transformers.hf_api import HfApi
|
||||
model_list = HfApi().model_list()
|
||||
from huggingface_hub import list_models
|
||||
model_list = list_models()
|
||||
org = "Helsinki-NLP"
|
||||
model_ids = [x.modelId for x in model_list if x.modelId.startswith(org)]
|
||||
suffix = [x.split('/')[1] for x in model_ids]
|
||||
|
||||
@ -49,11 +49,11 @@ inside the context manager :meth:`~transformers.MBartTokenizer.as_target_tokeniz
|
||||
|
||||
>>> from transformers import MBartForConditionalGeneration, MBartTokenizer
|
||||
|
||||
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
|
||||
>>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro", src_lang="en_XX", tgt_lang="ro_RO")
|
||||
>>> example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
|
||||
>>> expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
|
||||
|
||||
>>> inputs = tokenizer(example_english_phrase, return_tensors="pt", src_lang="en_XX", tgt_lang="ro_RO")
|
||||
>>> inputs = tokenizer(example_english_phrase, return_tensors="pt")
|
||||
>>> with tokenizer.as_target_tokenizer():
|
||||
... labels = tokenizer(expected_translation_romanian, return_tensors="pt")
|
||||
|
||||
|
||||
66
docs/source/model_doc/mluke.rst
Normal file
66
docs/source/model_doc/mluke.rst
Normal file
@ -0,0 +1,66 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
mLUKE
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The mLUKE model was proposed in `mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models
|
||||
<https://arxiv.org/abs/2110.08151>`__ by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. It's a multilingual extension
|
||||
of the `LUKE model <https://arxiv.org/abs/2010.01057>`__ trained on the basis of XLM-RoBERTa.
|
||||
|
||||
It is based on XLM-RoBERTa and adds entity embeddings, which helps improve performance on various downstream tasks
|
||||
involving reasoning about entities such as named entity recognition, extractive question answering, relation
|
||||
classification, cloze-style knowledge completion.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Recent studies have shown that multilingual pretrained language models can be effectively improved with cross-lingual
|
||||
alignment information from Wikipedia entities. However, existing methods only exploit entity information in pretraining
|
||||
and do not explicitly use entities in downstream tasks. In this study, we explore the effectiveness of leveraging
|
||||
entity representations for downstream cross-lingual tasks. We train a multilingual language model with 24 languages
|
||||
with entity representations and show the model consistently outperforms word-based pretrained models in various
|
||||
cross-lingual transfer tasks. We also analyze the model and the key insight is that incorporating entity
|
||||
representations into the input allows us to extract more language-agnostic features. We also evaluate the model with a
|
||||
multilingual cloze prompt task with the mLAMA dataset. We show that entity-based prompt elicits correct factual
|
||||
knowledge more likely than using only word representations.*
|
||||
|
||||
One can directly plug in the weights of mLUKE into a LUKE model, like so:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import LukeModel
|
||||
|
||||
model = LukeModel.from_pretrained('studio-ousia/mluke-base')
|
||||
|
||||
Note that mLUKE has its own tokenizer, :class:`~transformers.MLukeTokenizer`. You can initialize it as follows:
|
||||
|
||||
.. code-block::
|
||||
|
||||
from transformers import MLukeTokenizer
|
||||
|
||||
tokenizer = MLukeTokenizer.from_pretrained('studio-ousia/mluke-base')
|
||||
|
||||
|
||||
As mLUKE's architecture is equivalent to that of LUKE, one can refer to :doc:`LUKE's documentation page <luke>` for all
|
||||
tips, code examples and notebooks.
|
||||
|
||||
This model was contributed by `ryo0634 <https://huggingface.co/ryo0634>`__. The original code can be found `here
|
||||
<https://github.com/studio-ousia/luke>`__.
|
||||
|
||||
MLukeTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.MLukeTokenizer
|
||||
:members: __call__, save_vocabulary
|
||||
@ -10,7 +10,7 @@
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
MT5
|
||||
mT5
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
@ -24,9 +24,28 @@ The abstract from the paper is the following:
|
||||
|
||||
*The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain
|
||||
state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a
|
||||
multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe
|
||||
multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail
|
||||
the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
|
||||
benchmarks. All of the code and model checkpoints*
|
||||
benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a
|
||||
generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model
|
||||
checkpoints used in this work are publicly available.*
|
||||
|
||||
Note: mT5 was only pre-trained on `mC4 <https://huggingface.co/datasets/mc4>`__ excluding any supervised training.
|
||||
Therefore, this model has to be fine-tuned before it is useable on a downstream task, unlike the original T5 model.
|
||||
Since mT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task
|
||||
fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
|
||||
|
||||
Google has released the following variants:
|
||||
|
||||
- `google/mt5-small <https://huggingface.co/google/mt5-small>`__
|
||||
|
||||
- `google/mt5-base <https://huggingface.co/google/mt5-base>`__
|
||||
|
||||
- `google/mt5-large <https://huggingface.co/google/mt5-large>`__
|
||||
|
||||
- `google/mt5-xl <https://huggingface.co/google/mt5-xl>`__
|
||||
|
||||
- `google/mt5-xxl <https://huggingface.co/google/mt5-xxl>`__.
|
||||
|
||||
This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
|
||||
found `here <https://github.com/google-research/multilingual-t5>`__.
|
||||
@ -94,3 +113,17 @@ TFMT5EncoderModel
|
||||
|
||||
.. autoclass:: transformers.TFMT5EncoderModel
|
||||
:members:
|
||||
|
||||
|
||||
FlaxMT5Model
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMT5Model
|
||||
:members:
|
||||
|
||||
|
||||
FlaxMT5ForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxMT5ForConditionalGeneration
|
||||
:members:
|
||||
|
||||
@ -152,3 +152,17 @@ TFPegasusForConditionalGeneration
|
||||
|
||||
.. autoclass:: transformers.TFPegasusForConditionalGeneration
|
||||
:members: call
|
||||
|
||||
|
||||
FlaxPegasusModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxPegasusModel
|
||||
:members: __call__, encode, decode
|
||||
|
||||
|
||||
FlaxPegasusForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.FlaxPegasusForConditionalGeneration
|
||||
:members: __call__, encode, decode
|
||||
|
||||
211
docs/source/model_doc/perceiver.mdx
Normal file
211
docs/source/model_doc/perceiver.mdx
Normal file
@ -0,0 +1,211 @@
|
||||
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Perceiver
|
||||
|
||||
## Overview
|
||||
|
||||
The Perceiver IO model was proposed in [Perceiver IO: A General Architecture for Structured Inputs &
|
||||
Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch,
|
||||
Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M.
|
||||
Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
|
||||
|
||||
Perceiver IO is a generalization of [Perceiver](https://arxiv.org/abs/2103.03206) to handle arbitrary outputs in
|
||||
addition to arbitrary inputs. The original Perceiver only produced a single classification label. In addition to
|
||||
classification labels, Perceiver IO can produce (for example) language, optical flow, and multimodal videos with audio.
|
||||
This is done using the same building blocks as the original Perceiver. The computational complexity of Perceiver IO is
|
||||
linear in the input and output size and the bulk of the processing occurs in the latent space, allowing us to process
|
||||
inputs and outputs that are much larger than can be handled by standard Transformers. This means, for example,
|
||||
Perceiver IO can do BERT-style masked language modeling directly using bytes instead of tokenized inputs.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*The recently-proposed Perceiver model obtains good results on several domains (images, audio, multimodal, point
|
||||
clouds) while scaling linearly in compute and memory with the input size. While the Perceiver supports many kinds of
|
||||
inputs, it can only produce very simple outputs such as class scores. Perceiver IO overcomes this limitation without
|
||||
sacrificing the original's appealing properties by learning to flexibly query the model's latent space to produce
|
||||
outputs of arbitrary size and semantics. Perceiver IO still decouples model depth from data size and still scales
|
||||
linearly with data size, but now with respect to both input and output sizes. The full Perceiver IO model achieves
|
||||
strong results on tasks with highly structured output spaces, such as natural language and visual understanding,
|
||||
StarCraft II, and multi-task and multi-modal domains. As highlights, Perceiver IO matches a Transformer-based BERT
|
||||
baseline on the GLUE language benchmark without the need for input tokenization and achieves state-of-the-art
|
||||
performance on Sintel optical flow estimation.*
|
||||
|
||||
Here's a TLDR explaining how Perceiver works:
|
||||
|
||||
The main problem with the self-attention mechanism of the Transformer is that the time and memory requirements scale
|
||||
quadratically with the sequence length. Hence, models like BERT and RoBERTa are limited to a max sequence length of 512
|
||||
tokens. Perceiver aims to solve this issue by, instead of performing self-attention on the inputs, perform it on a set
|
||||
of latent variables, and only use the inputs for cross-attention. In this way, the time and memory requirements don't
|
||||
depend on the length of the inputs anymore, as one uses a fixed amount of latent variables, like 256 or 512. These are
|
||||
randomly initialized, after which they are trained end-to-end using backpropagation.
|
||||
|
||||
Internally, [`PerceiverModel`] will create the latents, which is a tensor of shape `(batch_size, num_latents,
|
||||
d_latents)`. One must provide `inputs` (which could be text, images, audio, you name it!) to the model, which it will
|
||||
use to perform cross-attention with the latents. The output of the Perceiver encoder is a tensor of the same shape. One
|
||||
can then, similar to BERT, convert the last hidden states of the latents to classification logits by averaging along
|
||||
the sequence dimension, and placing a linear layer on top of that to project the `d_latents` to `num_labels`.
|
||||
|
||||
This was the idea of the original Perceiver paper. However, it could only output classification logits. In a follow-up
|
||||
work, PerceiverIO, they generalized it to let the model also produce outputs of arbitrary size. How, you might ask? The
|
||||
idea is actually relatively simple: one defines outputs of an arbitrary size, and then applies cross-attention with the
|
||||
last hidden states of the latents, using the outputs as queries, and the latents as keys and values.
|
||||
|
||||
So let's say one wants to perform masked language modeling (BERT-style) with the Perceiver. As the Perceiver's input
|
||||
length will not have an impact on the computation time of the self-attention layers, one can provide raw bytes,
|
||||
providing `inputs` of length 2048 to the model. If one now masks out certain of these 2048 tokens, one can define the
|
||||
`outputs` as being of shape: `(batch_size, 2048, 768)`. Next, one performs cross-attention with the final hidden states
|
||||
of the latents to update the `outputs` tensor. After cross-attention, one still has a tensor of shape `(batch_size,
|
||||
2048, 768)`. One can then place a regular language modeling head on top, to project the last dimension to the
|
||||
vocabulary size of the model, i.e. creating logits of shape `(batch_size, 2048, 262)` (as Perceiver uses a vocabulary
|
||||
size of 262 byte IDs).
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/perceiver_architecture.jpg"
|
||||
alt="drawing" width="600"/>
|
||||
|
||||
<small> Perceiver IO architecture. Taken from the [original paper](https://arxiv.org/abs/2105.15203) </small>
|
||||
|
||||
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found
|
||||
[here](https://github.com/deepmind/deepmind-research/tree/master/perceiver).
|
||||
|
||||
Tips:
|
||||
|
||||
- The quickest way to get started with the Perceiver is by checking the [tutorial
|
||||
notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Perceiver).
|
||||
- Refer to the [blog post](https://huggingface.co/blog/perceiver) if you want to fully understand how the model works and
|
||||
is implemented in the library. Note that the models available in the library only showcase some examples of what you can do
|
||||
with the Perceiver. There are many more use cases, including question answering, named-entity recognition, object detection,
|
||||
audio classification, video classification, etc.
|
||||
|
||||
## Perceiver specific outputs
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverModelOutput
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverDecoderOutput
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverMaskedLMOutput
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverClassifierOutput
|
||||
|
||||
## PerceiverConfig
|
||||
|
||||
[[autodoc]] PerceiverConfig
|
||||
|
||||
## PerceiverTokenizer
|
||||
|
||||
[[autodoc]] PerceiverTokenizer
|
||||
- __call__
|
||||
|
||||
## PerceiverFeatureExtractor
|
||||
|
||||
[[autodoc]] PerceiverFeatureExtractor
|
||||
- __call__
|
||||
|
||||
## PerceiverTextPreprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverTextPreprocessor
|
||||
|
||||
## PerceiverImagePreprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverImagePreprocessor
|
||||
|
||||
## PerceiverOneHotPreprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverOneHotPreprocessor
|
||||
|
||||
## PerceiverAudioPreprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverAudioPreprocessor
|
||||
|
||||
## PerceiverMultimodalPreprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverMultimodalPreprocessor
|
||||
|
||||
## PerceiverProjectionDecoder
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverProjectionDecoder
|
||||
|
||||
## PerceiverBasicDecoder
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverBasicDecoder
|
||||
|
||||
## PerceiverClassificationDecoder
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverClassificationDecoder
|
||||
|
||||
## PerceiverOpticalFlowDecoder
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverOpticalFlowDecoder
|
||||
|
||||
## PerceiverBasicVideoAutoencodingDecoder
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverBasicVideoAutoencodingDecoder
|
||||
|
||||
## PerceiverMultimodalDecoder
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverMultimodalDecoder
|
||||
|
||||
## PerceiverProjectionPostprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverProjectionPostprocessor
|
||||
|
||||
## PerceiverAudioPostprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverAudioPostprocessor
|
||||
|
||||
## PerceiverClassificationPostprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverClassificationPostprocessor
|
||||
|
||||
## PerceiverMultimodalPostprocessor
|
||||
|
||||
[[autodoc]] models.perceiver.modeling_perceiver.PerceiverMultimodalPostprocessor
|
||||
|
||||
## PerceiverModel
|
||||
|
||||
[[autodoc]] PerceiverModel
|
||||
- forward
|
||||
|
||||
## PerceiverForMaskedLM
|
||||
|
||||
[[autodoc]] PerceiverForMaskedLM
|
||||
- forward
|
||||
|
||||
## PerceiverForSequenceClassification
|
||||
|
||||
[[autodoc]] PerceiverForSequenceClassification
|
||||
- forward
|
||||
|
||||
## PerceiverForImageClassificationLearned
|
||||
|
||||
[[autodoc]] PerceiverForImageClassificationLearned
|
||||
- forward
|
||||
|
||||
## PerceiverForImageClassificationFourier
|
||||
|
||||
[[autodoc]] PerceiverForImageClassificationFourier
|
||||
- forward
|
||||
|
||||
## PerceiverForImageClassificationConvProcessing
|
||||
|
||||
[[autodoc]] PerceiverForImageClassificationConvProcessing
|
||||
- forward
|
||||
|
||||
## PerceiverForOpticalFlow
|
||||
|
||||
[[autodoc]] PerceiverForOpticalFlow
|
||||
- forward
|
||||
|
||||
## PerceiverForMultimodalAutoencoding
|
||||
|
||||
[[autodoc]] PerceiverForMultimodalAutoencoding
|
||||
- forward
|
||||
@ -50,7 +50,8 @@ Example of use:
|
||||
>>> # phobert = TFAutoModel.from_pretrained("vinai/phobert-base")
|
||||
|
||||
|
||||
This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here <https://github.com/VinAIResearch/PhoBERT>`__.
|
||||
This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here
|
||||
<https://github.com/VinAIResearch/PhoBERT>`__.
|
||||
|
||||
PhobertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
189
docs/source/model_doc/qdqbert.rst
Normal file
189
docs/source/model_doc/qdqbert.rst
Normal file
@ -0,0 +1,189 @@
|
||||
..
|
||||
Copyright 2021 NVIDIA Corporation and The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
QDQBERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The QDQBERT model can be referenced in `Integer Quantization for Deep Learning Inference: Principles and Empirical
|
||||
Evaluation <https://arxiv.org/abs/2004.09602>`__ by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius
|
||||
Micikevicius.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by
|
||||
taking advantage of high throughput integer instructions. In this paper we review the mathematical aspects of
|
||||
quantization parameters and evaluate their choices on a wide range of neural network models for different application
|
||||
domains, including vision, speech, and language. We focus on quantization techniques that are amenable to acceleration
|
||||
by processors with high-throughput integer math pipelines. We also present a workflow for 8-bit quantization that is
|
||||
able to maintain accuracy within 1% of the floating-point baseline on all networks studied, including models that are
|
||||
more difficult to quantize, such as MobileNets and BERT-large.*
|
||||
|
||||
Tips:
|
||||
|
||||
- QDQBERT model adds fake quantization operations (pair of QuantizeLinear/DequantizeLinear ops) to (i) linear layer
|
||||
inputs and weights, (ii) matmul inputs, (iii) residual add inputs, in BERT model.
|
||||
|
||||
- QDQBERT requires the dependency of `Pytorch Quantization Toolkit
|
||||
<https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization>`__. To install ``pip install
|
||||
pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com``
|
||||
|
||||
- QDQBERT model can be loaded from any checkpoint of HuggingFace BERT model (for example *bert-base-uncased*), and
|
||||
perform Quantization Aware Training/Post Training Quantization.
|
||||
|
||||
- A complete example of using QDQBERT model to perform Quatization Aware Training and Post Training Quantization for
|
||||
SQUAD task can be found at `transformers/examples/research_projects/quantization-qdqbert/
|
||||
</examples/research_projects/quantization-qdqbert/>`_.
|
||||
|
||||
This model was contributed by `shangz <https://huggingface.co/shangz>`__.
|
||||
|
||||
|
||||
Set default quantizers
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
QDQBERT model adds fake quantization operations (pair of QuantizeLinear/DequantizeLinear ops) to BERT by
|
||||
:obj:`TensorQuantizer` in `Pytorch Quantization Toolkit
|
||||
<https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization>`__. :obj:`TensorQuantizer` is the module
|
||||
for quantizing tensors, with :obj:`QuantDescriptor` defining how the tensor should be quantized. Refer to `Pytorch
|
||||
Quantization Toolkit userguide
|
||||
<https://docs.nvidia.com/deeplearning/tensorrt/pytorch-quantization-toolkit/docs/userguide.html>`__ for more details.
|
||||
|
||||
Before creating QDQBERT model, one has to set the default :obj:`QuantDescriptor` defining default tensor quantizers.
|
||||
Example:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> import pytorch_quantization.nn as quant_nn
|
||||
>>> from pytorch_quantization.tensor_quant import QuantDescriptor
|
||||
|
||||
>>> # The default tensor quantizer is set to use Max calibration method
|
||||
>>> input_desc = QuantDescriptor(num_bits=8, calib_method="max")
|
||||
>>> # The default tensor quantizer is set to be per-channel quantization for weights
|
||||
>>> weight_desc = QuantDescriptor(num_bits=8, axis=((0,)))
|
||||
>>> quant_nn.QuantLinear.set_default_quant_desc_input(input_desc)
|
||||
>>> quant_nn.QuantLinear.set_default_quant_desc_weight(weight_desc)
|
||||
|
||||
|
||||
Calibration
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
Calibration is the terminology of passing data samples to the quantizer and deciding the best scaling factors for
|
||||
tensors. After setting up the tensor quantizers, one can use the following example to calibrate the model:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> # Find the TensorQuantizer and enable calibration
|
||||
>>> for name, module in model.named_modules():
|
||||
>>> if name.endswith('_input_quantizer'):
|
||||
>>> module.enable_calib()
|
||||
>>> module.disable_quant() # Use full precision data to calibrate
|
||||
|
||||
>>> # Feeding data samples
|
||||
>>> model(x)
|
||||
>>> # ...
|
||||
|
||||
>>> # Finalize calibration
|
||||
>>> for name, module in model.named_modules():
|
||||
>>> if name.endswith('_input_quantizer'):
|
||||
>>> module.load_calib_amax()
|
||||
>>> module.enable_quant()
|
||||
|
||||
>>> # If running on GPU, it needs to call .cuda() again because new tensors will be created by calibration process
|
||||
>>> model.cuda()
|
||||
|
||||
>>> # Keep running the quantized model
|
||||
>>> # ...
|
||||
|
||||
|
||||
Export to ONNX
|
||||
_______________________________________________________________________________________________________________________
|
||||
|
||||
The goal of exporting to ONNX is to deploy inference by `TensorRT <https://developer.nvidia.com/tensorrt>`__. Fake
|
||||
quantization will be broken into a pair of QuantizeLinear/DequantizeLinear ONNX ops. After setting static member of
|
||||
TensorQuantizer to use Pytorch’s own fake quantization functions, fake quantized model can be exported to ONNX, follow
|
||||
the instructions in `torch.onnx <https://pytorch.org/docs/stable/onnx.html>`__. Example:
|
||||
|
||||
.. code-block::
|
||||
|
||||
>>> from pytorch_quantization.nn import TensorQuantizer
|
||||
>>> TensorQuantizer.use_fb_fake_quant = True
|
||||
|
||||
>>> # Load the calibrated model
|
||||
>>> ...
|
||||
>>> # ONNX export
|
||||
>>> torch.onnx.export(...)
|
||||
|
||||
|
||||
QDQBertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertConfig
|
||||
:members:
|
||||
|
||||
|
||||
QDQBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertModel
|
||||
:members: forward
|
||||
|
||||
|
||||
QDQBertLMHeadModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertLMHeadModel
|
||||
:members: forward
|
||||
|
||||
|
||||
QDQBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
QDQBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
QDQBertForNextSentencePrediction
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertForNextSentencePrediction
|
||||
:members: forward
|
||||
|
||||
|
||||
QDQBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
QDQBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
QDQBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.QDQBertForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
161
docs/source/model_doc/rembert.rst
Normal file
161
docs/source/model_doc/rembert.rst
Normal file
@ -0,0 +1,161 @@
|
||||
..
|
||||
Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
RemBERT
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The RemBERT model was proposed in `Rethinking Embedding Coupling in Pre-trained Language Models
|
||||
<https://arxiv.org/abs/2010.12821>`__ by Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art
|
||||
pre-trained language models. We show that decoupled embeddings provide increased modeling flexibility, allowing us to
|
||||
significantly improve the efficiency of parameter allocation in the input embedding of multilingual models. By
|
||||
reallocating the input embedding parameters in the Transformer layers, we achieve dramatically better performance on
|
||||
standard natural language understanding tasks with the same number of parameters during fine-tuning. We also show that
|
||||
allocating additional capacity to the output embedding provides benefits to the model that persist through the
|
||||
fine-tuning stage even though the output embedding is discarded after pre-training. Our analysis shows that larger
|
||||
output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage
|
||||
Transformer representations to be more general and more transferable to other tasks and languages. Harnessing these
|
||||
findings, we are able to train models that achieve strong performance on the XTREME benchmark without increasing the
|
||||
number of parameters at the fine-tuning stage.*
|
||||
|
||||
Tips:
|
||||
|
||||
For fine-tuning, RemBERT can be thought of as a bigger version of mBERT with an ALBERT-like factorization of the
|
||||
embedding layer. The embeddings are not tied in pre-training, in contrast with BERT, which enables smaller input
|
||||
embeddings (preserved during fine-tuning) and bigger output embeddings (discarded at fine-tuning). The tokenizer is
|
||||
also similar to the Albert one rather than the BERT one.
|
||||
|
||||
RemBertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertConfig
|
||||
:members:
|
||||
|
||||
|
||||
RemBertTokenizer
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
RemBertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertTokenizerFast
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
RemBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertModel
|
||||
:members: forward
|
||||
|
||||
|
||||
RemBertForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertForCausalLM
|
||||
:members: forward
|
||||
|
||||
|
||||
RemBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertForMaskedLM
|
||||
:members: forward
|
||||
|
||||
|
||||
RemBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertForSequenceClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
RemBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertForMultipleChoice
|
||||
:members: forward
|
||||
|
||||
|
||||
RemBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertForTokenClassification
|
||||
:members: forward
|
||||
|
||||
|
||||
RemBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RemBertForQuestionAnswering
|
||||
:members: forward
|
||||
|
||||
|
||||
TFRemBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRemBertModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFRemBertForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRemBertForMaskedLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFRemBertForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRemBertForCausalLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFRemBertForSequenceClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRemBertForSequenceClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFRemBertForMultipleChoice
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRemBertForMultipleChoice
|
||||
:members: call
|
||||
|
||||
|
||||
TFRemBertForTokenClassification
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRemBertForTokenClassification
|
||||
:members: call
|
||||
|
||||
|
||||
TFRemBertForQuestionAnswering
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRemBertForQuestionAnswering
|
||||
:members: call
|
||||
@ -126,6 +126,13 @@ TFRobertaModel
|
||||
:members: call
|
||||
|
||||
|
||||
TFRobertaForCausalLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TFRobertaForCausalLM
|
||||
:members: call
|
||||
|
||||
|
||||
TFRobertaForMaskedLM
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
132
docs/source/model_doc/segformer.rst
Normal file
132
docs/source/model_doc/segformer.rst
Normal file
@ -0,0 +1,132 @@
|
||||
..
|
||||
Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
SegFormer
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The SegFormer model was proposed in `SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
|
||||
<https://arxiv.org/abs/2105.15203>`__ by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping
|
||||
Luo. The model consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great
|
||||
results on image segmentation benchmarks such as ADE20K and Cityscapes.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with
|
||||
lightweight multilayer perception (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel
|
||||
hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding,
|
||||
thereby avoiding the interpolation of positional codes which leads to decreased performance when the testing resolution
|
||||
differs from training. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from
|
||||
different layers, and thus combining both local attention and global attention to render powerful representations. We
|
||||
show that this simple and lightweight design is the key to efficient segmentation on Transformers. We scale our
|
||||
approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance
|
||||
and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters,
|
||||
being 5x smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on
|
||||
Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C.*
|
||||
|
||||
The figure below illustrates the architecture of SegFormer. Taken from the `original paper
|
||||
<https://arxiv.org/abs/2105.15203>`__.
|
||||
|
||||
.. image:: https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/segformer_architecture.png
|
||||
:width: 600
|
||||
|
||||
This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
|
||||
<https://github.com/NVlabs/SegFormer>`__.
|
||||
|
||||
Tips:
|
||||
|
||||
- SegFormer consists of a hierarchical Transformer encoder, and a lightweight all-MLP decode head.
|
||||
:class:`~transformers.SegformerModel` is the hierarchical Transformer encoder (which in the paper is also referred to
|
||||
as Mix Transformer or MiT). :class:`~transformers.SegformerForSemanticSegmentation` adds the all-MLP decode head on
|
||||
top to perform semantic segmentation of images. In addition, there's
|
||||
:class:`~transformers.SegformerForImageClassification` which can be used to - you guessed it - classify images. The
|
||||
authors of SegFormer first pre-trained the Transformer encoder on ImageNet-1k to classify images. Next, they throw
|
||||
away the classification head, and replace it by the all-MLP decode head. Next, they fine-tune the model altogether on
|
||||
ADE20K, Cityscapes and COCO-stuff, which are important benchmarks for semantic segmentation. All checkpoints can be
|
||||
found on the `hub <https://huggingface.co/models?other=segformer>`__.
|
||||
- The quickest way to get started with SegFormer is by checking the `example notebooks
|
||||
<https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SegFormer>`__ (which showcase both inference and
|
||||
fine-tuning on custom data).
|
||||
- One can use :class:`~transformers.SegformerFeatureExtractor` to prepare images and corresponding segmentation maps
|
||||
for the model. Note that this feature extractor is fairly basic and does not include all data augmentations used in
|
||||
the original paper. The original preprocessing pipelines (for the ADE20k dataset for instance) can be found `here
|
||||
<https://github.com/NVlabs/SegFormer/blob/master/local_configs/_base_/datasets/ade20k_repeat.py>`__. The most
|
||||
important preprocessing step is that images and segmentation maps are randomly cropped and padded to the same size,
|
||||
such as 512x512 or 640x640, after which they are normalized.
|
||||
- One additional thing to keep in mind is that one can initialize :class:`~transformers.SegformerFeatureExtractor` with
|
||||
:obj:`reduce_labels` set to `True` or `False`. In some datasets (like ADE20k), the 0 index is used in the annotated
|
||||
segmentation maps for background. However, ADE20k doesn't include the "background" class in its 150 labels.
|
||||
Therefore, :obj:`reduce_labels` is used to reduce all labels by 1, and to make sure no loss is computed for the
|
||||
background class (i.e. it replaces 0 in the annotated maps by 255, which is the `ignore_index` of the loss function
|
||||
used by :class:`~transformers.SegformerForSemanticSegmentation`). However, other datasets use the 0 index as
|
||||
background class and include this class as part of all labels. In that case, :obj:`reduce_labels` should be set to
|
||||
`False`, as loss should also be computed for the background class.
|
||||
- As most models, SegFormer comes in different sizes, the details of which can be found in the table below.
|
||||
|
||||
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+
| **Model variant** | **Depths**    | **Hidden sizes**    | **Decoder hidden size** | **Params (M)** | **ImageNet-1k Top 1** |
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+
| MiT-b0            | [2, 2, 2, 2]  | [32, 64, 160, 256]  | 256                     | 3.7            | 70.5                  |
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+
| MiT-b1            | [2, 2, 2, 2]  | [64, 128, 320, 512] | 256                     | 14.0           | 78.7                  |
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+
| MiT-b2            | [3, 4, 6, 3]  | [64, 128, 320, 512] | 768                     | 25.4           | 81.6                  |
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+
| MiT-b3            | [3, 4, 18, 3] | [64, 128, 320, 512] | 768                     | 45.2           | 83.1                  |
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+
| MiT-b4            | [3, 8, 27, 3] | [64, 128, 320, 512] | 768                     | 62.6           | 83.6                  |
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+
| MiT-b5            | [3, 6, 40, 3] | [64, 128, 320, 512] | 768                     | 82.0           | 83.8                  |
+-------------------+---------------+---------------------+-------------------------+----------------+-----------------------+

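The following is a minimal inference sketch. It assumes the ``nvidia/segformer-b0-finetuned-ade-512-512`` checkpoint (a
MiT-b0 backbone fine-tuned on ADE20k) is available on the hub and uses an example image from the COCO dataset:

.. code-block:: python

    import torch
    import requests
    from PIL import Image
    from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation

    # assumed example checkpoint; any SegFormer checkpoint fine-tuned for semantic segmentation works the same way
    checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"
    # for fine-tuning on ADE20k-style datasets, one would pass reduce_labels=True here (see the tip above)
    feature_extractor = SegformerFeatureExtractor.from_pretrained(checkpoint)
    model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    # resize + normalize the image and return PyTorch tensors
    inputs = feature_extractor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # logits are of shape (batch_size, num_labels, height / 4, width / 4)
    logits = outputs.logits
    # the class with the highest score per pixel gives the predicted segmentation map
    segmentation_map = logits.argmax(dim=1)[0]
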
SegformerConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SegformerConfig
    :members:


SegformerFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SegformerFeatureExtractor
    :members: __call__


SegformerModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SegformerModel
    :members: forward


SegformerDecodeHead
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SegformerDecodeHead
    :members: forward


SegformerForImageClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SegformerForImageClassification
    :members: forward


SegformerForSemanticSegmentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SegformerForSemanticSegmentation
    :members: forward

docs/source/model_doc/sew.rst (new file)
@@ -0,0 +1,67 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

SEW
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SEW (Squeezed and Efficient Wav2Vec) was proposed in `Performance-Efficiency Trade-offs in Unsupervised Pre-training
for Speech Recognition <https://arxiv.org/abs/2109.06870>`__ by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q.
Weinberger, Yoav Artzi.

The abstract from the paper is the following:

*This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition
(ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance
and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a
pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a
variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x
inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference
time, SEW reduces word error rate by 25-50% across different model sizes.*

Tips:

- SEW is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
- SEWForCTC is fine-tuned using connectionist temporal classification (CTC), so the model output has to be decoded
  using :class:`~transformers.Wav2Vec2CTCTokenizer` (see the sketch below).

This model was contributed by `anton-l <https://huggingface.co/anton-l>`__.

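Below is a minimal sketch of CTC decoding with SEW. It assumes that ``asapp/sew-tiny-100k-ft-ls100h`` is a
CTC-fine-tuned SEW checkpoint whose processor files are available on the hub, and that a recent version of
``datasets`` decodes the ``audio`` column automatically:

.. code-block:: python

    import torch
    from datasets import load_dataset
    from transformers import SEWForCTC, Wav2Vec2Processor

    # assumed example checkpoint; any SEW model fine-tuned with CTC works the same way
    checkpoint = "asapp/sew-tiny-100k-ft-ls100h"
    processor = Wav2Vec2Processor.from_pretrained(checkpoint)
    model = SEWForCTC.from_pretrained(checkpoint)

    # load a dummy 16 kHz speech sample
    ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
    inputs = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # CTC decoding: pick the most likely token per frame, then collapse repeated tokens and blanks
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)
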
SEWConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWConfig
    :members:


SEWModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWModel
    :members: forward


SEWForCTC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWForCTC
    :members: forward


SEWForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWForSequenceClassification
    :members: forward

docs/source/model_doc/sew_d.rst (new file)
@@ -0,0 +1,66 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

SEW-D
-----------------------------------------------------------------------------------------------------------------------

Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SEW-D (Squeezed and Efficient Wav2Vec with Disentangled attention) was proposed in `Performance-Efficiency Trade-offs
in Unsupervised Pre-training for Speech Recognition <https://arxiv.org/abs/2109.06870>`__ by Felix Wu, Kwangyoun Kim,
Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.

The abstract from the paper is the following:

*This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition
(ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance
and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a
pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a
variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x
inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference
time, SEW reduces word error rate by 25-50% across different model sizes.*

Tips:

- SEW-D is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
- SEWDForCTC is fine-tuned using connectionist temporal classification (CTC), so the model output has to be decoded
  using :class:`~transformers.Wav2Vec2CTCTokenizer`.

This model was contributed by `anton-l <https://huggingface.co/anton-l>`__.

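CTC decoding with :class:`~transformers.SEWDForCTC` works exactly like the SEW example above, with the SEW-D classes
substituted. SEW-D can also be used for audio classification via :class:`~transformers.SEWDForSequenceClassification`.
The sketch below is only a starting point under a few assumptions: ``asapp/sew-d-tiny-100k`` is used as an example
pre-trained checkpoint that ships a preprocessor config, the number of target classes is a placeholder, and the
randomly initialized classification head still needs to be fine-tuned on labeled audio:

.. code-block:: python

    import torch
    from transformers import AutoFeatureExtractor, SEWDForSequenceClassification

    # assumed example checkpoint and label count; both are placeholders for your own setup
    checkpoint = "asapp/sew-d-tiny-100k"
    feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
    model = SEWDForSequenceClassification.from_pretrained(checkpoint, num_labels=10)

    # one second of silence at 16 kHz stands in for a real audio clip
    waveform = torch.zeros(16_000).numpy()
    inputs = feature_extractor(waveform, sampling_rate=16_000, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape (batch_size, num_labels)

    predicted_class_id = logits.argmax(dim=-1).item()
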
SEWDConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWDConfig
    :members:


SEWDModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWDModel
    :members: forward


SEWDForCTC
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWDForCTC
    :members: forward


SEWDForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.SEWDForSequenceClassification
    :members: forward

@@ -42,8 +42,8 @@ features. The :class:`~transformers.Speech2TextProcessor` wraps :class:`~transfo
 predicted token ids.
 
 The feature extractor depends on :obj:`torchaudio` and the tokenizer depends on :obj:`sentencepiece` so be sure to
-install those packages before running the examples. You could either install those as extra speech dependancies with
-``pip install transformers"[speech, sentencepiece]"`` or install the packages seperatly with ``pip install torchaudio
+install those packages before running the examples. You could either install those as extra speech dependencies with
+``pip install transformers"[speech, sentencepiece]"`` or install the packages seperately with ``pip install torchaudio
 sentencepiece``. Also ``torchaudio`` requires the development version of the `libsndfile
 <http://www.mega-nerd.com/libsndfile/>`__ package which can be installed via a system package manager. On Ubuntu it can
 be installed as follows: ``apt install libsndfile1-dev``
@@ -66,7 +66,7 @@ be installed as follows: ``apt install libsndfile1-dev``
 ...     batch["speech"] = speech
 ...     return batch
 
->>> ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
+>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
 >>> ds = ds.map(map_to_array)
 
 >>> inputs = processor(ds["speech"][0], sampling_rate=16_000, return_tensors="pt")
@@ -98,7 +98,7 @@ be installed as follows: ``apt install libsndfile1-dev``
 ...     batch["speech"] = speech
 ...     return batch
 
->>> ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
+>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
 >>> ds = ds.map(map_to_array)
 
 >>> inputs = processor(ds["speech"][0], sampling_rate=16_000, return_tensors="pt")