This model was released on 2020-04-10, added to Hugging Face Transformers on 2021-01-05, and contributed by patrickvonplaten.
LED
Longformer introduces an attention mechanism that scales linearly with sequence length, enabling the processing of very long documents. This mechanism combines local windowed attention with task-specific global attention and acts as a drop-in replacement for standard self-attention. Longformer achieves state-of-the-art results in character-level language modeling on text8 and enwik8, and the pretrained model outperforms RoBERTa on long-document tasks, setting new benchmarks on WikiHop and TriviaQA. Longformer-Encoder-Decoder (LED) extends this architecture to generative sequence-to-sequence tasks, demonstrating its effectiveness on the arXiv summarization dataset.
import torch
from transformers import pipeline
pipeline = pipeline(task="summarization", model="allenai/led-base-16384", dtype="auto")
pipeline("Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet. Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems. These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.")
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384", dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
text="""
Plants are among the most remarkable and essential life forms on Earth, possessing a unique ability to produce their own food through a process known as photosynthesis. This complex biochemical process is fundamental not only to plant life but to virtually all life on the planet.
Through photosynthesis, plants capture energy from sunlight using a green pigment called chlorophyll, which is located in specialized cell structures called chloroplasts. In the presence of light, plants absorb carbon dioxide from the atmosphere through small pores in their leaves called stomata, and take in water from the soil through their root systems.
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
"""
inputs = tokenizer(text, return_tensors="pt")
# For summarization, put global attention only on the first <s> token
global_attention_mask = torch.zeros_like(inputs.input_ids)
global_attention_mask[:, 0] = 1
output = model.generate(**inputs, global_attention_mask=global_attention_mask)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Usage tips
- [LEDForConditionalGeneration] extends [BartForConditionalGeneration] by replacing the traditional self-attention layer with Longformer's chunked self-attention layer. [LEDTokenizer] is an alias of [BartTokenizer].
- LED pads input_ids to a multiple of config.attention_window when required. Use [LEDTokenizer] with the pad_to_multiple_of argument for a small speedup.
- LED works best on long-range sequence-to-sequence tasks where the input_ids are significantly longer than 1024 tokens.
- LED uses global attention through the global_attention_mask (see [LongformerModel]). For summarization, put global attention only on the first <s> token. For question answering, put global attention on all question tokens.
- To fine-tune LED on the full 16384-token input length, enable gradient checkpointing to avoid out-of-memory errors: add model.gradient_checkpointing_enable() and set use_cache=False to disable caching and save memory (see the sketch after this list).
- Pad inputs on the right, since LED uses absolute position embeddings.
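The sketch below ties these tips together in a minimal fine-tuning step for summarization. The single document/summary pair, the learning rate, and the single optimizer step are illustrative placeholders, not recommended training settings; the attention window is read from the model config rather than assumed.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")
model.train()

# Save memory on long inputs: checkpoint activations and disable the decoder cache
model.gradient_checkpointing_enable()
model.config.use_cache = False

# Placeholder document/summary pair; any long-document summarization data works here
document = "Plants are among the most remarkable and essential life forms on Earth ..."
summary = "Plants make their own food through photosynthesis."

# Pad to a multiple of the attention window for a small speedup
attention_window = model.config.attention_window
window = attention_window[0] if isinstance(attention_window, (list, tuple)) else attention_window
inputs = tokenizer(document, return_tensors="pt", padding=True, pad_to_multiple_of=window)
labels = tokenizer(summary, return_tensors="pt").input_ids

# Global attention only on the first <s> token, as recommended for summarization
global_attention_mask = torch.zeros_like(inputs.input_ids)
global_attention_mask[:, 0] = 1

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**inputs, global_attention_mask=global_attention_mask, labels=labels).loss
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")
For question answering, the same pattern applies, except the global attention mask would mark all question tokens instead of only the first position.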
LEDConfig
autodoc LEDConfig
LEDTokenizer
autodoc LEDTokenizer - build_inputs_with_special_tokens - get_special_tokens_mask - create_token_type_ids_from_sequences - save_vocabulary
LEDTokenizerFast
autodoc LEDTokenizerFast
LED specific outputs
autodoc models.led.modeling_led.LEDEncoderBaseModelOutput
autodoc models.led.modeling_led.LEDSeq2SeqModelOutput
autodoc models.led.modeling_led.LEDSeq2SeqLMOutput
autodoc models.led.modeling_led.LEDSeq2SeqSequenceClassifierOutput
autodoc models.led.modeling_led.LEDSeq2SeqQuestionAnsweringModelOutput
LEDModel
autodoc LEDModel - forward
LEDForConditionalGeneration
autodoc LEDForConditionalGeneration - forward
LEDForSequenceClassification
autodoc LEDForSequenceClassification - forward
LEDForQuestionAnswering
autodoc LEDForQuestionAnswering - forward