
This model was released on 2020-07-28 and added to Hugging Face Transformers on 2021-03-30 and contributed by vasudevgupta.

BigBird

BigBird: Transformers for Longer Sequences introduces a sparse-attention mechanism that reduces the quadratic dependency on sequence length to linear, enabling the model to handle much longer sequences than models like BERT. BigBird combines local (sliding-window), global, and random attention to approximate full attention efficiently. This allows it to process sequences up to 8 times longer on similar hardware, improving performance on long-document NLP tasks such as question answering and summarization. The model also supports novel applications in genomics.

The examples below demonstrate masked language modeling with Pipeline and AutoModel.

Pipeline:

import torch
from transformers import pipeline

pipeline = pipeline(task="fill-mask", model="google/bigbird-roberta-base", dtype="auto")
pipeline("Plants create [MASK] through a process known as photosynthesis.")

AutoModel:

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("google/bigbird-roberta-base", dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")

inputs = tokenizer("Plants create [MASK] through a process known as photosynthesis.", return_tensors="pt")
outputs = model(**inputs)

# Locate the [MASK] token and decode the highest-scoring prediction for it
mask_position = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_word = tokenizer.decode(outputs.logits[0, mask_position].argmax(dim=-1))
print(f"Predicted word: {predicted_word}")

Usage tips

  • Pad inputs on the right because BigBird uses absolute position embeddings.
  • BigBird supports original_full and block_sparse attention. Use original_full for sequences under 1024 tokens since sparse attention doesn't offer much benefit on smaller inputs (see the sketch after this list).
  • The current implementation uses a window size of 3 blocks and 2 global blocks, only supports the ITC implementation, and doesn't support num_random_blocks=0.
  • Sequence length must be divisible by the block size.
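
The attention pattern can be configured when loading a checkpoint. The snippet below is a minimal sketch, assuming the google/bigbird-roberta-base checkpoint used above; block_size=64, num_random_blocks=3, and the 1024-token padded length are illustrative values chosen to satisfy the divisibility requirement, not recommendations.

import torch
from transformers import AutoTokenizer, BigBirdModel

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-roberta-base")

# Long inputs: block sparse attention. The (padded) sequence length of 1024
# is divisible by block_size=64, as required.
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",
    block_size=64,
    num_random_blocks=3,
)
inputs = tokenizer(
    "BigBird uses block sparse attention to process long documents.",
    padding="max_length",
    max_length=1024,
    return_tensors="pt",
)
with torch.no_grad():
    last_hidden_state = model(**inputs).last_hidden_state  # shape: (1, 1024, 768)

# Short inputs (under ~1024 tokens): full attention is usually the better choice.
short_model = BigBirdModel.from_pretrained("google/bigbird-roberta-base", attention_type="original_full")

from_pretrained forwards these keyword arguments to BigBirdConfig, so the same overrides can also be set on a config object directly.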

BigBirdConfig

autodoc BigBirdConfig

BigBirdTokenizer

autodoc BigBirdTokenizer
  - build_inputs_with_special_tokens
  - get_special_tokens_mask
  - create_token_type_ids_from_sequences
  - save_vocabulary

BigBirdTokenizerFast

autodoc BigBirdTokenizerFast

BigBird specific outputs

autodoc models.big_bird.modeling_big_bird.BigBirdForPreTrainingOutput

BigBirdModel

autodoc BigBirdModel - forward

BigBirdForPreTraining

autodoc BigBirdForPreTraining - forward

BigBirdForCausalLM

autodoc BigBirdForCausalLM - forward

BigBirdForMaskedLM

autodoc BigBirdForMaskedLM - forward

BigBirdForSequenceClassification

autodoc BigBirdForSequenceClassification - forward

BigBirdForMultipleChoice

autodoc BigBirdForMultipleChoice - forward

BigBirdForTokenClassification

autodoc BigBirdForTokenClassification - forward

BigBirdForQuestionAnswering

autodoc BigBirdForQuestionAnswering - forward