
This model was released on 2020-06-05 and added to Hugging Face Transformers on 2020-11-16. It was contributed by DeBERTa.

# DeBERTa

DeBERTa improves on BERT and RoBERTa with two techniques: disentangled attention and an enhanced mask decoder. Disentangled attention represents each token with two separate vectors, one encoding content and one encoding position, and computes attention weights from disentangled matrices over both. The enhanced mask decoder replaces the output softmax layer used to predict masked tokens during pretraining. Together, these techniques improve pretraining efficiency and downstream task performance: DeBERTa outperforms RoBERTa-Large on MNLI, SQuAD v2.0, and RACE while using half the training data.
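
As a simplified sketch of what "disentangled" means here, the attention score between tokens i and j decomposes into content-to-content, content-to-position, and position-to-content terms, roughly following the formulation in the DeBERTa paper, where the superscript c marks content projections, r marks relative-position projections, and δ(i, j) is the relative distance between the two tokens (the position-to-position term is omitted):

$$
\tilde{A}_{i,j} = \underbrace{Q^c_i \, {K^c_j}^{\top}}_{\text{content-to-content}} + \underbrace{Q^c_i \, {K^r_{\delta(i,j)}}^{\top}}_{\text{content-to-position}} + \underbrace{K^c_j \, {Q^r_{\delta(j,i)}}^{\top}}_{\text{position-to-content}}
$$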

```py
import torch
from transformers import pipeline

# Fill-mask pipeline with a DeBERTa base checkpoint
pipeline = pipeline(task="fill-mask", model="microsoft/deberta-base", dtype="auto")
pipeline("Plants create [MASK] through a process known as photosynthesis.")
```

```py
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base", dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")

inputs = tokenizer("Plants create [MASK] through a process known as photosynthesis.", return_tensors="pt")
outputs = model(**inputs)

# Locate the [MASK] position and decode the highest-scoring token at that position
mask_position = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_word = tokenizer.decode(outputs.logits[0, mask_position].argmax(dim=-1))
print(f"Predicted word: {predicted_word}")
```

## Usage tips

- DeBERTa uses relative position embeddings, so it doesn't require right-padding the way BERT does.
- Use DeBERTa for sentence-level or sentence-pair classification tasks like MNLI, RTE, or SST-2 for best results (see the sketch after this list).
- For token-level tasks like masked language modeling, load a checkpoint specifically pretrained or fine-tuned for token-level tasks.
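
As a minimal sketch of the sentence-pair use case, the example below runs NLI-style classification. It assumes the fine-tuned checkpoint `microsoft/deberta-base-mnli` and standard `AutoModelForSequenceClassification` usage; any DeBERTa sequence-classification checkpoint would work the same way.

```py
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# NLI-style sentence-pair classification with a DeBERTa checkpoint fine-tuned on MNLI
# ("microsoft/deberta-base-mnli" is used here as an illustrative checkpoint)
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base-mnli")
model = AutoModelForSequenceClassification.from_pretrained("microsoft/deberta-base-mnli")

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the premise/hypothesis pair as a single sequence-pair input
inputs = tokenizer(premise, hypothesis, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring logit back to its label (e.g. ENTAILMENT/NEUTRAL/CONTRADICTION)
predicted_label = model.config.id2label[logits.argmax(dim=-1).item()]
print(f"Predicted label: {predicted_label}")
```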

## DebertaConfig

[[autodoc]] DebertaConfig

## DebertaTokenizer

[[autodoc]] DebertaTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## DebertaTokenizerFast

[[autodoc]] DebertaTokenizerFast
    - build_inputs_with_special_tokens
    - create_token_type_ids_from_sequences

## DebertaModel

[[autodoc]] DebertaModel
    - forward

## DebertaPreTrainedModel

[[autodoc]] DebertaPreTrainedModel

## DebertaForMaskedLM

[[autodoc]] DebertaForMaskedLM
    - forward

## DebertaForSequenceClassification

[[autodoc]] DebertaForSequenceClassification
    - forward

## DebertaForTokenClassification

[[autodoc]] DebertaForTokenClassification
    - forward

## DebertaForQuestionAnswering

[[autodoc]] DebertaForQuestionAnswering
    - forward