This model was released on 2020-02-12 and added to Hugging Face Transformers on 2023-06-20. It was contributed by patrickvonplaten.

# T5v1.1

T5v1.1 builds on the T5 architecture with several key modifications. It replaces the standard ReLU activation in the feed-forward layers with GEGLU, which can improve learning efficiency. Dropout was turned off during pre-training on the C4 dataset, which improved quality, but it should be re-enabled during fine-tuning. The model does not share parameters between the embedding and classifier layers, and pre-training was done exclusively on C4 without mixing in downstream tasks.
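
These differences are visible in the checkpoint's configuration. The snippet below is a minimal sketch that prints the relevant fields; the values noted in the comments are expectations based on the description above rather than guaranteed outputs.

```py
from transformers import AutoConfig

# Inspect the configuration shipped with the v1.1 checkpoint.
config = AutoConfig.from_pretrained("google/t5-v1_1-base")

print(config.feed_forward_proj)    # expected "gated-gelu": GEGLU feed-forward instead of ReLU
print(config.tie_word_embeddings)  # expected False: embedding and classifier weights are not shared
print(config.dropout_rate)         # dropout rate applied during fine-tuning
```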

The examples below show how to run the checkpoint with `pipeline` or `AutoModelForSeq2SeqLM`.

```py
import torch
from transformers import pipeline

pipeline = pipeline(task="text2text-generation", model="google/t5-v1_1-base", dtype="auto")
pipeline("translate English to French: Plants create energy through a process known as photosynthesis.")
```

```py
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base", dtype="auto")

# Tokenize the prompt, generate, and decode the output.
inputs = tokenizer("translate English to French: Plants create energy through a process known as photosynthesis.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```

## Usage tips

- T5v1.1 was pre-trained only on C4, without supervised training. Unlike the original T5, it must be fine-tuned before it is used on downstream tasks (a minimal fine-tuning sketch follows this list).
- Since T5v1.1 was pre-trained unsupervised, task prefixes don't help during single-task fine-tuning.
- Use task prefixes for multi-task fine-tuning.
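
As a concrete illustration of the tips above, here is a minimal single-task fine-tuning sketch with `Seq2SeqTrainer`. The dataset file, the `text` and `summary` column names, and the hyperparameters are placeholders rather than part of the official documentation; adapt them to your task.

```py
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base")

# Hypothetical dataset with "text" and "summary" columns.
dataset = load_dataset("json", data_files="my_task.json", split="train")

def preprocess(examples):
    # Single-task fine-tuning, so no task prefix is prepended to the inputs.
    model_inputs = tokenizer(examples["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=examples["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="t5-v1_1-base-finetuned",  # placeholder output directory
        per_device_train_batch_size=8,
        learning_rate=1e-4,
        num_train_epochs=3,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```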