This model was released on 2020-02-12, added to Hugging Face Transformers on 2023-06-20, and contributed by patrickvonplaten.
# T5v1.1
T5v1.1 builds on the T5 architecture with several key modifications. It replaces the standard ReLU activation in the feed-forward layers with GEGLU, which can improve learning efficiency. Dropout was disabled during pre-training on the C4 dataset, enhancing model quality, though it should be used during fine-tuning. The model does not share parameters between the embedding and classifier layers, and pre-training was done exclusively on C4 without mixing in downstream tasks.
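The gated feed-forward block can be pictured with a short PyTorch sketch. This is a minimal illustration for intuition only, not the exact layer from the Transformers source; the class name `GEGLUFeedForward` and the dimensions are assumptions made up for the example.

```py
import torch
import torch.nn as nn

class GEGLUFeedForward(nn.Module):
    """Minimal sketch of the gated-GELU (GEGLU) feed-forward used by T5v1.1."""

    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.0):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gating projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # linear projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection
        self.dropout = nn.Dropout(dropout)                # off in pre-training, re-enabled for fine-tuning
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # GEGLU: GELU(x W_0) multiplied elementwise with (x W_1), then projected back to d_model
        gated = self.act(self.wi_0(hidden_states)) * self.wi_1(hidden_states)
        return self.wo(self.dropout(gated))

# Example: a batch of 2 sequences of length 4 with d_model=512, d_ff=1024
ffn = GEGLUFeedForward(d_model=512, d_ff=1024)
out = ffn(torch.randn(2, 4, 512))
print(out.shape)  # torch.Size([2, 4, 512])
```

The examples below run inference with the pretrained checkpoint, first with the pipeline API and then with `AutoModelForSeq2SeqLM` directly.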
```py
import torch
from transformers import pipeline

pipeline = pipeline(task="text2text-generation", model="google/t5-v1_1-base", dtype="auto")
pipeline("translate English to French: Plants create energy through a process known as photosynthesis.")
```
```py
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base", dtype="auto")

inputs = tokenizer("translate English to French: Plants create energy through a process known as photosynthesis.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Usage tips
- T5v1.1 was only pre-trained on C4 without supervised training, so unlike the original T5 it must be fine-tuned before it is used on a downstream task (a minimal fine-tuning sketch follows this list).
- Since T5v1.1 was pre-trained unsupervised, task prefixes don't help during single-task fine-tuning.
- Use task prefixes for multi-task fine-tuning.
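As a concrete starting point, the sketch below fine-tunes the checkpoint on a single task with a plain PyTorch loop. The toy sentence pair, learning rate, and loop structure are illustrative assumptions, not a recommended recipe; no task prefix is used, in line with the single-task tip above.

```py
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-v1_1-base")
model.train()  # dropout is active again during fine-tuning

# Toy single-task data (English -> French); replace with a real dataset.
pairs = [
    ("Plants create energy through photosynthesis.",
     "Les plantes produisent de l'énergie grâce à la photosynthèse."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for source, target in pairs:
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(text_target=target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```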