
This model was released on 2022-07-11, added to Hugging Face Transformers on 2022-07-18, and contributed by lysandre.

This model supports FlashAttention and SDPA (scaled dot-product attention).

# NLLB

NLLB (No Language Left Behind) addresses the challenge of translating low-resource languages with a conditional compute model based on a Sparsely Gated Mixture of Experts. It is trained on data obtained with novel data mining techniques tailored for low-resource languages, and uses architectural and training improvements to counteract overfitting while training on thousands of tasks. Evaluated on over 40,000 translation directions with the Flores-200 benchmark and on a toxicity benchmark, NLLB achieves a 44% BLEU improvement over the previous state of the art, a step toward a universal translation system.

```python
import torch
from transformers import pipeline

pipeline = pipeline(task="translation_en_to_fr", model="facebook/nllb-200-distilled-600M", src_lang="eng_Latn", tgt_lang="fra_Latn", dtype="auto")
pipeline("Plants create energy through a process known as photosynthesis.")
```

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tokenizer("Plants create energy through a process known as photosynthesis.", return_tensors="pt")
outputs = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"))
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

## Usage tips

- The tokenizer was updated in April 2023. It now prefixes the source sequence with the source language code instead of the target language code, which prioritizes zero-shot performance at a minor cost to supervised performance.
- For non-English source languages, pass the language's BCP-47 code with the `src_lang` keyword, as in the sketch after this list.
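
For example, this minimal sketch (using the same `facebook/nllb-200-distilled-600M` checkpoint as above) translates a French sentence into English: `src_lang="fra_Latn"` tells the tokenizer which language code to prepend to the source sequence, and `forced_bos_token_id` forces the English language code as the first generated token. The French input sentence is only illustrative.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# src_lang sets the BCP-47 code the tokenizer prepends to the source sequence
# (since the April 2023 update, input_ids look like [fra_Latn, ..., </s>])
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="fra_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", dtype="auto")

# Illustrative French input: "Plants create energy through a process known as photosynthesis."
inputs = tokenizer("Les plantes produisent de l'énergie grâce à la photosynthèse.", return_tensors="pt")

# Force the first generated token to be the target language code (English here)
outputs = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"))
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```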

## NllbTokenizer

[[autodoc]] NllbTokenizer
    - build_inputs_with_special_tokens

## NllbTokenizerFast

[[autodoc]] NllbTokenizerFast