Utilities for Generation
This page lists all the utility functions used by [~generation.GenerationMixin.generate],
[~generation.GenerationMixin.greedy_search],
[~generation.GenerationMixin.contrastive_search],
[~generation.GenerationMixin.sample],
[~generation.GenerationMixin.beam_search],
[~generation.GenerationMixin.beam_sample],
[~generation.GenerationMixin.group_beam_search], and
[~generation.GenerationMixin.constrained_beam_search].
Most of these are only useful if you are studying the code of the generate methods in the library.
Generate Outputs
The output of [~generation.GenerationMixin.generate] is an instance of a subclass of
[~utils.ModelOutput]. This output is a data structure containing all the information returned
by [~generation.GenerationMixin.generate], but that can also be used as tuple or dictionary.
Here's an example:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
```
The generation_output object is a [~generation.GreedySearchDecoderOnlyOutput]. As we can
see in the documentation of that class below, this means it has the following attributes:

- sequences: the generated sequences of tokens
- scores (optional): the prediction scores of the language modeling head, for each generation step
- hidden_states (optional): the hidden states of the model, for each generation step
- attentions (optional): the attention weights of the model, for each generation step
Here we have the scores since we passed along output_scores=True, but we don't have hidden_states and
attentions because we didn't pass output_hidden_states=True or output_attentions=True.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get None. Here for instance generation_output.scores are all the generated prediction scores of the
language modeling head, and generation_output.attentions is None.
When using our generation_output object as a tuple, it only keeps the attributes that don't have None values.
Here, for instance, it has two elements, sequences then scores, so

generation_output[:2]

will return the tuple (generation_output.sequences, generation_output.scores).
When using our generation_output object as a dictionary, it only keeps the attributes that don't have None
values. Here, for instance, it has two keys that are sequences and scores.
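As a sketch of how these access patterns behave, the snippet below constructs a GreedySearchDecoderOnlyOutput by hand with placeholder tensors (so it runs without loading a model) and inspects it in the ways described above:

```python
import torch

from transformers.generation import GreedySearchDecoderOnlyOutput

# Build a small output by hand; normally `generate` returns this for you.
# The tensor values here are placeholders, not real model outputs.
output = GreedySearchDecoderOnlyOutput(
    sequences=torch.tensor([[101, 102, 103]]),
    scores=(torch.zeros(1, 10),),
)

print(output.attentions)       # None, since no attentions were provided
print(list(output.keys()))     # only the non-None fields: sequences, scores
print(len(output.to_tuple()))  # 2, the tuple view also drops None fields
```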
We document here all output types.
GreedySearchOutput
autodoc generation.GreedySearchDecoderOnlyOutput
autodoc generation.GreedySearchEncoderDecoderOutput
autodoc generation.FlaxGreedySearchOutput
SampleOutput
autodoc generation.SampleDecoderOnlyOutput
autodoc generation.SampleEncoderDecoderOutput
autodoc generation.FlaxSampleOutput
BeamSearchOutput
autodoc generation.BeamSearchDecoderOnlyOutput
autodoc generation.BeamSearchEncoderDecoderOutput
BeamSampleOutput
autodoc generation.BeamSampleDecoderOnlyOutput
autodoc generation.BeamSampleEncoderDecoderOutput
LogitsProcessor
A [LogitsProcessor] can be used to modify the prediction scores of a language model head for
generation.
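For example, a [MinLengthLogitsProcessor] can be applied by hand, outside of generate, to see its effect on a batch of scores (the token ids and logits below are placeholders):

```python
import torch

from transformers import LogitsProcessorList, MinLengthLogitsProcessor

# A processor list applied manually; `generate` normally builds and calls this.
processors = LogitsProcessorList([
    MinLengthLogitsProcessor(min_length=5, eos_token_id=0),
])

input_ids = torch.tensor([[2, 3, 4]])  # only 3 tokens generated so far
scores = torch.zeros(1, 10)            # dummy logits over a 10-token vocabulary
scores = processors(input_ids, scores)

# Below min_length, the EOS token's score is set to -inf so it cannot be chosen.
print(scores[0, 0])  # -inf
```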
autodoc LogitsProcessor - call
autodoc LogitsProcessorList - call
autodoc LogitsWarper - call
autodoc MinLengthLogitsProcessor - call
autodoc MinNewTokensLengthLogitsProcessor - call
autodoc TemperatureLogitsWarper - call
autodoc RepetitionPenaltyLogitsProcessor - call
autodoc TopPLogitsWarper - call
autodoc TopKLogitsWarper - call
autodoc TypicalLogitsWarper - call
autodoc NoRepeatNGramLogitsProcessor - call
autodoc SequenceBiasLogitsProcessor - call
autodoc NoBadWordsLogitsProcessor - call
autodoc PrefixConstrainedLogitsProcessor - call
autodoc HammingDiversityLogitsProcessor - call
autodoc ForcedBOSTokenLogitsProcessor - call
autodoc ForcedEOSTokenLogitsProcessor - call
autodoc InfNanRemoveLogitsProcessor - call
autodoc TFLogitsProcessor - call
autodoc TFLogitsProcessorList - call
autodoc TFLogitsWarper - call
autodoc TFTemperatureLogitsWarper - call
autodoc TFTopPLogitsWarper - call
autodoc TFTopKLogitsWarper - call
autodoc TFMinLengthLogitsProcessor - call
autodoc TFNoBadWordsLogitsProcessor - call
autodoc TFNoRepeatNGramLogitsProcessor - call
autodoc TFRepetitionPenaltyLogitsProcessor - call
autodoc TFForcedBOSTokenLogitsProcessor - call
autodoc TFForcedEOSTokenLogitsProcessor - call
autodoc FlaxLogitsProcessor - call
autodoc FlaxLogitsProcessorList - call
autodoc FlaxLogitsWarper - call
autodoc FlaxTemperatureLogitsWarper - call
autodoc FlaxTopPLogitsWarper - call
autodoc FlaxTopKLogitsWarper - call
autodoc FlaxForcedBOSTokenLogitsProcessor - call
autodoc FlaxForcedEOSTokenLogitsProcessor - call
autodoc FlaxMinLengthLogitsProcessor - call
StoppingCriteria
A [StoppingCriteria] can be used to change when to stop generation (other than by reaching the EOS token).
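For illustration, the sketch below evaluates a [MaxLengthCriteria] by hand on dummy inputs of different lengths; inside generate this check runs after every generation step:

```python
import torch

from transformers import MaxLengthCriteria, StoppingCriteriaList

# Stop once the sequence (prompt included) reaches 5 tokens.
criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=5)])

short = torch.ones(1, 3, dtype=torch.long)  # 3 tokens: keep generating
full = torch.ones(1, 5, dtype=torch.long)   # 5 tokens: stop
scores = torch.zeros(1, 10)                 # dummy logits, unused by this criterion

print(bool(criteria(short, scores)))  # False
print(bool(criteria(full, scores)))   # True
```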
autodoc StoppingCriteria - call
autodoc StoppingCriteriaList - call
autodoc MaxLengthCriteria - call
autodoc MaxTimeCriteria - call
Constraints
A [Constraint] can be used to force the generation to include specific tokens or sequences in the output.
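As an illustration of the underlying state machine, which [~generation.GenerationMixin.generate] drives internally when you pass a list of constraints via its constraints argument, the sketch below steps a [PhrasalConstraint] by hand; the token ids are placeholders:

```python
from transformers import PhrasalConstraint

# A constraint forcing the token sequence [5, 6, 7] to appear in the output.
constraint = PhrasalConstraint([5, 6, 7])

print(constraint.advance())  # 5: the next token needed to make progress
stepped, completed, reset = constraint.update(5)
print(stepped, completed)    # True False: progressed, but not finished yet

constraint.update(6)
stepped, completed, reset = constraint.update(7)
print(completed)             # True: the full phrase has been produced
```

In normal use you would not call these methods yourself; you would pass the constraint objects to [~generation.GenerationMixin.generate] and let constrained beam search drive them.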
autodoc Constraint
autodoc PhrasalConstraint
autodoc DisjunctiveConstraint
autodoc ConstraintListState
BeamSearch
autodoc BeamScorer - process - finalize
autodoc BeamSearchScorer - process - finalize
autodoc ConstrainedBeamSearchScorer - process - finalize
Utilities
autodoc top_k_top_p_filtering
autodoc tf_top_k_top_p_filtering
Streamers
autodoc TextStreamer
autodoc TextIteratorStreamer