# Tool use
Chat models are commonly trained with support for "function-calling" or "tool-use". Tools are functions supplied by the user, which the model can choose to call as part of its response. For example, models could have access to a calculator tool to perform arithmetic without having to perform the computation internally.
This guide will demonstrate how to define tools, how to pass them to a chat model, and how to handle the model's output when it calls a tool.
## Passing tools
When a model supports tool-use, pass functions to the tools argument of [~PreTrainedTokenizerBase.apply_chat_template].
The tools are passed as either a JSON schema or Python functions. If you pass Python functions,
the arguments, argument types, and function docstring are parsed in order to generate the JSON schema automatically.
Although passing Python functions is very convenient, the parser can only handle Google-style docstrings. Refer to the examples below for how to format a tool-ready function.
```python
def get_current_temperature(location: str, unit: str):
    """
    Get the current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    """
    return 22.  # A real function should probably actually get the temperature!


def get_current_wind_speed(location: str):
    """
    Get the current wind speed in km/h at a given location.

    Args:
        location: The location to get the wind speed for, in the format "City, Country"
    """
    return 6.  # A real function should probably actually get the wind speed!


tools = [get_current_temperature, get_current_wind_speed]
```
You can optionally add a Returns: block to the docstring and a return type to the function header, but most models won't use this information. The parser will also ignore the actual code inside the function!
What really matters is the function name, argument names, argument types, and docstring describing the function's purpose and the purpose of its arguments. These create the "signature" the model will use to decide whether to call the tool.
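For illustration, here is a minimal sketch of what a fully annotated tool could look like with an optional return type and `Returns:` block. The `get_current_humidity` function is a hypothetical example, not part of the tool set defined above:

```python
def get_current_humidity(location: str) -> float:
    """
    Get the current relative humidity at a location.

    Args:
        location: The location to get the humidity for, in the format "City, Country"

    Returns:
        The relative humidity as a percentage between 0 and 100.
    """
    # The function body is ignored when generating the schema; only the signature
    # and docstring are used.
    return 55.
```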
## Tool-calling Example
Load a model and tokenizer that support tool-use, such as NousResearch/Hermes-2-Pro-Llama-3-8B. You can also consider a larger model like Command-R or Mixtral-8x22B if your hardware can support it.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, dtype="auto", device_map="auto")
```
Create a chat history.
```python
messages = [
    {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
    {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
]
```
Next, pass messages and a list of tools to [~PreTrainedTokenizerBase.apply_chat_template]. Tokenize the chat and generate a response.
```python
inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=128)
print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):]))
```

```
<tool_call>
{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
</tool_call><|im_end|>
```
The chat model called the `get_current_temperature` tool with the correct parameters from the docstring. It inferred France as the location based on Paris, and that it should use Celsius for the units of temperature.
A model cannot actually call the tool itself. It requests a tool call, and it's your job to handle the call and append both the call and its result to the chat history. For models that support response parsing, parsing is handled automatically, and you can use [~PreTrainedTokenizer.parse_response] to extract the tool call. For other models, you'll need to manually translate the output string into a tool call dict.
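As a rough sketch (assuming a checkpoint whose tokenizer ships a response-parsing schema; the exact call pattern and the keys of the returned dict vary by model), extracting the tool call could look like this:

```python
# Hedged sketch: parse_response is only available for models that define a
# response-parsing schema, and the structure of the returned dict is model-specific.
decoded = tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):])
parsed = tokenizer.parse_response(decoded)
print(parsed.keys())  # e.g. something like dict_keys(['tool_calls', 'content'])
```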
Regardless of the approach you use, the tool call should go in the `tool_calls` key of an assistant message. This is the recommended API, and should be supported by the chat template of most tool-using models.
Warning
Although `tool_calls` is similar to the OpenAI API, the OpenAI API uses a JSON string as its `tool_calls` format. This may cause errors or strange model behavior if used in Transformers, which expects a dict.
```python
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
```
Append the tool response to the chat history with the tool role.
```python
messages.append({"role": "tool", "content": "22"})  # Note that the returned content is always a string!
```
Finally, allow the model to read the tool response and reply to the user.
```python
inputs = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
out = model.generate(**inputs.to(model.device), max_new_tokens=128)
print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
```

```
The temperature in Paris, France right now is 22°C.<|im_end|>
```
Warning
Although the key in the assistant message is called `tool_calls`, in most cases, models only emit a single tool call at a time. Some older models emit multiple tool calls at the same time, but this is a significantly more complex process, as you need to handle multiple tool responses at once and disambiguate them, often using tool call IDs. Please refer to the model card to see exactly what format a model expects for tool calls.
## JSON schemas
Another way to define tools is by passing a JSON schema.
You can also manually call the low-level functions that convert Python functions to JSON schemas, and then check or edit the generated schemas. This is usually not necessary, but is useful for understanding the underlying mechanics. It's particularly important for chat template authors who need to access the JSON schema to render the tool definitions.
The [~PreTrainedTokenizerBase.apply_chat_template] method uses the get_json_schema function to convert Python functions to a JSON schema.
```python
from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

schema = get_json_schema(multiply)
print(schema)
```
```json
{
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "A function that multiplies two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to multiply"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to multiply"
                }
            },
            "required": ["a", "b"]
        }
    }
}
```
We won't go into the details of JSON schema itself here, since it's already very well documented elsewhere. We will, however, mention that you can pass JSON schema dicts to the tools argument of [~PreTrainedTokenizerBase.apply_chat_template] instead of Python functions:
```python
# A simple function that takes no arguments
current_time = {
    "type": "function",
    "function": {
        "name": "current_time",
        "description": "Get the current local time as a string.",
        "parameters": {
            "type": "object",
            "properties": {}
        }
    }
}

# A more complete function that takes two numerical arguments
multiply = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "A function that multiplies two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to multiply"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to multiply"
                }
            },
            "required": ["a", "b"]
        }
    }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools=[current_time, multiply]
)
```