<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Running tools on inference endpoints

<Tip>

This document is about running tools on inference endpoints so that agents may use these tools remotely.
If you do not know what tools and agents are in the context of transformers, we recommend you read the
[Transformers Agents](transformers_agents) page first.

</Tip>

Agents are designed to use tools in order to respond to a natural language query. They are set up to load tools
locally and use them directly in the runtime they run in.

However, some of these tools can be heavy; tools that handle images, long text, or audio signals may need a
significant amount of memory to perform inference. Tools that generate images through a diffusion process may
require significant compute to run the many steps they need, and end users may not have access to the powerful
setups these tools require.

This is why we have support for **remote** tools: these expose an API that can be called from the runtime,
offloading the processing to the remote API. In this guide we'll explore how to set up an inference endpoint for a
given tool so that agents can leverage it.
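
As a preview of how a remote tool is used once its endpoint is live, here is a minimal sketch. It assumes the
endpoint already exists and relies on the `remote=True` flag of `load_tool`, which offloads inference to the
remote API rather than running the tool locally:

```python
from transformers.tools import load_tool

# With remote=True, the tool's heavy inference runs on its remote API
# instead of in the local process.
tool = load_tool("huggingface-tools/text-to-video", remote=True)
```
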
Inference Endpoints are one solution for hosting remote tools, but they're not the only one. We also integrate with
[`gradio_tools`](custom_tools#leveraging-gradiotools), which offers remote tools as well, and we'll continue adding
guides for other remote-tool alternatives.
## Inference Endpoints
[Inference Endpoints](https://huggingface.co/inference-endpoints) is a paid Hugging Face solution to easily deploy
Transformers and Diffusers models on a fully-managed infrastructure. It has default deployment options for
transformers and diffusers, but given that we're using a specific type of object here, tools, we'll set up a custom
handler to get it to work.

<Tip warning={true}>

Inference Endpoints is a paid hosting service by Hugging Face, which requires an organization set up with billing
enabled.

</Tip>

Tools are Spaces by default in Transformers. When calling `push_to_hub` on a tool, you're effectively pushing
the code to a Space on the Hugging Face Hub under a namespace that you own. There are many tools living on the
[`huggingface-tools` namespace](https://huggingface.co/huggingface-tools); having them be Spaces by default means
that users can play around with the tool directly in the browser.
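
For reference, sharing a tool this way is a single call. A minimal sketch, assuming you have write access to the
target namespace (the repository name below is a placeholder):

```python
from transformers.tools import load_tool

# Load the tool locally, then push its code to a Space under your own namespace.
tool = load_tool("huggingface-tools/text-to-video")
tool.push_to_hub("your-username/text-to-video")
```
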
However, Inference Endpoints only work with **model** repositories. We'll therefore have to create a model
repository to act as a proxy for the Space. That model repository will contain the `handler.py` file that serves
our tool through an inference endpoint.

For demonstration purposes, we'll assume that you already have a tool handy that you'd like to use remotely. If
you'd like to set up your own custom tool, we recommend reading the [Custom Tools](custom_tools) guide first.

We'll deploy the `huggingface-tools/text-to-video` tool to an inference endpoint. It is available as a Gradio
Space [here](https://huggingface.co/huggingface-tools/text-to-video).

### Setting up the repository

We'll start by creating a model repository that will act as the serving point for this tool.
It can be public or private; for the sake of this tutorial we'll keep this one public, but having it set to
private doesn't interfere with the inference endpoint setup.
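
If you prefer to do this from a script rather than through the Hub UI, here is a minimal sketch using
`huggingface_hub` (the repository name is a placeholder; use your own namespace):

```python
from huggingface_hub import create_repo, upload_file

repo_id = "your-username/text-to-video-endpoint"  # placeholder name

# Create a plain model repository that will only host the endpoint handler.
create_repo(repo_id, repo_type="model", private=False)

# Upload the two files the endpoint needs (described in the next sections).
upload_file(path_or_fileobj="handler.py", path_in_repo="handler.py", repo_id=repo_id)
upload_file(path_or_fileobj="requirements.txt", path_in_repo="requirements.txt", repo_id=repo_id)
```
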
The repository is created and is available [here](https://huggingface.co/huggingface-tools/text-to-video).
In it, you'll see there is a custom handler file, called
[`handler.py`](https://huggingface.co/huggingface-tools/text-to-video/blob/main/handler.py), as well as a traditional
requirements file called
[`requirements.txt`](https://huggingface.co/huggingface-tools/text-to-video/blob/main/requirements.txt).
#### Handler file
The handler file exposes an `EndpointHandler`, which serves as the link between the requests you'll be doing to the
remote tool and the tool itself. It should:
- Instantiate the tool in its initialization method
- Have a `__call__` method which will take the serialized input and return the computed result.
For text-to-text tools, the handler file is very simple; it looks like the following:
```python
from transformers.tools import load_tool


class EndpointHandler:
    def __init__(self, path=""):
        self.tool = load_tool("huggingface-tools/text-to-video")
        self.tool.setup()

    def __call__(self, data):
        inputs = data.pop("inputs", data)
        return self.tool(**inputs)
```
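
Before deploying, you can sanity-check the handler locally. A quick sketch, assuming the underlying tool accepts a
`prompt` keyword argument (check your tool's signature for the inputs it actually expects):

```python
# Instantiate the handler exactly as the endpoint would.
handler = EndpointHandler()

# The endpoint passes the deserialized request body to __call__; the nested
# "inputs" dictionary is unpacked into the tool's keyword arguments.
result = handler({"inputs": {"prompt": "A panda surfing on a wave"}})
```
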
However, the handler is different when a tool works with other data types, as those inputs and outputs need to be
serialized. This guide will be extended to cover serialization for text, image, audio, and video.
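
Until that section lands, here is an illustrative (non-official) sketch of what such a handler could look like for a
tool whose output is raw bytes; the base64 step keeps the response JSON-serializable, and the assumption that the
tool returns bytes is purely for illustration:

```python
import base64

from transformers.tools import load_tool


class EndpointHandler:
    def __init__(self, path=""):
        self.tool = load_tool("huggingface-tools/text-to-video")
        self.tool.setup()

    def __call__(self, data):
        inputs = data.pop("inputs", data)
        # Illustrative assumption: the tool returns raw binary data (e.g. an encoded video).
        video_bytes = self.tool(**inputs)
        # Binary payloads can't travel in JSON directly, so base64-encode them.
        return {"video": base64.b64encode(video_bytes).decode("utf-8")}
```
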
#### Requirements file

The requirements file needs to specify all dependencies necessary to run the tool. The basic dependencies are the
following:

```text
transformers>=4.29.0
accelerate
```
You should also include any other dependencies your tool needs.
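
As an illustration only, a tool that generates video with a diffusion pipeline might end up with a requirements file
along these lines (the extra packages are hypothetical and depend entirely on what your tool imports):

```text
transformers>=4.29.0
accelerate
diffusers
torch
opencv-python
```
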
### Spinning up an endpoint

Once the repository is ready, we can create our first endpoint. Head over to
[the Inference Endpoints UI](https://ui.endpoints.huggingface.co/endpoints) and create a new endpoint pointing at
the model repository we just set up.

If the repository is set up correctly, the endpoint should spin up directly without issue.

If you encounter a "Failed" deployment, we recommend taking a look at
[this guide](https://huggingface.co/docs/inference-endpoints/guides/logs) on inspecting the logs of an inference
endpoint.
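
Once the endpoint is running, it can be queried like any other Inference Endpoint. A minimal sketch with `requests`;
the endpoint URL and token below are placeholders to replace with your own values:

```python
import requests

# Copy the URL from the endpoint's page in the Inference Endpoints UI.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."  # a Hugging Face token with access to the endpoint

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}", "Content-Type": "application/json"},
    json={"inputs": {"prompt": "A panda surfing on a wave"}},
)
print(response.json())
```
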
TODO add images