Mirror of https://github.com/huggingface/transformers.git, synced 2025-10-20 17:13:56 +08:00
Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files
* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
@@ -38,7 +38,6 @@ In particular all "Please explain" questions or objectively very user-specific f

* "How to train T5 on De->En translation?"

## The GitHub Issues

Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
@@ -247,7 +246,6 @@ You are not required to read the following guidelines before opening an issue. H

Try not to use italics and bold text too much, as these often make the text more difficult to read.

12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter, it can be very difficult to find which specific comment you're referring to.

To get the link to a specific comment, do not copy the URL from your browser's location bar; instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
@@ -257,7 +255,6 @@ You are not required to read the following guidelines before opening an issue. H
1. https://github.com/huggingface/transformers/issues/9257
2. https://github.com/huggingface/transformers/issues/9257#issuecomment-749945162

13. If you are replying to the last comment, it's totally fine to post your reply with just your comment in it. Readers can follow the information flow here.

But if you're replying to a comment further up the thread, it's good practice to quote just the relevant lines you're replying to. The `>` character is used for quoting, or you can use the menu to do so. For example, your editor box will look like:
@@ -63,7 +63,6 @@ limitations under the License.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>

Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal models, for both inference and training.
@@ -194,7 +193,6 @@ pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.pn
<details>
<summary>Visual question answering</summary>

<h3 align="center">
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg"></a>
</h3>
@@ -606,4 +606,3 @@ Keywords: BentoML, Framework, Deployment, AI Applications
[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training (fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).

Keywords: PEFT, fine-tuning, LLaMA-2, ChatGLM, Qwen
@@ -69,7 +69,6 @@ CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
Only GPUs 0 and 2 are "visible" to PyTorch and are mapped to `cuda:0` and `cuda:1` respectively.
To reverse the order (use GPU 2 as `cuda:0` and GPU 0 as `cuda:1`):

```bash
CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
```
@@ -108,7 +107,6 @@ To reverse the order (use XPU 2 as `xpu:0` and XPU 0 as `xpu:1`):
ZE_AFFINITY_MASK=2,0 torchrun trainer-program.py ...
```

You can also control the order of Intel XPUs with:

```bash
@@ -120,7 +118,5 @@ For more information about device enumeration and sorting on Intel XPU, please r
</hfoption>
</hfoptions>

> [!WARNING]
> Environment variables can be exported instead of being added to the command line. This is not recommended because it can be confusing if you forget how the environment variable was set up and you end up using the wrong accelerators. Instead, it is common practice to set the environment variable for a specific training run on the same command line.
@@ -145,7 +145,6 @@ Arguments can also be passed directly to `@auto_docstring` for more control. Use

The `Returns` and `Examples` parts of the docstring can also be manually specified.

```python
MODEL_COMMON_CUSTOM_ARGS = r"""
    common_arg_1 (`torch.Tensor`, *optional*, defaults to `default_value`):
@@ -202,7 +201,6 @@ There are some rules for documenting different types of arguments and they're li

If a standard argument behaves differently in your model, then you can override it locally in a `r""" """` block. This local definition has a higher priority. For example, the `labels` argument is often customized per model and typically requires overriding.

- New or custom arguments should be documented within an `r""" """` block after the signature if it is a function or in the `__init__` method's docstring if it is a class.

```py
@@ -62,8 +62,6 @@ Refer to the table below to compare how caching improves efficiency.
| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V` |
| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) |

## Cache class

A basic KV cache interface takes a key and value tensor for the current token and returns the updated `K` and `V` tensors. This is internally managed by a model's `forward` method.
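To make that interface concrete, here is a minimal sketch of updating a [`DynamicCache`] by hand; the tensor shapes are illustrative placeholders rather than values from any specific model.

```py
import torch
from transformers import DynamicCache

cache = DynamicCache()

# Dummy projections standing in for one attention layer's current step
# (illustrative shape: [batch, num_heads, new_tokens, head_dim])
key_states = torch.randn(1, 8, 1, 64)
value_states = torch.randn(1, 8, 1, 64)

# update() appends the new K/V to the cached ones for this layer and returns the full tensors
keys, values = cache.update(key_states, value_states, layer_idx=0)
print(keys.shape)  # the sequence dimension grows as more tokens are cached
```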
@@ -143,7 +141,6 @@ Cache position is used internally for two purposes:

The generation loop usually takes care of the cache position, but if you're writing a custom generation method, it is important that cache positions are accurate since they are used to write and read key/value states into fixed slots.

```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, DynamicCache, infer_device
@@ -160,7 +157,6 @@ generated_ids = model.generate(**inputs, use_cache=True, max_new_tokens=10)

```

## Legacy cache format

Before the [`Cache`] class, the cache used to be stored as a tuple of tuples of tensors. This format is dynamic because it grows as text is generated, similar to [`DynamicCache`].
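As a rough sketch of how the two formats relate, using the `from_legacy_cache`/`to_legacy_cache` helpers listed later in these docs (the tensors below are dummy placeholders):

```py
import torch
from transformers import DynamicCache

# Legacy format: one (key, value) tuple per layer, here a single toy layer
legacy_cache = ((torch.randn(1, 8, 4, 64), torch.randn(1, 8, 4, 64)),)

# Convert to the Cache API and back
cache = DynamicCache.from_legacy_cache(legacy_cache)
roundtrip = cache.to_legacy_cache()
print(type(cache), len(roundtrip))
```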
@@ -29,7 +29,6 @@ the arguments, argument types, and function docstring are parsed in order to gen
Although passing Python functions is very convenient, the parser can only handle [Google-style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings)
docstrings. Refer to the examples below for how to format a tool-ready function.

```py
def get_current_temperature(location: str, unit: str):
    """
@@ -103,7 +102,6 @@ Hold the call in the `tool_calls` key of an `assistant` message. This is the rec
> [!WARNING]
> Although `tool_calls` is similar to the OpenAI API, the OpenAI API uses a JSON string as its `tool_calls` format. This may cause errors or strange model behavior if used in Transformers, which expects a dict.

```py
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
@@ -131,7 +129,6 @@ The temperature in Paris, France right now is 22°C.<|im_end|>
> Although the key in the assistant message is called `tool_calls`, in most cases, models only emit a single tool call at a time. Some older models emit multiple tool calls at the same time, but this is a
> significantly more complex process, as you need to handle multiple tool responses at once and disambiguate them, often using tool call IDs. Please refer to the model card to see exactly what format a model expects for tool calls.

## JSON schemas

Another way to define tools is by passing a [JSON schema](https://json-schema.org/learn/getting-started-step-by-step).
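For illustration, a JSON schema equivalent of the `get_current_temperature` tool used earlier might look roughly like this; it is a hand-written sketch, and the descriptions and enum values are assumptions rather than a schema generated by the library.

```py
get_current_temperature_schema = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get the current temperature at a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The location, in the format 'City, Country'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}

# Schemas like this can be passed in place of Python functions, e.g. tools=[get_current_temperature_schema]
```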
@@ -43,6 +43,7 @@ chat = [

tokenizer.apply_chat_template(chat, tokenize=False)
```

```md
<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]
```
@@ -62,6 +63,7 @@ chat = [

tokenizer.apply_chat_template(chat, tokenize=False)
```

```md
<|user|>\nHello, how are you?</s>\n<|assistant|>\nI'm doing great. How can I help you today?</s>\n<|user|>\nI'd like to show off how chat templating works!</s>\n
```
@@ -110,6 +112,7 @@ Pass the tokenized chat to [`~GenerationMixin.generate`] to generate a response.
outputs = model.generate(tokenized_chat, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```

```md
<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
@@ -135,6 +138,7 @@ Let's see an example to understand what `add_generation_prompt` is actually doin
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
tokenized_chat
```

```md
<|im_start|>user
Hi there!<|im_end|>
@@ -150,6 +154,7 @@ Now, let's format the same chat with `add_generation_prompt=True`:
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
tokenized_chat
```

```md
<|im_start|>user
Hi there!<|im_end|>
@@ -186,7 +191,6 @@ model.generate(**formatted_chat)

[`TextGenerationPipeline`] sets [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) to `True` by default to start a new message. However, if the final message in the chat has the `assistant` role, it assumes the message is a prefill and switches to `continue_final_message=True`. This is because most models don’t support multiple consecutive assistant messages. To override this behavior, explicitly pass the [continue_final_message](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.continue_final_message) argument to the pipeline.
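A minimal sketch of overriding that behavior; the checkpoint name and the prefill text are illustrative assumptions:

```py
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
messages = [
    {"role": "user", "content": "Write a one-line haiku about the sea."},
    {"role": "assistant", "content": "Salt wind whispers:"},  # prefill to be continued
]
# Explicitly continue the final assistant message instead of starting a new one
out = pipe(messages, continue_final_message=True, max_new_tokens=30)
print(out[0]["generated_text"][-1]["content"])
```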
## Model training

Training a model with a chat template is a good way to ensure the template matches the tokens the model was trained on. Apply the chat template as a preprocessing step to your dataset. Set `add_generation_prompt=False` because the additional tokens to prompt an assistant response aren't helpful during training.
@@ -212,6 +216,7 @@ dataset = Dataset.from_dict({"chat": [chat1, chat2]})
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
print(dataset['formatted_chat'][0])
```

```md
<|user|>
Which is bigger, the moon or the sun?</s>
@@ -18,7 +18,6 @@ rendered properly in your Markdown viewer.

Multimodal chat models accept inputs like images, audio or video, in addition to text. The `content` key in a multimodal chat history is a list containing multiple items of different types. This is unlike text-only chat models whose `content` key is a single string.

In the same way the [Tokenizer](./fast_tokenizer) class handles chat templates and tokenization for text-only models,
the [Processor](./processors) class handles preprocessing, tokenization and chat templates for multimodal models. Their [`~ProcessorMixin.apply_chat_template`] methods are almost identical.
@@ -57,7 +56,6 @@ out = pipe(text=messages, max_new_tokens=128)
print(out[0]['generated_text'][-1]['content'])
```

```
Ahoy, me hearty! These be two feline friends, likely some tabby cats, taking a siesta on a cozy pink blanket. They're resting near remote controls, perhaps after watching some TV or just enjoying some quiet time together. Cats sure know how to find comfort and relaxation, don't they?
```
@@ -69,7 +67,6 @@ Aside from the gradual descent from pirate-speak into modern American English (i
Like [text-only models](./chat_templating), use the [`~ProcessorMixin.apply_chat_template`] method to prepare the chat messages for multimodal models.
This method handles the tokenization and formatting of the chat messages, including images and other media types. The resulting inputs are passed to the model for generation.

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
@@ -99,7 +96,6 @@ processed_chat = processor.apply_chat_template(messages, add_generation_prompt=T
print(list(processed_chat.keys()))
```

```
['input_ids', 'attention_mask', 'pixel_values', 'image_grid_thw']
```
@@ -113,7 +109,6 @@ print(processor.decode(out[0]))

The decoded output contains the full conversation so far, including the user message and the placeholder tokens that contain the image information. You may need to trim the previous conversation from the output before displaying it to the user.

## Video inputs

Some vision models also support video inputs. The message format is very similar to the format for [image inputs](#image-inputs).
@@ -148,6 +143,7 @@ messages = [
```

### Example: Passing decoded video objects

```python
import numpy as np
@@ -167,7 +163,9 @@ messages = [
    },
]
```

You can also use the existing `load_video()` function to load a video, edit it in memory, and pass it in the messages.

```python
# Make sure a video backend library (pyav, decord, or torchvision) is available.
@@ -200,7 +198,6 @@ Pass `messages` to [`~ProcessorMixin.apply_chat_template`] to tokenize the input

The `num_frames` parameter controls how many frames to uniformly sample from the video. Each checkpoint has a maximum frame count it was pretrained with and exceeding this count can significantly lower generation quality. It's important to choose a frame count that fits both the model capacity and your hardware resources. If `num_frames` isn't specified, the entire video is loaded without any frame sampling.

```python
processed_chat = processor.apply_chat_template(
    messages,
@@ -265,4 +262,3 @@ print(processed_chat.keys())

</hfoption>
</hfoptions>
@@ -18,7 +18,6 @@ rendered properly in your Markdown viewer.

A chat template is a [Jinja](https://jinja.palletsprojects.com/en/stable/templates/) template stored in the tokenizer's [chat_template](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizer.chat_template) attribute. Jinja is a templating language that allows you to write Python-like code and syntax.

```jinja
{%- for message in messages %}
    {{- '<|' + message['role'] + '|>\n' }}
@@ -108,7 +107,6 @@ We strongly recommend using `-` to ensure only the intended content is printed.

### Special variables and callables

The only constants in a template are the `messages` variable and the `add_generation_prompt` boolean. However, you have
access to **any other keyword arguments that are passed** to the [`~PreTrainedTokenizerBase.apply_chat_template`] method.
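As a small sketch of this, any extra keyword argument shows up as a template variable under the same name; the `system_hint` name below is purely illustrative and only has an effect if the checkpoint's template actually reads it.

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [{"role": "user", "content": "Hi there!"}]

# `system_hint` becomes available inside the Jinja template as {{ system_hint }}
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    system_hint="Answer briefly.",
)
print(text)
```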
@@ -48,7 +48,6 @@ transformers chat -h

The chat is implemented on top of the [AutoClass](./model_doc/auto), using tooling from [text generation](./llm_tutorial) and [chat](./chat_templating). It uses the `transformers serve` CLI under the hood ([docs](./serving.md#serve-cli)).

## TextGenerationPipeline

[`TextGenerationPipeline`] is a high-level text generation class with a "chat mode". Chat mode is enabled when a conversational model is detected and the chat prompt is [properly formatted](./llm_tutorial#wrong-prompt-format).
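A minimal sketch of chat mode, assuming a conversational checkpoint (the model name here is illustrative):

```py
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
messages = [{"role": "user", "content": "Tell me a fun fact about octopuses."}]

# Passing a list of chat messages (instead of a plain string) triggers chat mode
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])
```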
@@ -38,5 +38,3 @@ You are now ready to use your local model in Cursor! For instance, if you toggle
<h3 align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_serve_cursor_chat.png"/>
</h3>
@@ -389,7 +389,6 @@ from .utils import some_function

Only relative imports from the same-level `custom_generate` folder are supported. Parent/sibling folder imports are not valid. The `custom_generate` argument also works locally with any directory that contains a `custom_generate` structure. This is the recommended workflow for developing your custom generation method.

#### requirements.txt

You can optionally specify additional Python requirements in a `requirements.txt` file inside the `custom_generate` folder. These are checked at runtime and an exception will be thrown if they're missing, nudging users to update their environment accordingly.
@@ -19,7 +19,6 @@ rendered properly in your Markdown viewer.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>

Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal models, for both inference and training.
@@ -20,7 +20,6 @@ This page lists all of Transformers general utility functions that are found in

Most of those are only useful if you are studying the general code in the library.

## Enums and namedtuples

[[autodoc]] utils.ExplicitEnum
@@ -65,7 +65,6 @@ values. Here, for instance, it has two keys that are `sequences` and `scores`.

We document all output types here.

[[autodoc]] generation.GenerateDecoderOnlyOutput

[[autodoc]] generation.GenerateEncoderDecoderOutput
@@ -74,13 +73,11 @@ We document here all output types.

[[autodoc]] generation.GenerateBeamEncoderDecoderOutput

## LogitsProcessor

A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
generation.

[[autodoc]] AlternatingCodebooksLogitsProcessor
    - __call__
@@ -174,8 +171,6 @@ generation.
[[autodoc]] WatermarkLogitsProcessor
    - __call__

## StoppingCriteria

A [`StoppingCriteria`] can be used to change when to stop generation (other than EOS token). Please note that this is exclusively available to our PyTorch implementations.
@@ -300,7 +295,6 @@ A [`Constraint`] can be used to force the generation to include specific tokens
    - to_legacy_cache
    - from_legacy_cache

## Watermark Utils

[[autodoc]] WatermarkingConfig
@@ -21,10 +21,8 @@ provides for it.

Most of those are only useful if you are adding new models in the library.

## Model addition debuggers

### Model addition debugger - context manager for model adders

This context manager is a power user tool intended for model adders. It tracks all forward calls within a model forward
@@ -72,7 +70,6 @@ with model_addition_debugger_context(

```

### Reading results

The debugger generates two files from the forward call, both with the same base name, but ending either with
@@ -231,10 +228,8 @@ Once the forward passes of two models have been traced by the debugger, one can
below: we can see slight differences between these two implementations' key projection layer. Inputs are mostly
identical, but not quite. Looking through the file differences makes it easier to pinpoint which layer is wrong.



### Limitations and scope

This feature will only work for torch-based models. Models relying heavily on external kernel calls may work, but trace will
@@ -268,7 +263,6 @@ This utility:



### Usage

You can run the skipped test analyzer in two ways:
@@ -20,7 +20,6 @@ This page lists all the utility functions the library provides for pipelines.

Most of those are only useful if you are studying the code of the models in the library.

## Argument handling

[[autodoc]] pipelines.ArgumentHandler
@@ -24,6 +24,7 @@ In Transformers, the [`~GenerationMixin.generate`] API handles text generation,

> [!TIP]
> You can also chat with a model directly from the command line. ([reference](./conversations.md#transformers))
>
> ```shell
> transformers chat Qwen/Qwen2.5-0.5B-Instruct
> ```
@@ -35,6 +36,7 @@ Before you begin, it's helpful to install [bitsandbytes](https://hf.co/docs/bits
```bash
!pip install -U transformers bitsandbytes
```

Bitsandbytes supports multiple backends in addition to CUDA-based GPUs. Refer to the multi-backend installation [guide](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend) to learn more.

Load an LLM with [`~PreTrainedModel.from_pretrained`] and add the following two parameters to reduce the memory requirements.
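The diff cuts off before the example itself; as a rough sketch of the kind of call that sentence refers to (the checkpoint and the specific parameters shown are assumptions based on common bitsandbytes usage, not a quote of the original example):

```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # illustrative checkpoint
    device_map="auto",  # spread weights across the available devices
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit weights via bitsandbytes
)
```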
@@ -154,7 +156,6 @@ print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
| `repetition_penalty` | `float` | Set it to `>1.0` if you're seeing the model repeat itself often. Larger values apply a larger penalty. |
| `eos_token_id` | `list[int]` | The token(s) that will cause generation to stop. The default value is usually good, but you can specify a different token. |
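A short sketch of passing the parameters from the table above to [`~GenerationMixin.generate`]; the checkpoint, prompt, and values are illustrative:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=30,
    repetition_penalty=1.2,  # >1.0 discourages the model from repeating itself
    eos_token_id=tokenizer.eos_token_id,  # token(s) that stop generation
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```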
## Pitfalls

The section below covers some common issues you may encounter during text generation and how to solve them.
@@ -66,6 +66,7 @@ If you have access to an 8 x 80GB A100 node, you could load BLOOM as follows
```bash
!pip install transformers accelerate bitsandbytes optimum
```

```python
from transformers import AutoModelForCausalLM
@@ -98,6 +99,7 @@ result
```

**Output**:

```
Here is a Python function that transforms bytes to Giga bytes:\n\n```python\ndef bytes_to_giga_bytes(bytes):\n return bytes / 1024 / 1024 / 1024\n```\n\nThis function takes a single
```
@@ -116,6 +118,7 @@ bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```

**Output**:

```bash
29.0260648727417
```
@@ -127,7 +130,6 @@ Note that if we had tried to run the model in full float32 precision, a whopping

If you are unsure in which format the model weights are stored on the Hub, you can always look into the checkpoint's config under `"dtype"`, *e.g.* [here](https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/config.json#L21). It is recommended to set the model to the same precision type as written in the config when loading with `from_pretrained(..., dtype=...)`, except when the original type is float32, in which case either `float16` or `bfloat16` can be used for inference.

Let's define a `flush(...)` function to free all allocated memory so that we can accurately measure the peak allocated GPU memory.

```python
@@ -148,6 +150,7 @@ Let's call it now for the next experiment.
```python
flush()
```

From the Accelerate library, you can also use a device-agnostic utility method called [release_memory](https://github.com/huggingface/accelerate/blob/29be4788629b772a3b722076e433b5b3b5c85da3/src/accelerate/utils/memory.py#L63), which takes various hardware backends like XPU, MLU, NPU, MPS, and more into account.

```python
@@ -204,6 +207,7 @@ result
```

**Output**:

```
Here is a Python function that transforms bytes to Giga bytes:\n\n```python\ndef bytes_to_giga_bytes(bytes):\n return bytes / 1024 / 1024 / 1024\n```\n\nThis function takes a single
```
@@ -215,6 +219,7 @@ bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```

**Output**:

```
15.219234466552734
```
@@ -222,8 +227,8 @@ bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
Significantly less! We're down to just a bit over 15 GBs and could therefore run this model on consumer GPUs like the 4090.
We're seeing a very nice gain in memory efficiency and more or less no degradation to the model's output. However, we can also notice a slight slow-down during inference.

We delete the models and flush the memory again.

```python
del model
del pipe
@@ -245,6 +250,7 @@ result
```

**Output**:

```
Here is a Python function that transforms bytes to Giga bytes:\n\n```\ndef bytes_to_gigabytes(bytes):\n return bytes / 1024 / 1024 / 1024\n```\n\nThis function takes a single argument
```
@@ -256,6 +262,7 @@ bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```

**Output**:

```
9.543574333190918
```
@@ -270,6 +277,7 @@ Also note that inference here was again a bit slower compared to 8-bit quantizat
del model
del pipe
```

```python
flush()
```
@@ -384,6 +392,7 @@ def alternating(list1, list2):
-----
"""
```

For demonstration purposes, we duplicate the system prompt ten times so that the input length is long enough to observe Flash Attention's memory savings.
We append the original text prompt `"Question: Please write a function in Python that transforms bytes to Giga bytes.\n\nAnswer: Here"`
@@ -413,6 +422,7 @@ result
```

**Output**:

```
Generated in 10.96854019165039 seconds.
Sure. Here is a function that does that.\n\ndef bytes_to_giga(bytes):\n return bytes / 1024 / 1024 / 1024\n\nAnswer: Sure. Here is a function that does that.\n\ndef
@@ -429,6 +439,7 @@ bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```

**Output**:

```bash
37.668193340301514
```
@@ -460,6 +471,7 @@ result
```

**Output**:

```
Generated in 3.0211617946624756 seconds.
Sure. Here is a function that does that.\n\ndef bytes_to_giga(bytes):\n return bytes / 1024 / 1024 / 1024\n\nAnswer: Sure. Here is a function that does that.\n\ndef
@@ -474,6 +486,7 @@ bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```

**Output**:

```
32.617331981658936
```
@@ -604,6 +617,7 @@ generated_text
```

**Output**:

```
shape of input_ids torch.Size([1, 21])
shape of input_ids torch.Size([1, 22])
@@ -641,6 +655,7 @@ generated_text
```

**Output**:

```
shape of input_ids torch.Size([1, 1])
length of key-value cache 20
@@ -712,6 +727,7 @@ tokenizer.batch_decode(generation_output.sequences)[0][len(prompt):]
```

**Output**:

```
is a modified version of the function that returns Mega bytes instead.
@@ -733,6 +749,7 @@ config = model.config
```

**Output**:

```
7864320000
```
@@ -773,7 +790,6 @@ The most notable application of GQA is [Llama-v2](https://huggingface.co/meta-ll

> In conclusion, it is strongly recommended to make use of either GQA or MQA if the LLM is deployed with auto-regressive decoding and is required to handle large input sequences, as is the case for chat, for example.

## Conclusion

The research community is constantly coming up with new, nifty ways to speed up inference time for ever-larger LLMs. As an example, one such promising research direction is [speculative decoding](https://huggingface.co/papers/2211.17192) where "easy tokens" are generated by smaller, faster language models and only "hard tokens" are generated by the LLM itself. Going into more detail is beyond the scope of this notebook, but you can read more in this [nice blog post](https://huggingface.co/blog/assisted-generation).
@@ -54,7 +54,6 @@ The main class that implements callbacks is [`TrainerCallback`]. It gets the
Trainer's internal state via [`TrainerState`], and can take some actions on the training loop via
[`TrainerControl`].

## Available Callbacks

Here is the list of the available [`TrainerCallback`] in the library:
@@ -24,7 +24,6 @@ Each derived config class implements model specific attributes. Common attribute
`hidden_size`, `num_attention_heads`, and `num_hidden_layers`. Text models further implement:
`vocab_size`.
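As a quick illustration of those common attributes (the checkpoint is an arbitrary example):

```py
from transformers import AutoConfig

config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.hidden_size, config.num_attention_heads, config.num_hidden_layers, config.vocab_size)
# 768 12 12 30522
```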
## PretrainedConfig

[[autodoc]] PretrainedConfig
@@ -25,7 +25,6 @@ on the formed batch.

Examples of use can be found in the [example scripts](../examples) or [example notebooks](../notebooks).

## Default data collator

[[autodoc]] data.data_collator.default_data_collator
@@ -15,14 +15,12 @@ rendered properly in your Markdown viewer.

-->

# ExecuTorch

[`ExecuTorch`](https://github.com/pytorch/executorch) is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch ecosystem and supports the deployment of PyTorch models with a focus on portability, productivity, and performance.

ExecuTorch introduces well-defined entry points to perform model, device, and/or use-case specific optimizations such as backend delegation, user-defined compiler transformations, memory planning, and more. The first step in preparing a PyTorch model for execution on an edge device using ExecuTorch is to export the model. This is achieved through the use of a PyTorch API called [`torch.export`](https://pytorch.org/docs/stable/export.html).
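To show just the `torch.export` entry point in isolation, here is a toy sketch that exports a plain `nn.Module` rather than a full Transformers model; exporting real checkpoints typically needs additional configuration.

```py
import torch

class TinyBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.linear(x))

# torch.export traces the module into an ExportedProgram that downstream
# tools (such as ExecuTorch's lowering pipeline) can consume
exported = torch.export.export(TinyBlock(), (torch.randn(2, 16),))
print(exported)
```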
## ExecuTorch Integration

An integration point is being developed to ensure that 🤗 Transformers can be exported using `torch.export`. The goal of this integration is not only to enable export but also to ensure that the exported artifact can be further lowered and optimized to run efficiently in `ExecuTorch`, particularly for mobile and edge use cases.
@@ -18,7 +18,6 @@ rendered properly in your Markdown viewer.

A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction from sequences, e.g., pre-processing audio files to generate Log-Mel Spectrogram features, feature extraction from images, e.g., cropping image files, but also padding, normalization, and conversion to NumPy and PyTorch tensors.

## FeatureExtractionMixin

[[autodoc]] feature_extraction_utils.FeatureExtractionMixin
@ -26,6 +26,7 @@ from transformers import AutoImageProcessor
|
|||||||
|
|
||||||
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)
|
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)
|
||||||
```
|
```
|
||||||
|
|
||||||
Note that `use_fast` will be set to `True` by default in a future release.
|
Note that `use_fast` will be set to `True` by default in a future release.
|
||||||
|
|
||||||
When using a fast image processor, you can also set the `device` argument to specify the device on which the processing should be done. By default, the processing is done on the same device as the inputs if the inputs are tensors, or on the CPU otherwise.
|
When using a fast image processor, you can also set the `device` argument to specify the device on which the processing should be done. By default, the processing is done on the same device as the inputs if the inputs are tensors, or on the CPU otherwise.
|
||||||
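
A short sketch of the `device` argument described above; the checkpoint matches the snippet in this hunk, while the image URL and the availability of a CUDA device are assumptions.

```python
# Sketch: run fast image processing on the GPU via the `device` argument (assumes CUDA is available).
import requests
from PIL import Image
from transformers import AutoImageProcessor

image = Image.open(
    requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw
)
processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50", use_fast=True)

inputs = processor(images=image, return_tensors="pt", device="cuda")
print(inputs["pixel_values"].device)  # cuda:0 — processing happened on the GPU
```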
@@ -57,7 +58,6 @@ Here are some speed comparisons between the base and fast image processors for t
These benchmarks were run on an [AWS EC2 g5.2xlarge instance](https://aws.amazon.com/ec2/instance-types/g5/), utilizing an NVIDIA A10G Tensor Core GPU.

## ImageProcessingMixin

[[autodoc]] image_processing_utils.ImageProcessingMixin

@@ -72,7 +72,6 @@ These benchmarks were run on an [AWS EC2 g5.2xlarge instance](https://aws.amazon
[[autodoc]] image_processing_utils.BaseImageProcessor

## BaseImageProcessorFast

[[autodoc]] image_processing_utils_fast.BaseImageProcessorFast

@@ -55,7 +55,6 @@ logger.info("INFO")
logger.warning("WARN")
```

All the methods of this logging module are documented below, the main ones are
[`logging.get_verbosity`] to get the current level of verbosity in the logger and
[`logging.set_verbosity`] to set the verbosity to the level of your choice. In order (from the least
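
A small sketch of the two functions named above, using the same import style as the rest of this page:

```python
# Sketch: read and change the library-wide verbosity level.
from transformers.utils import logging

print(logging.get_verbosity())  # integer level, e.g. 30 for WARNING
logging.set_verbosity_info()    # equivalent to logging.set_verbosity(logging.INFO)

logger = logging.get_logger("transformers")
logger.info("INFO messages are now visible")
```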
@@ -26,7 +26,6 @@ file or directory, or from a pretrained model configuration provided by the libr
The other methods that are common to each model are defined in [`~modeling_utils.ModuleUtilsMixin`] and [`~generation.GenerationMixin`].

## PreTrainedModel

[[autodoc]] PreTrainedModel

@@ -51,4 +51,3 @@ to export models for different types of topologies or tasks.
### FeaturesManager

[[autodoc]] onnx.features.FeaturesManager

@@ -22,7 +22,6 @@ The `.optimization` module provides:
- several schedules in the form of schedule objects that inherit from `_LRSchedule`:
- a gradient accumulation class to accumulate the gradients of multiple batches

## AdaFactor

[[autodoc]] Adafactor
@@ -47,7 +47,6 @@ However, this is not always the case. Some models apply normalization or subsequ
</Tip>

You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get `None`. Here for instance `outputs.loss` is the loss computed by the model, and `outputs.attentions` is
`None`.
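
A brief sketch of the behaviour described above; the checkpoint is illustrative, and any sequence-classification model shows the same pattern:

```python
# Attributes the model did not return come back as None on the output object.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

outputs = model(**tokenizer("Hello, my dog is cute", return_tensors="pt"))

print(outputs.loss)        # None: no labels were passed, so no loss was computed
print(outputs.attentions)  # None: output_attentions was not requested
print(outputs.logits.shape)
```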
@@ -81,7 +81,6 @@ for out in tqdm(pipe(KeyDataset(dataset, "file"))):
For ease of use, a generator is also possible:

```python
from transformers import pipeline

@@ -196,7 +195,6 @@ This is a occasional very long sentence compared to the other. In that case, the
tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to the high slowdown. Even worse, on
bigger batches, the program simply crashes.

```
------------------------------
Streaming no batching
@@ -245,7 +243,6 @@ multiple forward pass of a model. Under normal circumstances, this would yield i
In order to circumvent this issue, both of these pipelines are a bit specific: they are `ChunkPipeline` instead of a
regular `Pipeline`. In short:

```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
@@ -254,7 +251,6 @@ outputs = pipe.postprocess(model_outputs)
Now becomes:

```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
@@ -282,7 +278,6 @@ If you want to override a specific pipeline.
Don't hesitate to create an issue for your task at hand; the goal of the pipeline is to be easy to use and support most
cases, so `transformers` may be able to support your use case.

If you simply want to try it out, you can:

- Subclass your pipeline of choice
@@ -302,7 +297,6 @@ my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
That should enable you to do all the custom code you want.

## Implementing a pipeline

[Implementing a new pipeline](../add_new_pipeline)
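
A sketch of the subclassing route shown in the hunk above. The pipeline class, checkpoint, and the extra field are illustrative, and the exact shape of `postprocess` outputs depends on the pipeline you subclass:

```python
# Hypothetical example: tweak postprocessing by subclassing an existing pipeline
# and plugging it in through the `pipeline_class` argument shown above.
from transformers import TextClassificationPipeline, pipeline

class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        outputs = super().postprocess(model_outputs, **kwargs)
        # attach an extra, custom field to the regular pipeline output
        if isinstance(outputs, dict):
            outputs["custom_note"] = "overridden postprocess"
        return outputs

my_pipeline = pipeline(
    model="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative checkpoint
    pipeline_class=MyPipeline,
)
print(my_pipeline("I love this!"))
```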
@@ -329,7 +323,6 @@ Pipelines available for audio tasks include the following.
    - __call__
    - all

### ZeroShotAudioClassificationPipeline

[[autodoc]] ZeroShotAudioClassificationPipeline

@@ -71,7 +71,6 @@ Additionally, the following method can be used to load values from a data file a
[[autodoc]] data.processors.glue.glue_convert_examples_to_features

## XNLI

[The Cross-Lingual NLI Corpus (XNLI)](https://www.nyu.edu/projects/bowman/xnli/) is a benchmark that evaluates the

@@ -88,7 +87,6 @@ Please note that since the gold labels are available on the test set, evaluation
An example using these processors is given in the [run_xnli.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_xnli.py) script.

## SQuAD

[The Stanford Question Answering Dataset (SQuAD)](https://rajpurkar.github.io/SQuAD-explorer//) is a benchmark that
@@ -115,11 +113,9 @@ Additionally, the following method can be used to convert SQuAD examples into
[[autodoc]] data.processors.squad.squad_convert_examples_to_features

These processors as well as the aforementioned method can be used with files containing the data as well as with the
*tensorflow_datasets* package. Examples are given below.

### Example usage

Here is an example using the processors as well as the conversion method using data files:
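
The full example follows in the original page; a condensed sketch of the call it describes, assuming SQuAD v2 data files on disk, looks roughly like this:

```python
# Sketch: build features from local SQuAD v2 files with the processors described above.
# The data directory and tokenizer checkpoint are placeholders.
from transformers import AutoTokenizer
from transformers.data.processors.squad import SquadV2Processor, squad_convert_examples_to_features

processor = SquadV2Processor()
examples = processor.get_dev_examples("path/to/squad", filename="dev-v2.0.json")

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=384,
    doc_stride=128,
    max_query_length=64,
    is_training=False,
)
```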
@@ -50,7 +50,6 @@ several advanced alignment methods which can be used to map between the original
token space (e.g., getting the index of the token comprising a given character or the span of characters corresponding
to a given token).

# Multimodal Tokenizer

Apart from that, each tokenizer can be a "multimodal" tokenizer, which means that the tokenizer will hold all relevant special tokens
@@ -22,7 +22,6 @@ The video processor extends the functionality of image processors by allowing Vi
When adding a new VLM or updating an existing one to enable distinct video preprocessing, saving and reloading the processor configuration will store the video-related arguments in a dedicated file named `video_preprocessing_config.json`. Don't worry if you haven't updated your VLM; the processor will try to load video-related configurations from a file named `preprocessing_config.json`.

### Usage Example
Here's an example of how to load a video processor with the [`llava-hf/llava-onevision-qwen2-0.5b-ov-hf`](https://huggingface.co/llava-hf/llava-onevision-qwen2-0.5b-ov-hf) model:
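
The page's own example follows in the full document; the loading step it refers to is simply:

```python
# Load the video processor for the checkpoint named above.
from transformers import AutoVideoProcessor

video_processor = AutoVideoProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-0.5b-ov-hf")
print(type(video_processor).__name__)
```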
@@ -59,7 +58,6 @@ The video processor can also sample video frames using the technique best suited
</Tip>

```python
from transformers import AutoVideoProcessor

@@ -92,4 +90,3 @@ print(processed_video_inputs.pixel_values_videos.shape)
## BaseVideoProcessor

[[autodoc]] video_processing_utils.BaseVideoProcessor

@@ -25,7 +25,6 @@ The abstract from the paper is the following:
*We introduce a novel method for pre-training of large-scale vision encoders. Building on recent advancements in autoregressive pre-training of vision models, we extend this framework to a multimodal setting, i.e., images and text. In this paper, we present AIMV2, a family of generalist vision encoders characterized by a straightforward pre-training process, scalability, and remarkable performance across a range of downstream tasks. This is achieved by pairing the vision encoder with a multimodal decoder that autoregressively generates raw image patches and text tokens. Our encoders excel not only in multimodal evaluations but also in vision benchmarks such as localization, grounding, and classification. Notably, our AIMV2-3B encoder achieves 89.5% accuracy on ImageNet-1k with a frozen trunk. Furthermore, AIMV2 consistently outperforms state-of-the-art contrastive models (e.g., CLIP, SigLIP) in multimodal image understanding across diverse settings.*

This model was contributed by [Yaswanth Gali](https://huggingface.co/yaswanthgali).
The original code can be found [here](https://github.com/apple/ml-aim).

@@ -142,7 +142,6 @@ response = processor.decode(output_ids, skip_special_tokens=True)
print(response)
```

## AriaImageProcessor

[[autodoc]] AriaImageProcessor

@@ -23,7 +23,6 @@ automatically retrieve the relevant model given the name/path to the pretrained
Instantiating one of [`AutoConfig`], [`AutoModel`], and
[`AutoTokenizer`] will directly create a class of the relevant architecture. For instance

```python
model = AutoModel.from_pretrained("google-bert/bert-base-cased")
```

@@ -86,7 +86,6 @@ Next, [install](https://github.com/Dao-AILab/flash-attention#installation-and-fe
pip install -U flash-attn --no-build-isolation
```

##### Usage

To load a model using Flash Attention 2, we can pass the `attn_implementation="flash_attention_2"` flag to [`.from_pretrained`](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained). We'll also load the model in half-precision (e.g. `torch.float16`), since it results in almost no degradation to audio quality but significantly lower memory usage and faster inference:

@@ -97,7 +96,6 @@ model = BarkModel.from_pretrained("suno/bark-small", dtype=torch.float16, attn_i
##### Performance comparison

The following diagram shows the latency for the native attention implementation (no optimisation) against Better Transformer and Flash Attention 2. In all cases, we generate 400 semantic tokens on a 40GB A100 GPU with PyTorch 2.1. Flash Attention 2 is also consistently faster than Better Transformer, and its performance improves even more as batch sizes increase:

<div style="text-align: center">
@@ -108,7 +106,6 @@ To put this into perspective, on an NVIDIA A100 and when generating 400 semantic
At batch size 8, on an NVIDIA A100, Flash Attention 2 is also 10% faster than Better Transformer, and at batch size 16, 25%.

#### Combining optimization techniques

You can combine optimization techniques, and use CPU offload, half-precision and Flash Attention 2 (or 🤗 Better Transformer) all at once.
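
A sketch of combining these techniques, following the `from_pretrained` call shown earlier in this diff; it assumes Flash Attention 2 is installed, a CUDA device is available, and Accelerate is present for `enable_cpu_offload`:

```python
# Half precision + Flash Attention 2 + CPU offload, combined as described above.
import torch
from transformers import BarkModel

model = BarkModel.from_pretrained(
    "suno/bark-small",
    dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")

model.enable_cpu_offload()  # offloads idle sub-models to CPU between generation steps
```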
@@ -165,7 +162,6 @@ Bark can generate highly realistic, **multilingual** speech as well as other aud
The model can also produce **nonverbal communications** like laughing, sighing and crying.

```python
>>> # Adding non-speech cues to the input text
>>> inputs = processor("Hello uh ... [clears throat], my dog is cute [laughter]")

@@ -235,4 +231,3 @@ To save the audio, simply take the sample rate from the model config and some sc
[[autodoc]] BarkSemanticConfig
    - all

@@ -15,7 +15,6 @@ rendered properly in your Markdown viewer.
-->
*This model was released on 2019-10-29 and added to Hugging Face Transformers on 2020-11-16.*

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">

@@ -46,6 +45,7 @@ pipeline = pipeline(
pipeline("Plants create <mask> through a process known as photosynthesis.")
```

</hfoption>
<hfoption id="AutoModel">

@@ -31,7 +31,6 @@ You can find all of the original BARThez checkpoints under the [BARThez](https:/
> This model was contributed by [moussakam](https://huggingface.co/moussakam).
> Refer to the [BART](./bart) docs for more usage examples.

The example below demonstrates how to predict the `<mask>` token with [`Pipeline`], [`AutoModel`], and from the command line.

<hfoptions id="usage">

@@ -33,12 +33,9 @@ You can find all the original checkpoints under the [VinAI](https://huggingface.
The example below demonstrates how to summarize text with [`Pipeline`] or the [`AutoModel`] class.

<hfoptions id="usage">
<hfoption id="Pipeline">

```python
import torch
from transformers import pipeline

@@ -98,8 +95,6 @@ transformers run --task summarization --model vinai/bartpho-word --device 0
</hfoption>
</hfoptions>

## Notes

- BARTpho uses the large architecture of BART with an additional layer-normalization layer on top of the encoder and decoder. The BART-specific classes should be replaced with the mBART-specific classes.

@@ -81,7 +81,6 @@ API reference information.
</Tip>

## BertJapaneseTokenizer

[[autodoc]] BertJapaneseTokenizer

@@ -26,7 +26,6 @@ rendered properly in your Markdown viewer.
[BERTweet](https://huggingface.co/papers/2005.10200) shares the same architecture as [BERT-base](./bert), but it’s pretrained like [RoBERTa](./roberta) on English Tweets. It performs really well on Tweet-related tasks like part-of-speech tagging, named entity recognition, and text classification.

You can find all the original BERTweet checkpoints under the [VinAI Research](https://huggingface.co/vinai?search_models=BERTweet) organization.

> [!TIP]

@@ -49,6 +48,7 @@ pipeline = pipeline(
)
pipeline("Plants create <mask> through a process known as photosynthesis.")
```

</hfoption>
<hfoption id="AutoModel">

@@ -47,6 +47,7 @@ pipeline = pipeline(
)
pipeline("Plants create [MASK] through a process known as photosynthesis.")
```

</hfoption>
<hfoption id="AutoModel">

@@ -81,6 +82,7 @@ print(f"The predicted token is: {predicted_token}")
```bash
!echo -e "Plants create [MASK] through a process known as photosynthesis." | transformers run --task fill-mask --model google/bigbird-roberta-base --device 0
```

</hfoption>
</hfoptions>

@@ -52,6 +52,7 @@ Through photosynthesis, plants capture energy from sunlight using a green pigmen
These ingredients are then transformed into glucose, a type of sugar that serves as a source of chemical energy, and oxygen, which is released as a byproduct into the atmosphere. The glucose produced during photosynthesis is not just used immediately; plants also store it as starch or convert it into other organic compounds like cellulose, which is essential for building their cellular structure.
This energy reserve allows them to grow, develop leaves, produce flowers, bear fruit, and carry out various physiological processes throughout their lifecycle.""")
```

</hfoption>
<hfoption id="AutoModel">

@@ -77,6 +78,7 @@ input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(**input_ids, cache_implementation="static")
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

</hfoption>
<hfoption id="transformers">

@@ -135,31 +135,26 @@ print(output)
[[autodoc]] BioGptConfig

## BioGptTokenizer

[[autodoc]] BioGptTokenizer
    - save_vocabulary

## BioGptModel

[[autodoc]] BioGptModel
    - forward

## BioGptForCausalLM

[[autodoc]] BioGptForCausalLM
    - forward

## BioGptForTokenClassification

[[autodoc]] BioGptForTokenClassification
    - forward

## BioGptForSequenceClassification

[[autodoc]] BioGptForSequenceClassification
@@ -35,10 +35,8 @@ Several versions of the model weights are available on Hugging Face:
* [**`microsoft/bitnet-b1.58-2B-4T-gguf`**](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf): Contains the model weights in GGUF format, compatible with the `bitnet.cpp` library for CPU inference.

### Model Details

* **Architecture:** Transformer-based, modified with `BitLinear` layers (BitNet framework).
    * Uses Rotary Position Embeddings (RoPE).
    * Uses squared ReLU (ReLU²) activation in FFN layers.

@@ -58,10 +56,8 @@ Several versions of the model weights are available on Hugging Face:
3. **Direct Preference Optimization (DPO):** Aligned with human preferences using preference pairs.
* **Tokenizer:** LLaMA 3 Tokenizer (vocab size: 128,256).

## Usage tips

**VERY IMPORTANT NOTE ON EFFICIENCY**

> Please do NOT expect performance efficiency gains (in terms of speed, latency, or energy consumption) when using this model with the standard transformers library.

@@ -106,7 +102,6 @@ response = tokenizer.decode(chat_outputs[0][chat_input.shape[-1]:], skip_special
print("\nAssistant Response:", response)
```

## BitNetConfig

[[autodoc]] BitNetConfig
@@ -55,7 +55,6 @@ found [here](https://github.com/facebookresearch/ParlAI).
Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
the left.

## Resources

- [Causal language modeling task guide](../tasks/language_modeling)

@@ -71,7 +71,6 @@ An example:
`facebook/blenderbot_small_90M`, have a different architecture and consequently should be used with
[BlenderbotSmall](blenderbot-small).

## Resources

- [Causal language modeling task guide](../tasks/language_modeling)

@@ -25,7 +25,6 @@ rendered properly in your Markdown viewer.
[BLIP](https://huggingface.co/papers/2201.12086) (Bootstrapped Language-Image Pretraining) is a vision-language pretraining (VLP) framework designed for *both* understanding and generation tasks. Most existing pretrained models are only good at one or the other. It uses a captioner to generate captions and a filter to remove the noisy captions. This increases training data quality and more effectively uses the messy web data.

You can find all the original BLIP checkpoints under the [BLIP](https://huggingface.co/collections/Salesforce/blip-models-65242f40f1491fbf6a9e9472) collection.

> [!TIP]

@@ -48,7 +48,6 @@ See also:
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)

⚡️ Inference
- A blog on [Optimization story: Bloom inference](https://huggingface.co/blog/bloom-inference-optimization).
- A blog on [Incredibly Fast BLOOM Inference with DeepSpeed and Accelerate](https://huggingface.co/blog/bloom-inference-pytorch-scripts).

@@ -83,7 +83,6 @@ print(tokenizer.decode(generated_ids[0]))
This model was contributed by [itazap](https://huggingface.co/<itazap>).
The original code can be found [here](<https://github.com/facebookresearch/blt>).

## BltConfig

[[autodoc]] BltConfig

@@ -54,6 +54,7 @@ The [`BridgeTowerProcessor`] wraps [`RobertaTokenizer`] and [`BridgeTowerImagePr
encode the text and prepare the images respectively.

The following example shows how to run contrastive learning using [`BridgeTowerProcessor`] and [`BridgeTowerForContrastiveLearning`].

```python
>>> from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning
>>> import requests

@@ -76,6 +77,7 @@ The following example shows how to run contrastive learning using [`BridgeTowerP
```

The following example shows how to run image-text retrieval using [`BridgeTowerProcessor`] and [`BridgeTowerForImageAndTextRetrieval`].

```python
>>> from transformers import BridgeTowerProcessor, BridgeTowerForImageAndTextRetrieval
>>> import requests
@@ -130,7 +132,6 @@ Tips:
- Please refer to [Table 5](https://huggingface.co/papers/2206.08657) for BridgeTower's performance on Image Retrieval and other downstream tasks.
- The PyTorch version of this model is only available in torch 1.10 and higher.

## BridgeTowerConfig

[[autodoc]] BridgeTowerConfig
@@ -177,4 +178,3 @@ Tips:
[[autodoc]] BridgeTowerForImageAndTextRetrieval
    - forward
@@ -57,7 +57,6 @@ def expand_and_normalize_bbox(bboxes, doc_width, doc_height):
- [`~transformers.BrosForTokenClassification.forward`, `~transformers.BrosSpadeEEForTokenClassification.forward`, `~transformers.BrosSpadeELForTokenClassification.forward`] require not only `input_ids` and `bbox` but also `box_first_token_mask` for loss calculation. It is a mask to filter out non-first tokens of each box. You can obtain this mask by saving start token indices of bounding boxes when creating `input_ids` from words. You can make `box_first_token_mask` with the following code:

```python
def make_box_first_token_mask(bboxes, words, tokenizer, max_seq_length=512):
@@ -102,7 +101,6 @@ def make_box_first_token_mask(bboxes, words, tokenizer, max_seq_length=512):
[[autodoc]] BrosModel
    - forward

## BrosForTokenClassification

[[autodoc]] BrosForTokenClassification

@@ -50,6 +50,7 @@ from transformers import pipeline
pipeline = pipeline("fill-mask", model="camembert-base", dtype=torch.float16, device=0)
pipeline("Le camembert est un délicieux fromage <mask>.")
```

</hfoption>

<hfoption id="AutoModel">

@@ -72,6 +73,7 @@ predicted_token = tokenizer.decode(predicted_token_id)
print(f"The predicted token is: {predicted_token}")
```

</hfoption>

<hfoption id="transformers CLI">
@@ -84,7 +86,6 @@ echo -e "Le camembert est un délicieux fromage <mask>." | transformers run --ta
</hfoptions>

Quantization reduces the memory burden of large models by representing weights in lower precision. Refer to the [Quantization](../quantization/overview) overview for available options.

The example below uses [bitsandbytes](../quantization/bitsandbytes) quantization to quantize the weights to 8-bits.
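
The page's own example follows in the full document; a minimal sketch of 8-bit loading with bitsandbytes, using an illustrative CamemBERT checkpoint, looks like this:

```python
# Sketch: load a masked-language model in 8-bit with bitsandbytes (requires the bitsandbytes package).
from transformers import AutoModelForMaskedLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForMaskedLM.from_pretrained(
    "almanach/camembert-large",          # illustrative checkpoint
    quantization_config=quantization_config,
    device_map="auto",
)
```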
@@ -86,6 +86,7 @@ echo -e "Plant create energy through a process known as photosynthesis." | trans
inputs = ["Life is like a box of chocolates.", "You never know what you gonna get."]
encoding = tokenizer(inputs, padding="longest", truncation=True, return_tensors="pt")
```

- CANINE is primarily designed to be fine-tuned on a downstream task. The pretrained model can be used for either masked language modeling or next sentence prediction.

## CanineConfig
@@ -28,7 +28,6 @@ rendered properly in your Markdown viewer.
The Chameleon model was proposed in [Chameleon: Mixed-Modal Early-Fusion Foundation Models
](https://huggingface.co/papers/2405.09818) by the META AI Chameleon Team. Chameleon is a Vision-Language Model that uses vector quantization to tokenize images, which enables the model to generate multimodal output. The model takes images and texts as input, including an interleaved format, and generates textual responses. The image generation module is not released yet.

The abstract from the paper is the following:

*We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training
@@ -43,7 +42,6 @@ including Gemini Pro and GPT-4V, according to human judgments on a new long-form
generation evaluation, where either the prompt or outputs contain mixed sequences of both images and
text. Chameleon marks a significant step forward in unified modeling of full multimodal documents*

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/chameleon_arch.png"
alt="drawing" width="600"/>

@@ -52,7 +50,6 @@ alt="drawing" width="600"/>
This model was contributed by [joaogante](https://huggingface.co/joaogante) and [RaushanTurganbay](https://huggingface.co/RaushanTurganbay).
The original code can be found [here](https://github.com/facebookresearch/chameleon).

## Usage tips

- We advise users to use `padding_side="left"` when computing batched generation as it leads to more accurate results. Simply make sure to set `processor.tokenizer.padding_side = "left"` before generating.

@@ -29,11 +29,9 @@ The abstract from the paper is the following:
*In recent years, the field of image generation has been revolutionized by the application of autoregressive transformers and DDPMs. These approaches model the process of image generation as a step-wise probabilistic processes and leverage large amounts of compute and data to learn the image distribution. This methodology of improving performance need not be confined to images. This paper describes a way to apply advances in the image generative domain to speech synthesis. The result is TorToise - an expressive, multi-voice text-to-speech system.*

This model was contributed by [Susnato Dhar](https://huggingface.co/susnato).
The original code can be found [here](https://github.com/neonbjb/tortoise-tts).

## Usage tips

1. CLVP is an integral part of the Tortoise TTS model.
@@ -41,7 +39,6 @@ The original code can be found [here](https://github.com/neonbjb/tortoise-tts).
3. The use of the [`ClvpModelForConditionalGeneration.generate()`] method is strongly recommended for tortoise usage.
4. Note that the CLVP model expects the audio to be sampled at 22.05 kHz, contrary to other audio models which expect 16 kHz.

## Brief Explanation:

- The [`ClvpTokenizer`] tokenizes the text input, and the [`ClvpFeatureExtractor`] extracts the log mel-spectrogram from the desired audio.

@@ -51,7 +48,6 @@ The original code can be found [here](https://github.com/neonbjb/tortoise-tts).
- At the end, we compare each speech vector with the text vector to see which speech vector is most similar to the text vector.
- [`ClvpModelForConditionalGeneration.generate()`] compresses all of the logic described above into a single method.

Example:

```python
@@ -74,7 +70,6 @@ Example :
>>> generated_output = model.generate(**processor_output)
```

## ClvpConfig

[[autodoc]] ClvpConfig

@@ -128,4 +123,3 @@ Example :
## ClvpDecoder

[[autodoc]] ClvpDecoder

@@ -143,6 +143,7 @@ visualizer("""def func(a, b):
- Infilling is only available in the 7B and 13B base models, and not in the Python, Instruct, 34B, or 70B models.
- Use the `<FILL_ME>` token where you want your input to be filled. The tokenizer splits this token to create a formatted input string that follows the [original training pattern](https://github.com/facebookresearch/codellama/blob/cb51c14ec761370ba2e2bc351374a79265d0465e/llama/generation.py#L402). This is more robust than preparing the pattern yourself.

```py
from transformers import LlamaForCausalLM, CodeLlamaTokenizer

@@ -158,6 +159,7 @@ visualizer("""def func(a, b):
filling = tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:], skip_special_tokens = True)[0]
print(PROMPT.replace("<FILL_ME>", filling))
```

- Use `bfloat16` for further training or fine-tuning and `float16` for inference.
- The `BOS` character is not used for infilling when encoding the prefix or suffix, but only at the beginning of each prompt.
- The tokenizer is a byte-pair encoding model based on [SentencePiece](https://github.com/google/sentencepiece). During decoding, if the first token is the start of the word (for example, “Banana”), the tokenizer doesn’t prepend the prefix space to the string.

@@ -22,14 +22,12 @@ rendered properly in your Markdown viewer.
</div>
</div>

# Cohere

Cohere [Command-R](https://cohere.com/blog/command-r) is a 35B parameter multilingual large language model designed for long context tasks like retrieval-augmented generation (RAG) and calling external APIs and tools. The model is specifically trained for grounded generation and supports both single-step and multi-step tool use. It supports a context length of 128K tokens.

You can find all the original Command-R checkpoints under the [Command Models](https://huggingface.co/collections/CohereForAI/command-models-67652b401665205e17b192ad) collection.

> [!TIP]
> Click on the Cohere models in the right sidebar for more examples of how to apply Cohere to different language tasks.
@@ -123,7 +121,6 @@ visualizer("Plants create energy through a process known as")
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/cohere-attn-mask.png"/>
</div>

## Notes
- Don’t use the dtype parameter in [`~AutoModel.from_pretrained`] if you’re using FlashAttention-2 because it only supports fp16 or bf16. You should use [Automatic Mixed Precision](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html), set fp16 or bf16 to True if using [`Trainer`], or use [torch.autocast](https://pytorch.org/docs/stable/amp.html#torch.autocast).
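
A sketch of the `torch.autocast` route mentioned in that note; the checkpoint and prompt are illustrative, and FlashAttention-2 plus a CUDA device are assumed:

```python
# Run generation under autocast instead of passing a dtype to from_pretrained.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-v01"  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained(
    model_id, attn_implementation="flash_attention_2", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Plants create energy through", return_tensors="pt").to(model.device)

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```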
@@ -145,7 +142,6 @@ visualizer("Plants create energy through a process known as")
[[autodoc]] CohereModel
    - forward

## CohereForCausalLM

[[autodoc]] CohereForCausalLM
@@ -22,7 +22,6 @@ rendered properly in your Markdown viewer.
</div>
</div>

# Cohere 2

[Cohere Command R7B](https://cohere.com/blog/command-r7b) is an open weights research release of a 7B parameter model. It is a multilingual model trained on 23 languages and has a context window of 128k. The model features three layers with sliding window attention and ROPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.
@ -31,7 +30,6 @@ This model is optimized for speed, cost-performance, and compute resources.
|
|||||||
|
|
||||||
You can find all the original Command-R checkpoints under the [Command Models](https://huggingface.co/collections/CohereForAI/command-models-67652b401665205e17b192ad) collection.
|
You can find all the original Command-R checkpoints under the [Command Models](https://huggingface.co/collections/CohereForAI/command-models-67652b401665205e17b192ad) collection.
|
||||||
|
|
||||||
|
|
||||||
> [!TIP]
|
> [!TIP]
|
||||||
> Click on the Cohere models in the right sidebar for more examples of how to apply Cohere to different language tasks.
|
> Click on the Cohere models in the right sidebar for more examples of how to apply Cohere to different language tasks.
|
||||||
|
|
||||||
@ -136,7 +134,6 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
|
|||||||
[[autodoc]] Cohere2Model
|
[[autodoc]] Cohere2Model
|
||||||
- forward
|
- forward
|
||||||
|
|
||||||
|
|
||||||
## Cohere2ForCausalLM
|
## Cohere2ForCausalLM
|
||||||
|
|
||||||
[[autodoc]] Cohere2ForCausalLM
|
[[autodoc]] Cohere2ForCausalLM
|
||||||
|
@ -113,6 +113,7 @@ outputs = pipe(text=messages, max_new_tokens=300, return_full_text=False)
|
|||||||
|
|
||||||
print(outputs)
|
print(outputs)
|
||||||
```
|
```
|
||||||
|
|
||||||
</hfoption>
|
</hfoption>
|
||||||
</hfoptions>
|
</hfoptions>
|
||||||
|
|
||||||
|
@ -42,7 +42,6 @@ NLP tasks in the settings of few-shot (even zero-shot) learning.*
|
|||||||
This model was contributed by [canwenxu](https://huggingface.co/canwenxu). The original implementation can be found
|
This model was contributed by [canwenxu](https://huggingface.co/canwenxu). The original implementation can be found
|
||||||
here: https://github.com/TsinghuaAI/CPM-Generate
|
here: https://github.com/TsinghuaAI/CPM-Generate
|
||||||
|
|
||||||
|
|
||||||
<Tip>
|
<Tip>
|
||||||
|
|
||||||
CPM's architecture is the same as GPT-2, except for the tokenization method. Refer to the [GPT-2 documentation](gpt2) for
@ -50,7 +49,6 @@ API reference information.

</Tip>

## CpmTokenizer

[[autodoc]] CpmTokenizer

@ -346,7 +346,6 @@ out.loss.backward()
This model was contributed by [Eustache Le Bihan](https://huggingface.co/eustlb).
The original code can be found [here](https://github.com/SesameAILabs/csm).

## CsmConfig

[[autodoc]] CsmConfig

@ -55,7 +55,6 @@ This model was contributed by [keskarnitishr](https://huggingface.co/keskarnitis
pre-computed values in the context of text generation. See the [`forward`](model_doc/ctrl#transformers.CTRLModel.forward)
method for more information on the usage of this argument.

## Resources

- [Text classification task guide](../tasks/sequence_classification)

@ -77,7 +77,9 @@ for result in results:
box = [round(i, 2) for i in box.tolist()]
print(f"{model.config.id2label[label]}: {score:.2f} {box}")
```

This should output

```
cat: 0.87 [14.7, 49.39, 320.52, 469.28]
remote: 0.86 [41.08, 72.37, 173.39, 117.2]
@ -89,6 +91,7 @@ couch: 0.59 [-0.04, 1.34, 639.9, 477.09]
There are three other ways to instantiate a DAB-DETR model (depending on what you prefer):

Option 1: Instantiate DAB-DETR with pre-trained weights for entire model

```py
>>> from transformers import DabDetrForObjectDetection

@ -96,19 +99,21 @@ Option 1: Instantiate DAB-DETR with pre-trained weights for entire model
```

Option 2: Instantiate DAB-DETR with randomly initialized weights for Transformer, but pre-trained weights for backbone

```py
>>> from transformers import DabDetrConfig, DabDetrForObjectDetection

>>> config = DabDetrConfig()
>>> model = DabDetrForObjectDetection(config)
```

Option 3: Instantiate DAB-DETR with randomly initialized weights for backbone + Transformer

```py
>>> config = DabDetrConfig(use_pretrained_backbone=False)
>>> model = DabDetrForObjectDetection(config)
```

## DabDetrConfig

[[autodoc]] DabDetrConfig

@ -23,7 +23,6 @@ rendered properly in your Markdown viewer.

## Overview

The DAC model was proposed in [Descript Audio Codec: High-Fidelity Audio Compression with Improved RVQGAN](https://huggingface.co/papers/2306.06546) by Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar.

The Descript Audio Codec (DAC) model is a powerful tool for compressing audio data, making it highly efficient for storage and transmission. By compressing 44.1 KHz audio into tokens at just 8kbps bandwidth, the DAC model enables high-quality audio processing while significantly reducing the data footprint. This is particularly useful in scenarios where bandwidth is limited or storage space is at a premium, such as in streaming applications, remote conferencing, and archiving large audio datasets.
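
As a rough illustration of the codec workflow described above, the sketch below compresses an audio array and reconstructs it with the Transformers DAC classes; the `descript/dac_16khz` checkpoint name and the encode/decode output attribute names are assumptions, so check the DAC API reference before relying on them.

```py
# Hedged sketch of DAC compression/reconstruction; checkpoint and output attribute names are assumed.
import numpy as np
from transformers import AutoProcessor, DacModel

processor = AutoProcessor.from_pretrained("descript/dac_16khz")  # assumed checkpoint
model = DacModel.from_pretrained("descript/dac_16khz")

audio = np.random.randn(16000).astype(np.float32)  # one second of dummy audio at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

encoded = model.encode(inputs["input_values"])                 # discrete codes + quantized latents (assumed API)
reconstructed = model.decode(encoded.quantized_representation)  # reconstructed waveform (assumed API)
```
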
@ -35,7 +34,6 @@ The abstract from the paper is the following:
This model was contributed by [Kamil Akesbi](https://huggingface.co/kamilakesbi).
The original code can be found [here](https://github.com/descriptinc/descript-audio-codec/tree/main?tab=readme-ov-file).

## Model structure

The Descript Audio Codec (DAC) model is structured into three distinct stages:

@ -35,7 +35,6 @@ We estimate that this data is at least 2x better token-for-token than the data w
This new dataset was developed using the full suite of Databricks tools, including Apache Spark™ and Databricks notebooks for data processing, and Unity Catalog for data management and governance.
We used curriculum learning for pretraining, changing the data mix during training in ways we found to substantially improve model quality.

More detailed information about DBRX Instruct and DBRX Base can be found in our [technical blog post](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm).

This model was contributed by [eitan-turok](https://huggingface.co/eitanturok) and [abhi-db](https://huggingface.co/abhi-db). The original code can be found [here](https://github.com/databricks/dbrx-instruct), though this may not be up to date.
@ -65,6 +64,7 @@ print(tokenizer.decode(outputs[0]))
```

If you have flash-attention installed (`pip install flash-attn`), it is possible to generate faster. (The HuggingFace documentation for flash-attention can be found [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2).)

```python
from transformers import DbrxForCausalLM, AutoTokenizer
import torch
@ -87,6 +87,7 @@ print(tokenizer.decode(outputs[0]))
```

You can also generate faster using the PyTorch scaled dot product attention. (The HuggingFace documentation for scaled dot product attention can be found [here](https://huggingface.co/docs/transformers/perf_infer_gpu_one#pytorch-scaled-dot-product-attention).)

```python
from transformers import DbrxForCausalLM, AutoTokenizer
import torch
@ -112,15 +113,12 @@ print(tokenizer.decode(outputs[0]))

[[autodoc]] DbrxConfig

## DbrxModel

[[autodoc]] DbrxModel
- forward

## DbrxForCausalLM

[[autodoc]] DbrxForCausalLM
- forward

@ -21,14 +21,12 @@ rendered properly in your Markdown viewer.
</div>
</div>

# DeBERTa-v2

[DeBERTa-v2](https://huggingface.co/papers/2006.03654) improves on the original [DeBERTa](./deberta) architecture by using a SentencePiece-based tokenizer and a new vocabulary size of 128K. It also adds an additional convolutional layer within the first transformer layer to better learn local dependencies of input tokens. Finally, the position projection and content projection matrices are shared in the attention layer to reduce the number of parameters.

You can find all the original [DeBERTa-v2] checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=deberta-v2) organization.

> [!TIP]
> This model was contributed by [Pengcheng He](https://huggingface.co/DeBERTa).
>
@ -86,6 +84,7 @@ print(f"Predicted label: {predicted_label}")
```bash
echo -e "DeBERTa-v2 is great at understanding context!" | transformers run --task fill-mask --model microsoft/deberta-v2-xlarge-mnli --device 0
```

</hfoption>
</hfoptions>

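Beyond the CLI snippet above, the same MNLI-tuned checkpoint can also be used for zero-shot classification through the `pipeline` API. This is a small illustrative sketch with arbitrary candidate labels, not the documented recipe.

```py
# Hedged sketch: zero-shot classification with an MNLI-tuned DeBERTa-v2 checkpoint.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="microsoft/deberta-v2-xlarge-mnli")
result = classifier(
    "DeBERTa-v2 is great at understanding context!",
    candidate_labels=["technology", "sports", "cooking"],
)
print(result["labels"][0], result["scores"][0])
```
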
@ -119,7 +118,6 @@ print(f"Predicted label: {predicted_label}")

```

## DebertaV2Config

[[autodoc]] DebertaV2Config

@ -31,7 +31,6 @@ Even with less training data than RoBERTa, DeBERTa manages to outperform it on s

You can find all the original DeBERTa checkpoints under the [Microsoft](https://huggingface.co/microsoft?search_models=deberta) organization.

> [!TIP]
> Click on the DeBERTa models in the right sidebar for more examples of how to apply DeBERTa to different language tasks.

@ -46,7 +46,6 @@ This model was contributed by [edbeeching](https://huggingface.co/edbeeching). T

[[autodoc]] DecisionTransformerConfig

## DecisionTransformerGPT2Model

[[autodoc]] DecisionTransformerGPT2Model

@ -61,6 +61,7 @@ outputs = model.generate(inputs, max_new_tokens=50)
print(tokenizer.batch_decode(outputs))
print(time.time()-start)
```

This generated:

``````
@ -157,18 +158,20 @@ Want to dive deeper or see a specific framework’s implementation (e.g., OpenAI
``````

Use the following to run it

```bash
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=0|1 --rdzv-id an_id --rdzv-backend c10d --rdzv-endpoint master_addr:master_port run_deepseek_r1.py
```

If you have:

```bash
[rank0]: ncclInternalError: Internal check failed.
[rank0]: Last error:
[rank0]: Bootstrap : no socket interface found
```

error, it means NCCL was probably not loaded.

## DeepseekV3Config

@ -63,6 +63,7 @@ messages = [

pipe(text=messages, max_new_tokens=20, return_full_text=False)
```

</hfoption>

<hfoption id="AutoModel">
@ -115,6 +116,7 @@ output_text = processor.batch_decode(

print(output_text)
```

</hfoption>
</hfoptions>

@ -138,9 +140,11 @@ model = DeepseekVLForConditionalGeneration.from_pretrained(
quantization_config=quantization_config
)
```

### Notes

- Do inference with multiple images in a single conversation.

```py
import torch
from transformers import DeepseekVLForConditionalGeneration, AutoProcessor

@ -62,6 +62,7 @@ messages = [

pipe(text=messages, max_new_tokens=20, return_full_text=False)
```

</hfoption>

<hfoption id="AutoModel">
@ -114,6 +115,7 @@ output_text = processor.batch_decode(

print(output_text)
```

</hfoption>
</hfoptions>

@ -137,9 +139,11 @@ model = DeepseekVLHybridForConditionalGeneration.from_pretrained(
quantization_config=quantization_config
)
```

### Notes

- Do inference with multiple images in a single conversation.

```py
import torch
from transformers import DeepseekVLHybridForConditionalGeneration, AutoProcessor

@ -38,7 +38,6 @@ Currently one checkpoint is available for DePlot:

- `google/deplot`: DePlot fine-tuned on ChartQA dataset

```python
from transformers import AutoProcessor, Pix2StructForConditionalGeneration
import requests
@ -57,6 +56,7 @@ print(processor.decode(predictions[0], skip_special_tokens=True))
## Fine-tuning

To fine-tune DePlot, refer to the pix2struct [fine-tuning notebook](https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_pix2struct.ipynb). For `Pix2Struct` models, we have found that fine-tuning with the Adafactor optimizer and a cosine learning rate scheduler leads to faster convergence:

```python
from transformers.optimization import Adafactor, get_cosine_schedule_with_warmup
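
# The original snippet is truncated by the diff hunk below; the following continuation is an
# illustrative sketch rather than the documented code. It assumes `model` is the
# Pix2StructForConditionalGeneration instance from the earlier snippet, and the learning rate,
# warmup and training-step counts are arbitrary.
optimizer = Adafactor(model.parameters(), scale_parameter=False, relative_step=False, warmup_init=False, lr=0.01)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=1000, num_training_steps=40000)
```
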
@ -102,12 +102,14 @@ The network is supplemented with a focal length estimation head. A small convolu
The `use_fov_model` parameter in `DepthProConfig` controls whether **FOV prediction** is enabled. By default, it is set to `False` to conserve memory and computation. When enabled, the **FOV encoder** is instantiated based on the `fov_model_config` parameter, which defaults to a `Dinov2Model`. The `use_fov_model` parameter can also be passed when initializing the `DepthProForDepthEstimation` model.

The pretrained model at checkpoint `apple/DepthPro-hf` uses the FOV encoder. To use the pretrained model without the FOV encoder, set `use_fov_model=False` when loading the model, which saves computation.

```py
>>> from transformers import DepthProForDepthEstimation
>>> model = DepthProForDepthEstimation.from_pretrained("apple/DepthPro-hf", use_fov_model=False)
```

To instantiate a new model with FOV encoder, set `use_fov_model=True` in the config.

```py
>>> from transformers import DepthProConfig, DepthProForDepthEstimation
>>> config = DepthProConfig(use_fov_model=True)
@ -115,6 +117,7 @@ To instantiate a new model with FOV encoder, set `use_fov_model=True` in the con
```

Or set `use_fov_model=True` when initializing the model, which overrides the value in config.

```py
>>> from transformers import DepthProConfig, DepthProForDepthEstimation
>>> config = DepthProConfig()

@ -113,6 +113,7 @@ DETR can be naturally extended to perform panoptic segmentation (which unifies s
There are three other ways to instantiate a DETR model (depending on what you prefer):

- Option 1: Instantiate DETR with pre-trained weights for entire model

```python
from transformers import DetrForObjectDetection

@ -120,6 +121,7 @@ model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
```

- Option 2: Instantiate DETR with randomly initialized weights for Transformer, but pre-trained weights for backbone

```python
from transformers import DetrConfig, DetrForObjectDetection

@ -128,6 +130,7 @@ model = DetrForObjectDetection(config)
```

- Option 3: Instantiate DETR with randomly initialized weights for backbone + Transformer

```python
config = DetrConfig(use_pretrained_backbone=False)
model = DetrForObjectDetection(config)

@ -117,11 +117,9 @@ out = model(**inputs)
out.loss.backward()
```

This model was contributed by [Jaeyong Sung](https://huggingface.co/buttercrab), [Arthur Zucker](https://huggingface.co/ArthurZ),
and [Anton Vlasjuk](https://huggingface.co/AntonV). The original code can be found [here](https://github.com/nari-labs/dia/).

## DiaConfig

[[autodoc]] DiaConfig

@ -35,7 +35,6 @@ The abstract from the paper is the following:
### Usage tips
The hyperparameters of this model are the same as those of the Llama model.

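Since the configuration mirrors Llama, a quick way to see this is to build a small randomly initialized model from a config. This is an illustrative sketch; the `DiffLlamaForCausalLM` class name and the chosen sizes are assumptions based on the Llama-style API rather than text from the diff.

```py
# Hedged sketch: DiffLlama accepts the familiar Llama-style hyperparameters.
from transformers import DiffLlamaConfig, DiffLlamaForCausalLM

config = DiffLlamaConfig(
    hidden_size=512,          # Llama-style width
    num_hidden_layers=4,      # Llama-style depth
    num_attention_heads=8,
    intermediate_size=1024,
)
model = DiffLlamaForCausalLM(config)  # randomly initialized small model
print(model.num_parameters())
```
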
## DiffLlamaConfig

[[autodoc]] DiffLlamaConfig

@ -19,7 +19,6 @@ specific language governing permissions and limitations under the License.
</div>
</div>

# DINOv2

[DINOv2](https://huggingface.co/papers/2304.07193) is a vision foundation model that uses [ViT](./vit) as a feature extractor for multiple downstream tasks like image classification and depth estimation. It focuses on stabilizing and accelerating training through techniques like a faster memory-efficient attention, sequence packing, improved stochastic depth, Fully Sharded Data Parallel (FSDP), and model distillation.
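
As a quick illustration of the feature-extractor use case described above, the sketch below pulls patch features from a DINOv2 backbone; the `facebook/dinov2-base` checkpoint name is an assumption chosen for illustration.

```py
# Hedged sketch: extracting image features with a DINOv2 backbone.
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")  # assumed checkpoint
model = AutoModel.from_pretrained("facebook/dinov2-base")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

patch_features = outputs.last_hidden_state  # (batch, tokens, hidden) features for downstream tasks
print(patch_features.shape)
```
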
@ -45,7 +45,6 @@ Tips:
This model was contributed by [nielsr](https://huggingface.co/nielsr).
The original code can be found [here](https://github.com/facebookresearch/dinov2).

## Dinov2WithRegistersConfig

[[autodoc]] Dinov2WithRegistersConfig

@ -19,7 +19,6 @@ specific language governing permissions and limitations under the License.
</div>
</div>

# DINOv3

[DINOv3](https://huggingface.co/papers/2508.10104) is a family of versatile vision foundation models that outperforms the specialized state of the art across a broad range of settings, without fine-tuning. DINOv3 produces high-quality dense features that achieve outstanding performance on various vision tasks, significantly surpassing previous self- and weakly-supervised foundation models.

@ -85,6 +85,7 @@ print(f"The predicted class label is: {predicted_class_label}")
## Notes

- The pretrained DiT weights can be loaded in a [BEiT] model with a modeling head to predict visual tokens.

```py
from transformers import BeitForMaskedImageModeling

@ -17,7 +17,6 @@ rendered properly in your Markdown viewer.

# Doge

## Overview

Doge is a series of small language models based on the [Doge](https://github.com/SmallDoges/small-doge) architecture. It aims to combine the advantages of state-space and self-attention algorithms, computing dynamic masks from cached value states with the zero-order hold method to address the tendency of mainstream language models to get lost in long contexts. The models are pre-trained on the `smollm-corpus` with the `wsd_scheduler` scheduler, and can continue training on new datasets or add sparse-activation feedforward networks from stable-stage checkpoints.

@ -28,7 +27,6 @@ As shown in the figure below, the sequence transformation part of the Doge archi

Check out all Doge model checkpoints [here](https://huggingface.co/collections/SmallDoge/doge-slm-679cc991f027c4a3abbded4a).

## Usage

<details>
@ -44,6 +42,7 @@ inputs = tokenizer("Hey how are you doing?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.batch_decode(outputs))
```

</details>

<details>
@ -82,6 +81,7 @@ outputs = model.generate(
streamer=steamer
)
```

</details>

## DogeConfig

@ -25,7 +25,6 @@ The abstract from the report is the following:

*Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on high-quality corpus and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.*

## Dots1Config

[[autodoc]] Dots1Config

@ -45,6 +45,7 @@ results = keypoint_matcher([url_0, url_1], threshold=0.9)
print(results[0])
# {'keypoint_image_0': {'x': ..., 'y': ...}, 'keypoint_image_1': {'x': ..., 'y': ...}, 'score': ...}
```

</hfoption>
<hfoption id="AutoModel">

@ -167,4 +168,3 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
[[autodoc]] EfficientLoFTRForKeypointMatching

- forward

@ -34,7 +34,6 @@ To go even further, we use neural architecture search to design a new baseline n
This model was contributed by [adirik](https://huggingface.co/adirik).
The original code can be found [here](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet).

## EfficientNetConfig

[[autodoc]] EfficientNetConfig
@ -58,4 +57,3 @@ The original code can be found [here](https://github.com/tensorflow/tpu/tree/mas

[[autodoc]] EfficientNetForImageClassification
- forward

@ -29,7 +29,6 @@ The Emu3 model was proposed in [Emu3: Next-Token Prediction is All You Need](htt

Emu3 is a multimodal LLM that uses vector quantization to tokenize images into discrete tokens. Discretized image tokens are later fused with text token ids for image and text generation. The model can additionally generate images by predicting image token ids.

The abstract from the paper is the following:

*While next-token prediction is considered a promising path towards artificial general intelligence, it has struggled to excel in multimodal tasks, which are still dominated by diffusion models (e.g., Stable Diffusion) and compositional approaches (e.g., CLIP combined with LLMs). In this paper, we introduce Emu3, a new suite of state-of-the-art multimodal models trained solely with next-token prediction. By tokenizing images, text, and videos into a discrete space, we train a single transformer from scratch on a mixture of multimodal sequences. Emu3 outperforms several well-established task-specific models in both generation and perception tasks, surpassing flagship models such as SDXL and LLaVA-1.6, while eliminating the need for diffusion or compositional architectures. Emu3 is also capable of generating high-fidelity video via predicting the next token in a video sequence. We simplify complex multimodal model designs by converging on a singular focus: tokens, unlocking great potential for scaling both during training and inference. Our results demonstrate that next-token prediction is a promising path towards building general multimodal intelligence beyond language. We open-source key techniques and models to support further research in this direction.*
@ -45,11 +44,9 @@ Tips:
> [!TIP]
> Emu3 implementation in Transformers uses a special image token to indicate where to merge image embeddings. The special image token isn't new and uses one of the reserved tokens: `<|extra_0|>`. You have to add `<image>` to your prompt in the place where the image should be embedded for correct generation.

This model was contributed by [RaushanTurganbay](https://huggingface.co/RaushanTurganbay).
The original code can be found [here](https://github.com/baaivision/Emu3).

## Usage example

### Text generation inference
@ -143,7 +140,6 @@ for i, image in enumerate(images['pixel_values']):

```

## Emu3Config

[[autodoc]] Emu3Config

@ -39,7 +39,6 @@ Architecturally, EoMT introduces a small set of **learned queries** and a lightw
alt="drawing" width="500"/>
</div>

The model supports semantic, instance, and panoptic segmentation using a unified architecture and task-specific post-processing.

## Usage Examples

@ -38,7 +38,6 @@ Other models from the family can be found at [Ernie 4.5 Moe](./ernie4_5_moe).
<img src="https://ernie.baidu.com/blog/posts/ernie4.5/overview.png"/>
</div>

## Usage Tips

### Generate text
@ -84,7 +83,6 @@ generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
This model was contributed by [Anton Vlasjuk](https://huggingface.co/AntonV).
The original code can be found [here](https://github.com/PaddlePaddle/ERNIE).

## Ernie4_5Config

[[autodoc]] Ernie4_5Config

@ -40,7 +40,6 @@ Other models from the family can be found at [Ernie 4.5](./ernie4_5).
<img src="https://ernie.baidu.com/blog/posts/ernie4.5/overview.png"/>
</div>

## Usage Tips

### Generate text
@ -167,7 +166,6 @@ generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
This model was contributed by [Anton Vlasjuk](https://huggingface.co/AntonV).
The original code can be found [here](https://github.com/PaddlePaddle/ERNIE).

## Ernie4_5_MoeConfig

[[autodoc]] Ernie4_5_MoeConfig

@ -40,7 +40,6 @@ The abstract from the paper is the following:
*Recent studies have demonstrated that pre-trained cross-lingual models achieve impressive performance in downstream cross-lingual tasks. This improvement benefits from learning a large amount of monolingual and parallel corpora. Although it is generally acknowledged that parallel corpora are critical for improving the model performance, existing methods are often constrained by the size of parallel corpora, especially for lowresource languages. In this paper, we propose ERNIE-M, a new training method that encourages the model to align the representation of multiple languages with monolingual corpora, to overcome the constraint that the parallel corpus size places on the model performance. Our key insight is to integrate back-translation into the pre-training process. We generate pseudo-parallel sentence pairs on a monolingual corpus to enable the learning of semantic alignments between different languages, thereby enhancing the semantic modeling of cross-lingual models. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results in various cross-lingual downstream tasks.*
This model was contributed by [Susnato Dhar](https://huggingface.co/susnato). The original code can be found [here](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/paddlenlp/transformers/ernie_m).

## Usage tips

- Ernie-M is a BERT-like model so it is a stacked Transformer Encoder.
@ -59,7 +58,6 @@ This model was contributed by [Susnato Dhar](https://huggingface.co/susnato). Th

[[autodoc]] ErnieMConfig

## ErnieMTokenizer

[[autodoc]] ErnieMTokenizer
@ -68,7 +66,6 @@ This model was contributed by [Susnato Dhar](https://huggingface.co/susnato). Th
- create_token_type_ids_from_sequences
- save_vocabulary

## ErnieMModel

[[autodoc]] ErnieMModel
@ -79,19 +76,16 @@ This model was contributed by [Susnato Dhar](https://huggingface.co/susnato). Th
[[autodoc]] ErnieMForSequenceClassification
- forward

## ErnieMForMultipleChoice

[[autodoc]] ErnieMForMultipleChoice
- forward

## ErnieMForTokenClassification

[[autodoc]] ErnieMForTokenClassification
- forward

## ErnieMForQuestionAnswering

[[autodoc]] ErnieMForQuestionAnswering

@ -44,12 +44,10 @@ sequence alignment (MSA) step at inference time, which means that ESMFold checkp
they do not require a database of known protein sequences and structures with associated external query tools
to make predictions, and are much faster as a result.

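To make the single-sequence workflow concrete, here is a hedged folding sketch; the `facebook/esmfold_v1` checkpoint name and the `positions` output attribute are assumptions about the ESMFold integration, so verify them against the ESM API reference.

```py
# Hedged sketch: single-sequence structure prediction with ESMFold (no MSA or external databases).
import torch
from transformers import AutoTokenizer, EsmForProteinFolding

tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")  # assumed checkpoint
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")

sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
inputs = tokenizer([sequence], return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.positions.shape)  # predicted atom coordinates (attribute name assumed)
```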

The abstract from
"Biological structure and function emerge from scaling unsupervised learning to 250
million protein sequences" is

*In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised
learning has led to major advances in representation learning and statistical generation. In the life sciences, the
anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling
@ -63,7 +61,6 @@ can be identified by linear projections. Representation learning produces featur
applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and
improving state-of-the-art features for long-range contact prediction.*

The abstract from
"Language models of protein sequences at the scale of evolution enable accurate structure prediction" is

@ -75,7 +75,6 @@ Tips:
- This model was contributed by [Xibin Bayes Zhou](https://huggingface.co/XibinBayesZhou).
- The original code can be found [here](https://github.com/westlake-repl/Evolla).

## EvollaConfig

[[autodoc]] EvollaConfig

@ -33,7 +33,6 @@ For more details, please refer to our [technical report](https://huggingface.co/

All model weights including quantized versions are available at [Huggingface Collections](https://huggingface.co/collections/LGAI-EXAONE/exaone-40-686b2e0069800c835ed48375).

## Model Details

### Model Specifications
@ -57,7 +56,6 @@ All model weights including quantized versions are available at [Huggingface Col
| Tied word embedding | False | True |
| Knowledge cut-off | Nov. 2024 | Nov. 2024 |

## Usage tips

### Non-reasoning mode

@ -21,7 +21,6 @@ The [FalconH1](https://huggingface.co/blog/tiiuae/falcon-h1) model was developed
This model was contributed by [DhiyaEddine](https://huggingface.co/DhiyaEddine), [ybelkada](https://huggingface.co/ybelkada), [JingweiZuo](https://huggingface.co/JingweiZuo), [IlyasChahed](https://huggingface.co/IChahed), and [MaksimVelikanov](https://huggingface.co/yellowvm).
The original code can be found [here](https://github.com/tiiuae/Falcon-H1).

## FalconH1Config

| Model | Depth | Dim | Attn Heads | KV | Mamba Heads | d_head | d_state | Ctx Len |
@ -33,8 +32,6 @@ The original code can be found [here](https://github.com/tiiuae/Falcon-H1).
| H1 7B | 44 | 3072 | 12 | 2 | 24 | 128 / 128 | 256 | 256K |
| H1 34B | 72 | 5120 | 20 | 4 | 32 | 128 / 128 | 256 | 256K |

[[autodoc]] FalconH1Config

<!---

@ -27,7 +27,6 @@ The abstract from the original FastSpeech2 paper is the following:

This model was contributed by [Connor Henderson](https://huggingface.co/connor-henderson). The original code can be found [here](https://github.com/espnet/espnet/blob/master/espnet2/tts/fastspeech2/fastspeech2.py).

## 🤗 Model Architecture
FastSpeech2's general structure with a Mel-spectrogram decoder was implemented, and the traditional transformer blocks were replaced with conformer blocks as done in the ESPnet library.

@ -90,6 +89,7 @@ sf.write("speech.wav", waveform.squeeze().detach().numpy(), samplerate=22050)
```

4. Run inference with a pipeline and specify which vocoder to use

```python
from transformers import pipeline, FastSpeech2ConformerHifiGan
import soundfile as sf
@ -102,7 +102,6 @@ speech = synthesiser("Hello, my dog is cooler than you!")
sf.write("speech.wav", speech["audio"].squeeze(), samplerate=speech["sampling_rate"])
```

## FastSpeech2ConformerConfig

[[autodoc]] FastSpeech2ConformerConfig

@ -35,7 +35,6 @@ Google has released the following variants:

The original checkpoints can be found [here](https://github.com/google-research/google-research/tree/master/ul2).

## Running on low resource devices

The model is pretty heavy (~40GB in half precision) so if you just want to run the model, make sure you load your model in 8bit, and use `device_map="auto"` to make sure you don't have any OOM issue!
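
A minimal sketch of that advice, assuming the `google/flan-ul2` checkpoint and the bitsandbytes backend (`pip install bitsandbytes accelerate`); it is meant as an illustration rather than the documented recipe.

```py
# Hedged sketch: load FLAN-UL2 in 8-bit with automatic device placement to avoid OOM.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig

model_id = "google/flan-ul2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
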
@ -12,7 +12,6 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
@ -90,6 +89,7 @@ echo -e "Plants create energy through a process known as" | transformers run --t
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.

The example below uses [torchao](../quantization/torchao) to only quantize the weights to 4-bits.

```py

#pip install torchao
@ -119,7 +119,6 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))

```

## FlexOlmoConfig

[[autodoc]] FlexOlmoConfig
Some files were not shown because too many files have changed in this diff.