mirror of
https://github.com/hiyouga/LLaMA-Factory.git
synced 2025-10-20 12:54:18 +08:00
[model] support audio (#6701)
* support qwen2_audio
* improve code
* lint
* fix
* fix
* fix

---------

Co-authored-by: hiyouga <hiyouga@buaa.edu.cn>
Former-commit-id: 5eacb5629e4d7733cd992a63747a1335f2c6a929
@@ -24,6 +24,7 @@ Currently we support datasets in **alpaca** and **sharegpt** format.
"tools": "the column name in the dataset containing the tool description. (default: None)",
"images": "the column name in the dataset containing the image inputs. (default: None)",
"videos": "the column name in the dataset containing the video inputs. (default: None)",
"audios": "the column name in the dataset containing the audio inputs. (default: None)",
"chosen": "the column name in the dataset containing the chosen answers. (default: None)",
"rejected": "the column name in the dataset containing the rejected answers. (default: None)",
"kto_tag": "the column name in the dataset containing the KTO tags. (default: None)"
@@ -150,6 +151,10 @@ An additional column `images` is required. Please refer to the [sharegpt](#share

An additional column `videos` is required. Please refer to the [sharegpt](#sharegpt-format) format for details.

### Multimodal Audio Dataset

An additional column `audios` is required. Please refer to the [sharegpt](#sharegpt-format) format for details.

## Sharegpt Format

### Supervised Fine-Tuning Dataset
@@ -296,7 +301,7 @@ Regarding the above dataset, the *dataset description* in `dataset_info.json` sh

- [Example dataset](mllm_demo.json)

-Multimodal image datasets require a `images` column containing the paths to the input images.
+Multimodal image datasets require an `images` column containing the paths to the input images.

The number of images should match the number of `<image>` tokens in the conversations.

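The count rule above is easy to check before training. The sketch below is a minimal Python illustration, not a LLaMA-Factory API: `image_token_mismatch` and `find_bad_samples` are hypothetical names, and it assumes each record follows the sharegpt layout used in this document (a `conversations` list of `from`/`value` turns plus an `images` list of paths).

```python
from __future__ import annotations

import json
from typing import Any

IMAGE_TOKEN = "<image>"  # placeholder token named in this README


def image_token_mismatch(sample: dict[str, Any]) -> bool:
    """Return True if the <image> token count differs from len(sample["images"]).

    Hypothetical helper (not part of LLaMA-Factory); counts tokens across all turns.
    """
    num_tokens = sum(turn["value"].count(IMAGE_TOKEN) for turn in sample["conversations"])
    return num_tokens != len(sample.get("images", []))


def find_bad_samples(path: str) -> list[int]:
    """Load a JSON list of records and return the indices that break the count rule."""
    with open(path, encoding="utf-8") as f:
        samples = json.load(f)
    return [i for i, sample in enumerate(samples) if image_token_mismatch(sample)]
```

For example, `find_bad_samples("mllm_demo.json")` would return the indices of records whose `<image>` count and `images` length disagree; an empty list only means every record passes this one check.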
@@ -374,6 +379,47 @@ Regarding the above dataset, the *dataset description* in `dataset_info.json` sh
}
```

### Multimodal Audio Dataset

- [Example dataset](mllm_audio_demo.json)

Multimodal audio datasets require an `audios` column containing the paths to the input audio files.

The number of audio files should match the number of `<audio>` tokens in the conversations.

```json
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "<audio>human instruction"
      },
      {
        "from": "gpt",
        "value": "model response"
      }
    ],
    "audios": [
      "audio path (required)"
    ]
  }
]
```

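Records in this shape can be generated from your own (instruction, response, audio paths) triples. The following is a minimal sketch under that assumption: `make_audio_sample` and `write_dataset` are hypothetical helpers, and prepending one `<audio>` token per file to the human turn simply mirrors the example above, so the token count matches the `audios` list by construction.

```python
from __future__ import annotations

import json
from typing import Any

AUDIO_TOKEN = "<audio>"  # placeholder token used in the conversations above


def make_audio_sample(instruction: str, response: str, audio_paths: list[str]) -> dict[str, Any]:
    """Build one sharegpt-style record with exactly one <audio> token per audio file.

    Hypothetical helper; tokens are prepended to the human turn as in the example above.
    """
    if not audio_paths:
        raise ValueError("an audio sample needs at least one audio path")
    return {
        "conversations": [
            {"from": "human", "value": AUDIO_TOKEN * len(audio_paths) + instruction},
            {"from": "gpt", "value": response},
        ],
        "audios": list(audio_paths),
    }


def write_dataset(samples: list[dict[str, Any]], path: str) -> None:
    """Write the records as a top-level JSON list, matching the layout above."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(samples, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    demo = [
        # Illustrative path only; point it at wherever your audio files live.
        make_audio_sample("human instruction", "model response", ["mllm_demo_data/1.mp3"]),
    ]
    print(json.dumps(demo, ensure_ascii=False, indent=2))
```

Building the token prefix from `len(audio_paths)` avoids the most common mismatch, where a file is added to `audios` but its placeholder is forgotten in the text.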
Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "audios": "audios"
  }
}
```

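If `dataset_info.json` is maintained by script rather than by hand, the description above can be merged in with the standard `json` module. This is a minimal sketch, assuming `dataset_info.json` is a single JSON object keyed by dataset name (as the snippet above suggests); `register_audio_dataset` is a hypothetical helper, and it overwrites any existing entry with the same name.

```python
from __future__ import annotations

import json
import os
import tempfile
from typing import Any


def register_audio_dataset(info_path: str, name: str, file_name: str) -> dict[str, Any]:
    """Add (or overwrite) a sharegpt audio dataset entry in dataset_info.json.

    Hypothetical helper; the entry mirrors the dataset description shown above.
    """
    info: dict[str, Any] = {}
    if os.path.exists(info_path):
        with open(info_path, encoding="utf-8") as f:
            info = json.load(f)  # keep every existing dataset entry
    info[name] = {
        "file_name": file_name,
        "formatting": "sharegpt",
        "columns": {
            "messages": "conversations",  # column holding the conversation turns
            "audios": "audios",  # column holding the audio paths
        },
    }
    with open(info_path, "w", encoding="utf-8") as f:
        json.dump(info, f, ensure_ascii=False, indent=2)
    return info[name]


if __name__ == "__main__":
    # Demo in a throwaway directory so no real dataset_info.json is modified.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dataset_info.json")
        print(register_audio_dataset(path, "dataset_name", "data.json"))
```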
### OpenAI Format

The OpenAI format is simply a special case of the sharegpt format, where the first message may be a system prompt.

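The one structural difference named above, an optional leading system prompt, can be illustrated in a few lines. This sketch assumes OpenAI-style messages of the form `{"role": "system" | "user" | "assistant", "content": ...}`; `split_system_prompt` is a hypothetical helper, not LLaMA-Factory's own converter, and it only looks at the first message.

```python
from __future__ import annotations

from typing import Any, Optional


def split_system_prompt(
    messages: list[dict[str, Any]],
) -> tuple[Optional[str], list[dict[str, Any]]]:
    """Separate an optional leading system message from the remaining turns.

    Hypothetical helper: only messages[0] is treated as a possible system prompt.
    """
    if messages and messages[0].get("role") == "system":
        return messages[0]["content"], list(messages[1:])
    return None, list(messages)
```

For example, `split_system_prompt([{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Hi"}])` returns `("You are a helpful assistant.", [{"role": "user", "content": "Hi"}])`, while a list without a leading system message comes back unchanged with `None` as the prompt.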
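@@ -24,6 +24,7 @@
"tools": "the column name in the dataset for the tool description (default: None)",
"images": "the column name in the dataset for the image inputs (default: None)",
"videos": "the column name in the dataset for the video inputs (default: None)",
"audios": "the column name in the dataset for the audio inputs (default: None)",
"chosen": "the column name in the dataset for the better answers (default: None)",
"rejected": "the column name in the dataset for the worse answers (default: None)",
"kto_tag": "the column name in the dataset for the KTO tags (default: None)"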
@@ -150,6 +151,10 @@ KTO datasets require an additional `kto_tag` column. For details, please refer to [sharegpt](#s

Multimodal video datasets require an additional `videos` column. For details, please refer to [sharegpt](#sharegpt-格式).

### Multimodal Audio Dataset

Multimodal audio datasets require an additional `audios` column. For details, please refer to [sharegpt](#sharegpt-格式).

## Sharegpt Format

### Supervised Fine-Tuning Dataset
@@ -374,6 +379,48 @@ KTO datasets require an additional `kto_tag` column containing bool-type human
}
```

### Multimodal Audio Dataset

- [Example dataset](mllm_audio_demo.json)

Multimodal audio datasets require an additional `audios` column containing the paths to the input audio files.

Note that the number of audio files must exactly match the total number of `<audio>` tokens in the text.

```json
[
  {
    "conversations": [
      {
        "from": "human",
        "value": "<audio>human instruction"
      },
      {
        "from": "gpt",
        "value": "model response"
      }
    ],
    "audios": [
      "audio path (required)"
    ]
  }
]
```

Regarding the above dataset, the *dataset description* in `dataset_info.json` should be:

```json
"dataset_name": {
  "file_name": "data.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations",
    "audios": "audios"
  }
}
```


### OpenAI Format

The OpenAI format is simply a special case of the sharegpt format, where the first message may be a system prompt.
BIN
data/mllm_demo_data/1.mp3
Normal file
Binary file not shown.
BIN
data/mllm_demo_data/2.wav
Normal file
Binary file not shown.
BIN
data/mllm_demo_data/3.flac
Normal file
Binary file not shown.