!209 Add DPO and reward docs and fix a transformers import issue
Merge pull request !209 from 幽若/master-doc-0513
@@ -564,8 +564,8 @@ chmod -R 750 path/to/model_weights

**Parameter list**

| **Parameter** | **Description** | **Type** | **Default** | Optional |
-|-----------------------|-----------------------|--------|---------|---------|
-| stage | Training stage. Options: pt, sft. | str | sft | Optional |
+|-----------------------|------------------------------------------------------|--------|---------|---------|
+| stage | Training stage. Options: pt, sft, rm, dpo. | str | sft | Optional |
| finetuning_type | Fine-tuning method. Options: full, lora. | str | full | Optional |
| lora_target_modules | Target modules to which the LoRA method is applied. | str | None | Optional |
| lora_alpha | Scaling factor for LoRA fine-tuning. | int | None | Optional |
@@ -11,7 +11,7 @@ dataset: alpaca_zh_51k

The list of currently built-in datasets is as follows and is continuously being updated:

| **dataset** | **Modelers community repo** | **HuggingFace community repo** | **Data type** |
-|---------------------|-----------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|----------|
+|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|----------|
| alpaca_zh_51k | [AI-Research/alpaca_zh_51k](https://modelers.cn/datasets/AI-Research/alpaca_zh_51k) | [hfl/alpaca_zh_51k](https://huggingface.co/datasets/hfl/alpaca_zh_51k) | Alpaca |
| alpaca | [AI_Connect/alpaca](https://modelers.cn/datasets/AI_Connect/alpaca) | [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) | Alpaca |
| alpaca_eval | [AI-Research/alpaca_eval](https://modelers.cn/datasets/AI-Research/alpaca_eval) | [tatsu-lab/alpaca_eval](https://huggingface.co/datasets/tatsu-lab/alpaca_eval) | Alpaca |
@@ -21,6 +21,7 @@ dataset: alpaca_zh_51k
| Sky-T1_data_17k | [AI-Research/Sky-T1_data_17k](https://modelers.cn/datasets/AI-Research/Sky-T1_data_17k) | [NovaSky-AI/Sky-T1_data_17k](https://huggingface.co/datasets/NovaSky-AI/Sky-T1_data_17k) | ShareGPT |
| text_zh_data | [AI-Research/text_zh_data](https://modelers.cn/datasets/AI-Research/text_zh_data) | / | Text |
| OpenR1-Math-220k_filtered_step3_SFT | [openmind/OpenR1-Math-220k_filtered_step3_SFT](https://modelers.cn/datasets/openmind/OpenR1-Math-220k_filtered_step3_SFT) | / | Text |
+| rlhf-reward-datasets | [PyTorch-NPU/rlhf-reward-datasets](https://modelers.cn/datasets/PyTorch-NPU/rlhf-reward-datasets) | [rlhf-reward-datasets](https://huggingface.co/datasets/yitingxie/rlhf-reward-datasets) | pairwise |
## Non-built-in datasets

@@ -35,6 +36,8 @@ openMind currently supports three data formats: Alpaca, ShareGPT, and Text; custom data
      <th>Dataset format</th>
      <th>PT</th>
      <th>SFT</th>
      <th>RM</th>
      <th>DPO</th>
    </tr>
  </thead>
  <tbody>
@@ -42,16 +45,29 @@ openMind currently supports three data formats: Alpaca, ShareGPT, and Text; custom data
      <td>Alpaca</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>ShareGPT</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>Text</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>Pairwise</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>
@@ -219,9 +235,50 @@ The Text dataset format requirements are as follows:
]
```

#### Pairwise dataset

The Pairwise dataset format requirements are as follows:

```json
[
  {
    "prompt": "prompt content",
    "chosen": "chosen response",
    "rejected": "rejected response"
  },
  {
    "prompt": "prompt content",
    "chosen": "chosen response",
    "rejected": "rejected response"
  }
]
```

Where:

* `prompt` is the user instruction or question; it is required.
* `chosen` and `rejected` are the contents of the chosen and rejected responses; both are required.

Example Pairwise data:

```json
[
  {
    "prompt": "Human: I want to grow a fruit indoor this year, can you give me a suggestion on an easy fruit to stary with?",
    "chosen": "Assistant: Sure, what’s your definition of “indoor”?",
    "rejected": "Assistant: Which fruit are you thinking of?"
  },
  {
    "prompt": "Human: I have heartburn sometimes. Can you recommend a way to reduce it?",
    "chosen": "Assistant: Are you currently experiencing heartburn, or do you sometimes get it?",
    "rejected": "Assistant: What kinds of things trigger heartburn for you?"
  }
]
```
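As a quick sanity check, the following self-contained sketch (Python standard library only; the file name and helper function are illustrative and not part of openMind) verifies that a local JSON file matches the Pairwise structure described above:

```python
import json

REQUIRED_KEYS = {"prompt", "chosen", "rejected"}


def check_pairwise_file(path: str) -> None:
    """Raise if any record is missing a required Pairwise key."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    for index, record in enumerate(records):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {index} is missing keys: {sorted(missing)}")
    print(f"{len(records)} records passed the Pairwise format check")


if __name__ == "__main__":
    # Illustrative file name; point this at your own dataset file.
    check_pairwise_file("pairwise_dataset.json")
```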
### Dataset configuration file

-Once your dataset conforms to the Alpaca, ShareGPT, or Text format, you can either pass the dataset's absolute path directly via the `dataset` parameter or create a local `custom_dataset_info.json` file describing the dataset.
+Once your dataset conforms to the Alpaca, ShareGPT, Text, or Pairwise format, you can either pass the dataset's absolute path directly via the `dataset` parameter or create a local `custom_dataset_info.json` file describing the dataset.

#### Alpaca dataset configuration template
@@ -292,11 +349,34 @@ The Text dataset format requirements are as follows:
}
```

#### Pairwise dataset configuration template

For a Pairwise dataset, the entry in the configuration file should be:

```json
{
  "dataset": {
    "local_path (required)": "xxx",
    "file_name (optional)": "dataset.json",
    "split (optional)": "train",
    "num_samples (optional)": xxx,
    "formatting (required)": "pairwise",
    "columns": {
      "prompt": "prompt (the value depends on the column or key name in your dataset; likewise below)",
      "chosen": "chosen",
      "rejected": "rejected"
    }
  }
}
```
> **Note**
>
-> <font size=3>1. For ShareGPT and Text datasets, the formatting field is mandatory.</font>
+> <font size=3>1. For ShareGPT, Text, and Pairwise datasets, the formatting field is mandatory.</font>
>
> <font size=3>2. For Text datasets, if the key of each record is "text", the columns field can be omitted.</font>
>
> <font size=3>3. For Pairwise datasets, prompt, chosen, and rejected are required columns; if your original dataset uses different column names, you must map them to these three columns yourself.</font>
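For illustration only, a filled-in version of the Pairwise template above might look like the following; the path and the question/good_answer/bad_answer column names are placeholders for whatever your own dataset uses:

```json
{
  "dataset": {
    "local_path": "/path/to/pairwise_dataset",
    "file_name": "dataset.json",
    "formatting": "pairwise",
    "columns": {
      "prompt": "question",
      "chosen": "good_answer",
      "rejected": "bad_answer"
    }
  }
}
```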
### yaml file configuration
docs/zh/basic_tutorial/train/posttrain/finetune/finetune_dpo.md (new file, 127 lines)

@@ -0,0 +1,127 @@
# PyTorch Model DPO Training

DPO (Direct Preference Optimization) is a training method for aligning large language models (LLMs) so that their outputs better match human preferences. It is a simplification and improvement of the RLHF (Reinforcement Learning from Human Feedback) pipeline. The core idea of DPO is to optimize the language model directly on human preference data, without explicitly training a reward model and without a complex reinforcement learning algorithm.
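For background, the DPO objective from the original paper (Rafailov et al., 2023) is shown below; the trl-based implementation used under the hood may differ in details:

```latex
% y_w is the chosen response, y_l the rejected response,
% \pi_{\mathrm{ref}} is a frozen reference model, and \beta a temperature-like factor.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[ \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Intuitively, the model is pushed to assign a higher reference-relative log-probability to the chosen response than to the rejected one, which is why DPO consumes the same pairwise data as reward model training.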
openMind Library already supports DPO training; you can start a DPO run with the following steps.

## Environment setup

The openMind Library command-line interface is built into openMind Library and is available once openMind Library is installed; see the [openMind Library installation guide](../../../../install.md) for detailed steps.

*`Note: DPO training in openMind requires trl>=0.16.1 and datasets >= 2.18.0, <= 2.21.0. openMind and trl have conflicting datasets version requirements, so install the matching datasets version manually after installing trl.`*
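As one possible way to follow the note above (the package names and version bounds are taken from the note itself; adapt the commands to your environment):

```shell
# Install trl first; it may pull in a datasets release newer than openMind expects
pip install "trl>=0.16.1"

# Then pin datasets back to the range required by openMind
pip install "datasets>=2.18.0,<=2.21.0"
```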
## Fine-tuning example

openMind Library launches fine-tuning by parsing a yaml file. You configure a fine-tuning yaml file and run it with the `openmind-cli train` command; openMind Library then handles argument parsing and sets up and runs the fine-tuning pipeline automatically. Below is a runnable example, `dpo_demo.yaml`.

```yaml
# model
model_id: Qwen2.5-7B
# model_name_or_path: /path/to/Qwen2.5-7B

# method
stage: dpo
do_train: true
finetuning_type: lora

# If finetuning_type is full, lora_rank and lora_alpha do not need to be set
lora_rank: 8
lora_alpha: 16

# dataset
dataset: rlhf-reward-datasets
cutoff_len: 1024

# output
output_dir: saves/qwen2_7b_dpo
logging_steps: 1
save_steps: 20000
overwrite_output_dir: true

# train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
max_steps: 5000
seed: 1234
```

The command to run it is:

```shell
openmind-cli train dpo_demo.yaml
```
The yaml file contains the fine-tuning algorithm parameters, model parameters, dataset parameters, and training parameters. For details, see [training parameters](../../train_params.md).

We also provide an SDK interface: you can call the `run_train` function directly from openMind Library and start fine-tuning from a Python file. Below is an example `train_demo.py`:

```python
from openmind import run_train

run_train(
    model_name_or_path="/mnt/h/pretrain_models/Qwen2.5-0.5B/",
    stage="dpo",
    template="qwen",
    do_train=True,
    finetuning_type="lora",
    # If finetuning_type is full, lora_rank and lora_alpha do not need to be passed
    lora_rank=8,
    lora_alpha=16,
    dataset="rlhf-reward-datasets",
    output_dir="saves/qwen2.5_0.5b_lora_dpo",
    logging_steps=1,
    save_steps=20000,
    overwrite_output_dir=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    learning_rate=1.0e-5,
    bf16=True,
    max_steps=10,
    seed=1234,
)
```
## Model fine-tuning SDK

You can start fine-tuning on a single card or coordinate multiple cards on a single node. Example launch commands:

```shell
# Single node, single card
python train_demo.py

# Single node, eight cards
torchrun --nnodes 1 --node_rank 0 --nproc_per_node 8 train_demo.py

# Single node, multiple cards, restricted to specific Ascend NPUs
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes 1 --node_rank 0 --nproc_per_node 4 train_demo.py
```
## Multi-node model fine-tuning

openMind Library fine-tuning supports multiple nodes with multiple cards. The following shows the steps for a two-node, multi-card run.

- Make sure the two-node environment is fully and correctly configured; you can follow the [multi-node multi-card guide](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/PT_LMTMOG_0022.html) to set up the two nodes.
- Install the same driver and firmware, CANN version, Python dependencies, and yaml run file on both nodes.
- Set the following environment variables on each node. Note that `MASTER_ADDR` must be set to the master node's IP on both nodes, and `MASTER_PORT` must be identical on both nodes.

```shell
# Master node
export MASTER_ADDR=XX.XX.XX.XXX
export MASTER_PORT=XXXX
export NNODES=2
export RANK=0

# Worker node
export MASTER_ADDR=XX.XX.XX.XXX
export MASTER_PORT=XXXX
export NNODES=2
export RANK=1
```

- Run the `openmind-cli train example.yaml` command on each node.

We also provide an [openMind fine-tuning tutorial](https://modelers.cn/spaces/openmind/openmind_finetune); you can work through the notebook examples in that space to deepen your understanding of fine-tuning.
@@ -1,4 +1,4 @@
-# PyTorch Model Fine-tuning
+# PyTorch Model SFT Fine-tuning

## Environment setup

@@ -6,7 +6,7 @@ The openMind Library command-line interface is built into openMind Library; installing openMind Lib

## Fine-tuning example

-openMind Library launches fine-tuning by parsing a yaml file. You configure a fine-tuning yaml file and run it with the `openmind-cli train` command; openMind Library then handles argument parsing and sets up and runs the fine-tuning pipeline automatically. Below is a runnable example, `demo.yaml`.
+openMind Library launches fine-tuning by parsing a yaml file. You configure a fine-tuning yaml file and run it with the `openmind-cli train` command; openMind Library then handles argument parsing and sets up and runs the fine-tuning pipeline automatically. Below is a runnable example, `sft_demo.yaml`.

```yaml
# model
docs/zh/basic_tutorial/train/posttrain/finetune/finetune_rm.md (new file, 121 lines)

@@ -0,0 +1,121 @@
# PyTorch Model Reward Training

Reward modeling is a key technique in reinforcement learning (RL), widely used in reinforcement learning from human feedback (RLHF). Its core goal is to train a model that mimics human preferences (the reward model), which provides a quantifiable reward signal for reinforcement learning and thereby guides the AI model to generate outputs that better match human expectations.
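For background, reward models trained on pairwise preference data commonly minimize a Bradley–Terry style ranking loss; a typical form (the exact objective used by the underlying trl-based reward training may differ in details) is:

```latex
% r_\theta is the scalar reward model; y_c is the chosen response, y_r the rejected one.
\mathcal{L}_{\mathrm{RM}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_c,\,y_r)\sim\mathcal{D}}
    \left[ \log \sigma\!\left( r_\theta(x, y_c) - r_\theta(x, y_r) \right) \right]
```

Training therefore requires only pairwise (chosen/rejected) data, which is why the Pairwise dataset format is used for both the rm and dpo stages.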
openMind Library already supports reward model training; you can start a reward training run with the following steps.

## Environment setup

The openMind Library command-line interface is built into openMind Library and is available once openMind Library is installed; see the [openMind Library installation guide](../../../../install.md) for detailed steps.

*`Note: reward model training in openMind requires trl>=0.16.1 and datasets >= 2.18.0, <= 2.21.0. openMind and trl have conflicting datasets version requirements, so install the matching datasets version manually after installing trl.`*
## Fine-tuning example

openMind Library launches fine-tuning by parsing a yaml file. You configure a fine-tuning yaml file and run it with the `openmind-cli train` command; openMind Library then handles argument parsing and sets up and runs the fine-tuning pipeline automatically. Below is a runnable example, `rm_demo.yaml`.

```yaml
# model
model_id: Qwen2.5-7B
# model_name_or_path: /path/to/Qwen2.5-7B

# method
stage: rm
do_train: true
finetuning_type: lora

# dataset
dataset: rlhf-reward-datasets
cutoff_len: 1024

# output
output_dir: saves/qwen2_7b_reward
logging_steps: 1
save_steps: 20000
overwrite_output_dir: true

# train
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
max_steps: 5000
seed: 1234
```

The command to run it is:

```shell
openmind-cli train rm_demo.yaml
```
The yaml file contains the fine-tuning algorithm parameters, model parameters, dataset parameters, and training parameters. For details, see [training parameters](../../train_params.md).

We also provide an SDK interface: you can call the `run_train` function directly from openMind Library and start fine-tuning from a Python file. Below is an example `train_demo.py`:

```python
from openmind import run_train

run_train(
    model_name_or_path="/mnt/h/pretrain_models/Qwen2.5-0.5B/",
    stage="rm",
    template="qwen",
    do_train=True,
    finetuning_type="lora",
    dataset="rlhf-reward-datasets",
    output_dir="saves/qwen2.5_0.5b_lora_rm",
    logging_steps=1,
    save_steps=20000,
    overwrite_output_dir=True,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    learning_rate=1.0e-5,
    bf16=True,
    max_steps=10,
    seed=1234,
)
```
## Model fine-tuning SDK

You can start fine-tuning on a single card or coordinate multiple cards on a single node. Example launch commands:

```shell
# Single node, single card
python train_demo.py

# Single node, eight cards
torchrun --nnodes 1 --node_rank 0 --nproc_per_node 8 train_demo.py

# Single node, multiple cards, restricted to specific Ascend NPUs
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 torchrun --nnodes 1 --node_rank 0 --nproc_per_node 4 train_demo.py
```
## Multi-node model fine-tuning

openMind Library fine-tuning supports multiple nodes with multiple cards. The following shows the steps for a two-node, multi-card run.

- Make sure the two-node environment is fully and correctly configured; you can follow the [multi-node multi-card guide](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/PT_LMTMOG_0022.html) to set up the two nodes.
- Install the same driver and firmware, CANN version, Python dependencies, and yaml run file on both nodes.
- Set the following environment variables on each node. Note that `MASTER_ADDR` must be set to the master node's IP on both nodes, and `MASTER_PORT` must be identical on both nodes.

```shell
# Master node
export MASTER_ADDR=XX.XX.XX.XXX
export MASTER_PORT=XXXX
export NNODES=2
export RANK=0

# Worker node
export MASTER_ADDR=XX.XX.XX.XXX
export MASTER_PORT=XXXX
export NNODES=2
export RANK=1
```

- Run the `openmind-cli train example.yaml` command on each node.

We also provide an [openMind fine-tuning tutorial](https://modelers.cn/spaces/openmind/openmind_finetune); you can work through the notebook examples in that space to deepen your understanding of fine-tuning.
@@ -293,8 +293,8 @@ export HUB_WHITE_LIST_PATHS=/home/cache_model

## Training methods

| **Parameter** | **Description** | **Type** | **Default** | Optional |
-|-----------------------|-----------------------|--------|---------|---------|
-| stage | Training stage; currently pt and sft are supported. | str | sft | Optional |
+|-----------------------|-------------------------------------------------------|--------|---------|---------|
+| stage | Training stage; currently pt, sft, rm, and dpo are supported. | str | sft | Optional |
| finetuning_type | Training method. Options: full, lora. | str | full | Optional |
| lora_target_modules | Target modules to which the LoRA method is applied. | str | None | Optional |
| lora_alpha | Scaling factor for LoRA training. | int | None | Optional |
@@ -33,7 +33,7 @@ if is_torch_available():
else:
    from mindformers.trainer.utils import get_last_checkpoint

-if is_trl_available() and is_torch_available():
+if is_torch_available() and is_transformers_available() and is_trl_available():
    from trl.trainer.dpo_config import DPOConfig
    from trl.trainer.reward_config import RewardConfig