@@ -29,7 +29,7 @@ dataset: alpaca_zh_51k
 
 ### Data Processing
 
-openMind currently supports three data formats: Alpaca, ShareGPT, and Text. Custom datasets must be converted into one of these three formats. The training stages supported by each format are as follows:
+openMind currently supports four data formats: Alpaca, ShareGPT, Text, and Pairwise. Custom datasets must be converted into one of these four formats. The training stages supported by each format are as follows:
 <table>
 <thead>
 <tr>
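The hunk above adds Pairwise as a supported data format; such formats typically pair one prompt with a preferred and a dispreferred response. A minimal sketch of what a pairwise preference record might look like, assuming a `prompt`/`chosen`/`rejected` schema (the field names are illustrative, not confirmed by this diff; consult the openMind dataset docs for the exact keys):

```python
import json

# One pairwise preference record: a prompt plus a preferred ("chosen")
# and a dispreferred ("rejected") response. Field names are assumed.
record = {
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "France is a country in Europe.",
}

# Pairwise training data is typically a JSON list of such records.
dataset = [record]
serialized = json.dumps(dataset, ensure_ascii=False, indent=2)
print(serialized)
```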
@@ -32,7 +32,7 @@ dataset: rlhf-reward-datasets
 cutoff_len: 1024
 
 # output
-output_dir: saves/qwen2_7b_reward
+output_dir: saves/qwen2_7b_dpo
 logging_steps: 1
 save_steps: 20000
 overwrite_output_dir: true
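The hunk above edits one key of a DPO training config. As a reference for the post-change state, here is a sketch of the surrounding YAML fragment, using only the keys visible in the hunk; the section comments mirror the `# output` comment in the diff, and anything beyond these keys is an assumption:

```yaml
# dataset
dataset: rlhf-reward-datasets
cutoff_len: 1024

# output
output_dir: saves/qwen2_7b_dpo
logging_steps: 1
save_steps: 20000
overwrite_output_dir: true
```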
@@ -63,7 +63,7 @@ from openmind import run_train
 
 run_train(
     model_name_or_path = "/mnt/h/pretrain_models/Qwen2.5-0.5B/",
-    stage="rm",
+    stage="dpo",
     template="qwen",
     do_train=True,
     finetuning_type="lora",
@@ -71,7 +71,7 @@ run_train(
     lora_rank=8,
     lora_alpha=16,
     dataset="rlhf-reward-datasets",
-    output_dir="saves/qwen2.5_0.5b_lora_rm",
+    output_dir="saves/qwen2.5_0.5b_lora_dpo",
     logging_steps=1,
     save_steps=20000,
     overwrite_output_dir=True,
@@ -8,7 +8,7 @@ openMind Library already supports reward training; users can start r
 
 The openMind Library command-line interface is built into openMind Library and is available as soon as openMind Library is installed; for detailed steps, see the [openMind Library installation guide](../../../../install.md).
 
-*`Note: DPO training with openMind requires trl>=0.16.1 and datasets >= 2.18.0, <= 2.21.0. openMind and trl have conflicting datasets version requirements, so please install the matching datasets version manually after installing trl.`*
+*`Note: reward training with openMind requires trl>=0.16.1 and datasets >= 2.18.0, <= 2.21.0. openMind and trl have conflicting datasets version requirements, so please install the matching datasets version manually after installing trl.`*
 
 ## Model Fine-tuning Example
 
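The note above prescribes an install order to work around the datasets version conflict. A minimal sketch of the resulting commands, with the package specifiers taken verbatim from the note:

```shell
# Install trl first (it may pull in its own datasets version), then
# pin datasets to the range openMind expects, per the note above.
pip install "trl>=0.16.1"
pip install "datasets>=2.18.0,<=2.21.0"
```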