@@ -29,7 +29,7 @@ dataset: alpaca_zh_51k

### Data Processing

-openMind currently supports three data formats: Alpaca, ShareGPT, and Text; custom datasets need to be converted into one of these three formats. The training stages supported by each format are as follows:
+openMind currently supports four data formats: Alpaca, ShareGPT, Text, and Pairwise; custom datasets need to be converted into one of these four formats. The training stages supported by each format are as follows:

<table>
<thead>
<tr>
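The Pairwise format added above is the one that preference-based stages such as DPO consume: each record pairs a prompt with a preferred and a dispreferred response. As a purely illustrative sketch, one such record might look like the following; the field names follow common RLHF conventions and are an assumption, not the authoritative openMind schema (which is defined by the format table this hunk begins).

```python
# Illustrative pairwise-preference record; the field names ("chosen"/"rejected")
# are assumptions based on common conventions, not taken from the openMind docs.
pairwise_record = {
    "instruction": "Summarize the paragraph below in one sentence.",
    "input": "openMind Library supports LoRA fine-tuning as well as reward and DPO training.",
    "chosen": "openMind Library provides LoRA fine-tuning plus reward and DPO training.",
    "rejected": "It is a library.",
}
```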
@@ -32,7 +32,7 @@ dataset: rlhf-reward-datasets
cutoff_len: 1024

# output
-output_dir: saves/qwen2_7b_reward
+output_dir: saves/qwen2_7b_dpo
logging_steps: 1
save_steps: 20000
overwrite_output_dir: true
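The YAML above is only an excerpt of the DPO training config. If you would rather drive the same settings from Python, one possible pattern is to load the file and forward its keys to run_train. This is a sketch that assumes the YAML keys map one-to-one onto run_train keyword arguments (as the Python example later in this document suggests); the file name is hypothetical.

```python
import yaml  # PyYAML

from openmind import run_train

# Sketch only: load a DPO config like the excerpt above and forward it to
# run_train. Assumes YAML keys correspond 1:1 to run_train keyword arguments;
# "train_dpo_qwen2_7b.yaml" is a hypothetical file name.
with open("train_dpo_qwen2_7b.yaml", "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

run_train(**cfg)
```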
@@ -63,7 +63,7 @@ from openmind import run_train

run_train(
model_name_or_path = "/mnt/h/pretrain_models/Qwen2.5-0.5B/",
-stage="rm",
+stage="dpo",
template="qwen",
do_train=True,
finetuning_type="lora",
@@ -71,7 +71,7 @@ run_train(
lora_rank=8,
lora_alpha=16,
dataset="rlhf-reward-datasets",
-output_dir="saves/qwen2.5_0.5b_lora_rm",
+output_dir="saves/qwen2.5_0.5b_lora_dpo",
logging_steps=1,
save_steps=20000,
overwrite_output_dir=True,
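Read together, the two hunks above switch the example from reward ("rm") training to DPO. For convenience, here is the resulting call assembled in one place, including only the arguments visible in the hunks; anything elided between them is omitted, and the local model path is the one used in the diff.

```python
from openmind import run_train

# DPO variant assembled from the two hunks above; arguments the diff does not
# show are omitted, so consult the full document for the complete call.
run_train(
    model_name_or_path="/mnt/h/pretrain_models/Qwen2.5-0.5B/",
    stage="dpo",
    template="qwen",
    do_train=True,
    finetuning_type="lora",
    lora_rank=8,
    lora_alpha=16,
    dataset="rlhf-reward-datasets",
    output_dir="saves/qwen2.5_0.5b_lora_dpo",
    logging_steps=1,
    save_steps=20000,
    overwrite_output_dir=True,
)
```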
@@ -8,7 +8,7 @@ openMind Library already supports reward training; users can follow the steps below to start r

The openMind Library command-line interface is built into openMind Library and is available as soon as openMind Library is installed; for detailed steps, see the [openMind Library installation guide](../../../../install.md).

-*`Note: DPO training with openMind depends on trl>=0.16.1 and datasets >= 2.18.0, <= 2.21.0. Because openMind and trl declare conflicting datasets version requirements, please install the matching datasets version manually after installing trl.`*
+*`Note: reward training with openMind depends on trl>=0.16.1 and datasets >= 2.18.0, <= 2.21.0. Because openMind and trl declare conflicting datasets version requirements, please install the matching datasets version manually after installing trl.`*
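Because the note above requires reinstalling datasets by hand after trl, a quick sanity check that the environment ended up with compatible versions can help. This sketch uses the standard importlib.metadata module plus the packaging package and is not part of openMind.

```python
from importlib.metadata import version

from packaging.version import Version

# Check the constraints stated in the note above:
# trl >= 0.16.1 and 2.18.0 <= datasets <= 2.21.0.
assert Version(version("trl")) >= Version("0.16.1"), version("trl")
assert Version("2.18.0") <= Version(version("datasets")) <= Version("2.21.0"), version("datasets")
print("trl/datasets versions satisfy the documented constraints")
```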
## Model Fine-tuning Example