# PyTorch Model Pre-training

## Environment Setup

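The openMind Library command-line interface is built into openMind Library, so it is available as soon as openMind Library is installed. For detailed steps, see the [openMind Library installation guide](../../install.md).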

## Pre-training Example

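openMind Library launches pre-training by parsing a YAML file. Write a YAML file containing the pre-training settings, then run it with the `openmind-cli train` command; openMind Library parses the parameters and configures and runs the pre-training workflow automatically. The following `demo.yaml` is a runnable example.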

```yaml
# model
model_id: Qwen2.5-7B

# method
stage: pt
do_train: true
finetuning_type: full

# dataset
dataset: text_zh_data
cutoff_len: 1024

# output
output_dir: saves/qwen2.5_7b_full
logging_steps: 1
save_steps: 20000
overwrite_output_dir: true

# train
per_device_train_batch_size: 2
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
max_steps: 5000
seed: 1234
```

Run it with the following command:

```shell
openmind-cli train demo.yaml
```

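The configuration in the YAML file covers fine-tuning algorithm parameters (the `# method` section), model parameters, dataset parameters, and training parameters. For details on each parameter, see [Training Parameters](./train_params.md).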
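
To get a feel for what the training values in `demo.yaml` imply, the sketch below derives a few quantities from them. It assumes these parameters follow the common Hugging Face Trainer conventions (one optimizer step consumes `per_device_train_batch_size × gradient_accumulation_steps × number of devices` sequences, and warmup lasts `warmup_ratio × max_steps` steps); this page does not confirm that openMind Library behaves the same way. The helper name `derived_quantities` and the device count of 8 are hypothetical and used only for illustration.

```python
# Values copied from the demo.yaml example on this page.
config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "cutoff_len": 1024,
    "warmup_ratio": 0.1,
    "max_steps": 5000,
}


def derived_quantities(cfg, num_devices):
    """Hypothetical helper: derive per-step workload from the config.

    Assumes Hugging Face Trainer-style semantics, which may differ
    from what openMind Library actually does.
    """
    # Sequences consumed by one optimizer step across all devices.
    sequences_per_step = (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )
    # Upper bound on tokens per step: every sequence is at most cutoff_len long.
    max_tokens_per_step = sequences_per_step * cfg["cutoff_len"]
    # Number of steps spent warming up the learning rate.
    warmup_steps = round(cfg["warmup_ratio"] * cfg["max_steps"])
    return {
        "sequences_per_step": sequences_per_step,
        "max_tokens_per_step": max_tokens_per_step,
        "warmup_steps": warmup_steps,
        "total_sequences": sequences_per_step * cfg["max_steps"],
    }


if __name__ == "__main__":
    # num_devices=8 is an assumed device count, not something demo.yaml sets.
    for name, value in derived_quantities(config, num_devices=8).items():
        print(f"{name}: {value}")
```

Under these assumptions, changing the device count or `gradient_accumulation_steps` changes the effective batch size, so `learning_rate` and `max_steps` may need to be revisited together with them.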