@ -1,124 +0,0 @@
|
||||
# PreTrainer Module APIs
|
||||
|
||||
## openmind.PreTrainer Class
|
||||
|
||||
The `PreTrainer` class provides common functions for pre-training process management.
|
||||
|
||||
**Parameters**
|
||||
|
||||
| Parameter | Type | Description | Default Value |
|
||||
| ---------------- | ------------------------------------------- |---------------|------|
|
||||
| pretrain_args    | PreTrainingArguments                        | Pre-training arguments | - |
|
||||
| accelerator | Accelerator | Accelerate instance| None |
|
||||
| model | torch.nn.Module | Torch model | None |
|
||||
| optimizer | accelerate.utils.MegatronLMOptimizerWrapper | Optimizer | None |
|
||||
| lr_scheduler | accelerate.utils.MegatronLMSchedulerWrapper | Scheduler | None |
|
||||
| train_dataloader | torch.utils.data.DataLoader | Training data loader | None |
|
||||
| eval_dataloader | torch.utils.data.DataLoader | Evaluation data loader | None |
|
||||
|
||||
### train
|
||||
|
||||
Starts pre-training.
|
||||
|
||||
**Prototype**
|
||||
|
||||
```python
|
||||
def train()
|
||||
```
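
A minimal usage sketch is shown below. The YAML path is illustrative; because the YAML provides `data_path` and `openmind_model_path`, no dataloader or model has to be passed explicitly (see the PreTrainer constraints in the pre-training tutorial). The script is expected to be started with `accelerate launch`.

```python
from openmind import PreTrainer, PreTrainingArguments

# Illustrative configuration path; see the pre-training tutorial for a full YAML example.
pretrain_args = PreTrainingArguments.from_yaml("llama2_config/llama2-megatron-json-dataset.yaml")

pretrainer = PreTrainer(pretrain_args=pretrain_args)
pretrainer.train()
```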
|
||||
|
||||
## openmind.PreTrainingArguments Class
|
||||
|
||||
The `PreTrainingArguments` class configures parameters of a training job, including hyperparameters required during training, model save path, and learning rate.
|
||||
|
||||
**Parameters**
|
||||
|
||||
| Parameter | Type| Description | Default Value for PyTorch |
|
||||
| --------------------------- | ---- |-------------------|-----------------------|
|
||||
| num_training_steps | int | Number of training steps | - |
|
||||
| micro_batch_size | int | Size of a micro batch | - |
|
||||
| dp | int | Degree of data parallelism | - |
|
||||
| gradient_accumulation_steps | int | Number of gradient accumulation steps | 1 |
|
||||
| seq_length | int | Maximum length of a sequence | None |
|
||||
| megatron_dataset_flag | bool | Whether the dataset is Megatron-formatted | None |
|
||||
| data_path | str | Dataset path | None |
|
||||
| save_dir | str | Model saving path | None |
|
||||
| save_interval | int | Model saving interval | None |
|
||||
| eval_interval | int | Model evaluation interval | None |
|
||||
| openmind_model_path | str | Model path | None |
|
||||
| dtype | str | Runtime data type | bf16 |
|
||||
| plugin_args | dict | [Accelerate plugin parameters](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMPlugin) | None |
| dataloader_config | dict | [Data loader configuration parameters](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMDummyDataLoader) | None |
|
||||
| report_to | str | Accelerate log output object| None |
|
||||
| project_name | str | Project name | "accelerate-megatron" |
|
||||
|
||||
### from_yaml
|
||||
|
||||
Loads configurations from the YAML configuration file.
|
||||
|
||||
**Prototype**
|
||||
|
||||
```python
|
||||
def from_yaml(config_path: str)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
| Parameter | Description | Supported Type|
|
||||
| ----------- |-------------| -------- |
|
||||
| config_path | Path of the YAML configuration file| str |
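
For example (sketch; replace the path with a local YAML configuration file):

```python
from openmind import PreTrainingArguments

# Replace with the path of a local YAML configuration file.
pretrain_args = PreTrainingArguments.from_yaml("llama2_config/llama2-megatron-json-dataset.yaml")
```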
|
||||
|
||||
### get_mixed_precision
|
||||
|
||||
Obtains the mixed precision type.
|
||||
|
||||
**Prototype**
|
||||
|
||||
```python
|
||||
def get_mixed_precision()
|
||||
```
|
||||
|
||||
### get_torch_dtype
|
||||
|
||||
Obtains the runtime data type.
|
||||
|
||||
**Prototype**
|
||||
|
||||
```python
|
||||
def get_torch_dtype()
|
||||
```
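
A sketch of how the two getters might be used together; the return values shown in the comments are assumptions based on a configured `dtype` of `bf16`:

```python
from openmind import PreTrainingArguments

pretrain_args = PreTrainingArguments.from_yaml("llama2_config/llama2-megatron.yaml")  # illustrative path

mixed_precision = pretrain_args.get_mixed_precision()  # assumed to return a string such as "bf16"
torch_dtype = pretrain_args.get_torch_dtype()          # assumed to return a torch dtype such as torch.bfloat16
```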
|
||||
|
||||
### get_distributed_train_args
|
||||
|
||||
Obtains distributed pre-training parameters.
|
||||
|
||||
**Prototype**
|
||||
|
||||
```python
|
||||
def get_distributed_train_args()
|
||||
```
|
||||
|
||||
### update_distributed_train_args
|
||||
|
||||
Updates distributed pre-training parameters.
|
||||
|
||||
**Prototype**
|
||||
|
||||
```python
|
||||
def update_distributed_train_args(extra_args: dict)
|
||||
```
|
||||
|
||||
**Parameters**
|
||||
|
||||
| Parameter | Description | Supported Type|
|
||||
| ---------- |-------------| -------- |
|
||||
| extra_args | Additional parameter for distributed pre-training| dict |
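
For example, following the customization example in the pre-training tutorial, the extra arguments can carry custom Megatron hook functions (the import assumes a local copy of Megatron-LM's `pretrain_gpt.py`):

```python
from openmind import PreTrainingArguments
from pretrain_gpt import get_batch, loss_func  # from Megatron-LM's pretrain_gpt.py

pretrain_args = PreTrainingArguments.from_yaml("llama2_config/llama2-megatron-json-dataset.yaml")  # illustrative path

pretrain_args.update_distributed_train_args(
    extra_args={
        "custom_get_batch_function": get_batch,
        "custom_loss_function": loss_func,
    }
)
```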
|
||||
|
||||
### get_dataloader_config
|
||||
|
||||
Obtains the configuration parameters of the data loader.
|
||||
|
||||
**Prototype**
|
||||
|
||||
```python
|
||||
def get_dataloader_config()
|
||||
```
|
@ -1,450 +0,0 @@
|
||||
# Model Pre-training
|
||||
|
||||
## Basic Concepts
|
||||
|
||||
**Pre-training** is a training strategy for deep learning models, usually performed on a large-scale dataset. The goal of pre-training is to train the model on a related, large task so that it learns general features and representations. However, as model parameters and the required amount of training data grow rapidly, a single machine can no longer meet the resource requirements of training, which is why distributed training is introduced.
|
||||
|
||||
**Distributed training** splits a deep learning training task into multiple subtasks that are executed in parallel on multiple computing devices. It greatly increases the training speed of large models and substantially reduces the overall training time.
|
||||
|
||||
In this document, PreTrainer builds on Accelerate to provide distributed training across multiple frameworks (Megatron, DeepSpeed, and FSDP), together with common functions for pre-training process management.
|
||||
|
||||
## Environment Setup
|
||||
|
||||
```shell
|
||||
torch: 2.1.0
|
||||
transformers: 4.45.2
|
||||
accelerate: 0.28.0
|
||||
deepspeed: 0.15.2
|
||||
megatron_core: 0.4.0rc0
|
||||
```
|
||||
|
||||
### Installing the Megatron-LM Distributed Framework
|
||||
|
||||
To use the Megatron-LM distributed framework, perform the following steps:
|
||||
|
||||
1. Install Megatron. For details, see the [Megatron installation method of MindSpeed](https://gitee.com/ascend/MindSpeed#3-obtain-megatron-lm-and-specify-commit-id).
|
||||
|
||||
```shell
|
||||
git clone https://github.com/NVIDIA/Megatron-LM.git
|
||||
cd Megatron-LM
|
||||
git checkout bcce6f54e075e3c3374ea67adefe54f3f2da2b07
|
||||
pip install --no-use-pep517 -e . # "--no-use-pep517 -e" can install all Megatron files.
|
||||
```
|
||||
|
||||
2. Install MindSpeed.
|
||||
|
||||
```shell
|
||||
git clone https://gitee.com/ascend/MindSpeed.git
|
||||
cd MindSpeed
|
||||
git checkout origin/1.0.RC1
|
||||
pip install -r requirements.txt
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
3. Use pip to install the openmind_accelerate plugin of the Modelers community.
|
||||
|
||||
```shell
|
||||
#AArch64 platform
|
||||
pip install openmind-accelerate
|
||||
|
||||
#x86 platform
|
||||
pip install openmind-accelerate --extra-index-url https://download.pytorch.org/whl/cpu
|
||||
```
|
||||
|
||||
4. Install Accelerate and DeepSpeed.
|
||||
|
||||
```shell
|
||||
pip install deepspeed==0.15.2
|
||||
pip install accelerate==0.28.0
|
||||
```
|
||||
|
||||
### openMind Library Environment Setup
|
||||
|
||||
```shell
|
||||
#Installation in the AArch64 environment
|
||||
pip install openmind[pt]
|
||||
|
||||
#Installation in the x86 environment
|
||||
pip install openmind[pt] --extra-index-url https://download.pytorch.org/whl/cpu
|
||||
```
|
||||
|
||||
For details about how to install the openMind Library dependency environment, see [openMind Library Installation Guide](../install.md).
|
||||
After the installation is complete, run `pip list` to check the version dependencies. If the accelerate or transformers version was changed during the installation, reinstall the versions specified above.
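
A minimal sketch of such a check, using the versions listed above:

```python
# Sketch: verify that the pinned versions are still installed.
from importlib.metadata import version

expected = {"torch": "2.1.0", "transformers": "4.45.2", "accelerate": "0.28.0", "deepspeed": "0.15.2"}
for package, wanted in expected.items():
    installed = version(package)
    print(f"{package}: installed {installed}, expected {wanted}")
```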
|
||||
|
||||
## Quick Start
|
||||
|
||||
[Sample configuration files and startup scripts](https://modelers.cn/models/AI-Research/accelerate_examples/tree/main/examples) are provided for easy access.
|
||||
|
||||
### PreTrainer Use Procedure
|
||||
|
||||
#### Preparing Dataset
|
||||
|
||||
Prepare your own pre-training dataset, for example, [alpaca_en](https://modelers.cn/datasets/HaM/alpaca_en/tree/main) dataset.
|
||||
If you need to use the Megatron-LM distributed framework, see [Megatron Data Processing](https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#data-preprocessing).
|
||||
|
||||
#### Preparing a Model
|
||||
|
||||
Prepare a model file, for example, [Llama 2](https://modelers.cn/models/AI_Connect/llama2_7b/tree/main).
|
||||
If you want to use the Megatron-LM distributed framework, you only need to prepare the **config.json** and **tokenizer** files.
|
||||
|
||||
#### Preparing Pre-training Parameters
|
||||
|
||||
The pre-training parameters can be automatically generated by loading the [llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml) file. For a sample configuration file based on a JSON-format dataset, see [the example below](#llama2_megatron).
|
||||
|
||||
#### Startup
|
||||
|
||||
- For details about the Accelerate configuration file, see [accelerate_config/accelerate_megatron_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_megatron_config.yaml).
|
||||
|
||||
```yaml
|
||||
compute_environment: LOCAL_MACHINE
|
||||
debug: false
|
||||
distributed_type: MEGATRON_LM
|
||||
downcast_bf16: 'no'
|
||||
machine_rank: 0
|
||||
main_training_function: main
|
||||
num_machines: 1
|
||||
num_processes: 8
|
||||
rdzv_backend: static
|
||||
same_network: true
|
||||
tpu_env: [ ]
|
||||
tpu_use_cluster: false
|
||||
tpu_use_sudo: false
|
||||
use_cpu: false
|
||||
|
||||
```
|
||||
|
||||
- For details about the model configuration file, see [llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml).
|
||||
|
||||
<a id="llama2_megatron"></a>
|
||||
|
||||
```yaml
|
||||
num_training_steps: 1000
|
||||
micro_batch_size: &micro_batch_size 4
|
||||
dp: 1
|
||||
gradient_accumulation_steps: &gradient_accumulation_steps 8
|
||||
### The value of **seq_length** must be less than or equal to the value of **max_position_embeddings** in the model weight configuration file **config.json**.
|
||||
seq_length: &seq_length 4096
|
||||
megatron_dataset_flag: False
|
||||
### data_path: Enter the path of the local fine-tuning dataset.
|
||||
data_path: &data_path '/path/to/alpaca_en/alpaca_data_en_52k.json'
|
||||
### Path for saving the fine-tuning model weight
|
||||
save_dir: './saves'
|
||||
save_interval: 10000
|
||||
eval_interval: 10000
|
||||
### openmind_model_path: Enter the path of the local model weight folder.
|
||||
openmind_model_path: '/path/to/llama2-7b-hf'
|
||||
dtype: 'bf16'
|
||||
|
||||
plugin_args:
|
||||
tp_degree: 8
|
||||
pp_degree: 1
|
||||
num_micro_batches: *gradient_accumulation_steps
|
||||
gradient_clipping: 1.0
|
||||
use_distributed_optimizer: False
|
||||
sequence_parallelism: False
|
||||
other_megatron_args:
|
||||
### tokenizer_model: path of the tokenizer.model file in the local model weight file.
|
||||
tokenizer_model: &tokenizer_model '/path/to/llama2-7b-hf/tokenizer.model'
|
||||
tokenizer_type: &tokenizer_type 'Llama2Tokenizer'
|
||||
finetune: False
|
||||
recompute_granularity: "full"
|
||||
recompute_method: "block"
|
||||
recompute_num_layers: 32
|
||||
optimizer: "adam"
|
||||
lr: 1e-5
|
||||
min_lr: 1e-6
|
||||
adam_beta2: 0.95
|
||||
add_bias_linear: False
|
||||
async_tensor_model_parallel_allreduce: False
|
||||
attention_dropout: 0.0
|
||||
attention_softmax_in_fp32: False
|
||||
bias_gelu_fusion: False
|
||||
ffn_hidden_size: 11008
|
||||
hidden_dropout: 0.0
|
||||
init_method_std: 0.01
|
||||
initial_loss_scale: 65536.0
|
||||
lr_decay_style: "cosine"
|
||||
lr_warmup_fraction: 0.01
|
||||
masked_softmax_fusion: False
|
||||
normalization: "RMSNorm"
|
||||
split: &split "100,0,0"
|
||||
swiglu: True
|
||||
untie_embeddings_and_output_weights: True
|
||||
use_flash_attn: False
|
||||
weight_decay: 0.1
|
||||
no_load_optim: True
|
||||
no_load_rng: True
|
||||
eval_iters: &eval_iters 10
|
||||
position_embedding_type: "rope"
|
||||
|
||||
dataloader_config:
|
||||
return_tensors: 'pt'
|
||||
padding: 'max_length'
|
||||
pad_to_multiple_of: *seq_length
|
||||
max_length: *seq_length
|
||||
|
||||
```
|
||||
|
||||
- For details about the pre-training program file, see [train_with_megatron_json_dataset.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_megatron_json_dataset.py). This Python script cannot be directly run. To run it, download the following repository to obtain the utils code and copy **accelerate_examples/examples/utils** to the same directory as the script.
|
||||
|
||||
```shell
|
||||
git clone https://modelers.cn/AI-Research/accelerate_examples.git
|
||||
cp -r accelerate_examples/examples/utils ./  # Replace the destination path with the directory that contains train_with_megatron_json_dataset.py.
|
||||
```
|
||||
|
||||
```python
|
||||
import os
|
||||
|
||||
import openmind_accelerate
|
||||
from openmind import PreTrainingArguments, PreTrainer
|
||||
|
||||
from utils.config import get_pretrain_config_file
|
||||
from utils.accelerator import make_accelerator
|
||||
from utils.data import make_train_and_eval_dataloader
|
||||
from utils.tokenizer import get_tokenizer
|
||||
|
||||
pretrain_args = PreTrainingArguments.from_yaml(get_pretrain_config_file())
|
||||
|
||||
os.makedirs(pretrain_args.save_dir, exist_ok=True)
|
||||
|
||||
accelerator = make_accelerator(pretrain_args=pretrain_args)
|
||||
|
||||
tokenizer = get_tokenizer(tokenizer_path=pretrain_args.openmind_model_path, use_fast=False)
|
||||
transformer_dataloader_config = pretrain_args.get_dataloader_config()
|
||||
train_dataloader, eval_dataloader = make_train_and_eval_dataloader(
|
||||
dataloader_config=transformer_dataloader_config,
|
||||
micro_batch_size=pretrain_args.micro_batch_size,
|
||||
data_files=pretrain_args.data_path,
|
||||
max_length=pretrain_args.seq_length,
|
||||
tokenizer=tokenizer,
|
||||
accelerator=accelerator
|
||||
)
|
||||
|
||||
pretrainer = PreTrainer(pretrain_args=pretrain_args,
|
||||
train_dataloader=train_dataloader,
|
||||
eval_dataloader=eval_dataloader,
|
||||
)
|
||||
pretrainer.train()
|
||||
```
|
||||
|
||||
After setting up the environment and preparing the configuration files, run the following command to start training. Ensure that the training script and configuration files are at their actual local paths.
|
||||
|
||||
```shell
|
||||
accelerate launch --config_file accelerate_config/accelerate_megatron_config.yaml train_with_megatron_json_dataset.py --pretrain_config_file llama2_config/llama2-megatron-json-dataset.yaml
|
||||
```
|
||||
|
||||
## Advanced Use
|
||||
|
||||
### Defining Pre-training Parameters
|
||||
|
||||
Before defining a PreTrainer, you need to define a PreTrainingArguments instance that contains all hyperparameters used by PreTrainer for training and evaluation. You can initialize the pre-training parameters from a configuration file or by passing parameters directly.
|
||||
|
||||
#### Using the Configuration File
|
||||
|
||||
The pre-training parameters can be automatically generated by loading the YAML file. For more YAML examples, see [Samples Link](https://modelers.cn/models/AI-Research/accelerate_examples/tree/main/examples/llama2_config).
|
||||
|
||||
```python
|
||||
from openmind import PreTrainingArguments
|
||||
|
||||
# Replace the path with a local path.
|
||||
pretrain_args = PreTrainingArguments.from_yaml(
|
||||
"openmind-accelerate/examples/llama2_config/llama2-megatron.yaml"
|
||||
)
|
||||
```
|
||||
|
||||
#### Directly Passing Parameters
|
||||
|
||||
Pre-training parameters can also be instantiated by passing parameters directly. The following shows how to initialize the pre-trainer for training a Megatron-format dataset with the Megatron framework.

For details about the parameters, see [PreTrainingArguments Description](#pretrainingarguments-description).
|
||||
|
||||
```python
|
||||
from openmind import PreTrainingArguments
|
||||
|
||||
# Replace the path with a local path.
|
||||
pretrain_args = PreTrainingArguments(
|
||||
megatron_dataset_flag=True,
|
||||
data_path="HaM/alpaca_en",
|
||||
num_training_steps=1000,
|
||||
micro_batch_size=4,
|
||||
dp=1,
|
||||
gradient_accumulation_steps=8,
|
||||
seq_length=2048,
|
||||
)
|
||||
```
|
||||
|
||||
### Pre-training a Model Using the Megatron Framework
|
||||
|
||||
After configuring the pre-training parameters, you can start the Megatron model pre-training.
|
||||
|
||||
- For details about the configuration file for Accelerate and Megatron interconnection, see [accelerate_config/accelerate_megatron_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_megatron_config.yaml).
|
||||
- For details about how to use the Megatron framework to train the JSON dataset, see [train_with_megatron_json_dataset.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_megatron_json_dataset.py).
|
||||
- For details about the configuration file of JSON pre-training dataset, see [llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml).
|
||||
|
||||
You only need to pass the prepared `train_dataloader` (`eval_dataloader` is optional) to PreTrainer to pre-train the model with your custom dataloader.
|
||||
|
||||
```shell
|
||||
accelerate launch --config_file accelerate_config/accelerate_megatron_config.yaml train_with_megatron_json_dataset.py --pretrain_config_file llama2_config/llama2-megatron-json-dataset.yaml
|
||||
```
|
||||
|
||||
#### (Optional) Customizing the Processing Flow of the Megatron Framework
|
||||
|
||||
##### Customizing Functions
|
||||
|
||||
When using Megatron for pre-training, you can customize any of the `datasets_provider`, `model_provider`, `get_batch`, and `loss_function` functions and assign them to the following attributes. For reference implementations, see the official sample [pretrain_gpt.py](https://github.com/NVIDIA/Megatron-LM/blob/main/pretrain_gpt.py).
|
||||
|
||||
- `custom_megatron_datasets_provider_function`: provides the training and validation datasets of Megatron.
|
||||
- `custom_get_batch_function`: generates batch data.
|
||||
- `custom_model_provider_function`: builds models.
|
||||
- `custom_loss_function`: returns the loss function.
|
||||
|
||||
```python
|
||||
import openmind_accelerate
|
||||
from openmind import PreTrainingArguments
|
||||
from pretrain_gpt import (
|
||||
train_valid_test_datasets_provider,
|
||||
get_batch as megatron_gpt_get_batch,
|
||||
model_provider as megatron_gpt_model_provider,
|
||||
loss_func as megatron_gpt_loss_func,
|
||||
)
|
||||
|
||||
# Replace the path with a local path.
|
||||
pretrain_args = PreTrainingArguments.from_yaml(
|
||||
"openmind-accelerate/examples/llama2_config/llama2-megatron-json-dataset.yaml"
|
||||
)
|
||||
train_valid_test_datasets_provider.is_distributed = True
|
||||
pretrain_args.update_distributed_train_args(
|
||||
extra_args={
|
||||
"custom_megatron_datasets_provider_function": train_valid_test_datasets_provider,
|
||||
"custom_get_batch_function": megatron_gpt_get_batch,
|
||||
"custom_model_provider_function": megatron_gpt_model_provider,
|
||||
"custom_loss_function": megatron_gpt_loss_func,
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
##### Customizing the Model Configuration Parsing Function

You can customize the parsing function for a model configuration file based on the format that Accelerate uses to parse model configurations. The following is the parsing function for the Llama model configuration that is built into PreTrainer; refer to it as needed.
|
||||
|
||||
```python
|
||||
import openmind_accelerate
|
||||
from accelerate.utils import add_model_config_to_megatron_parser
|
||||
|
||||
|
||||
@add_model_config_to_megatron_parser("llama")
|
||||
def parse_llama_config(megatron_lm_plugin, model, batch_data):
|
||||
model_type_name = "gpt"
|
||||
num_layers = model.config.num_hidden_layers
|
||||
pretraining_flag = True
|
||||
hidden_size = model.config.hidden_size
|
||||
num_attention_heads = model.config.num_attention_heads
|
||||
orig_vocab_size = model.config.vocab_size
|
||||
|
||||
max_position_embeddings = getattr(model.config, "max_position_embeddings")
|
||||
seq_length = getattr(model.config, "max_sequence_length", None)
|
||||
if megatron_lm_plugin.seq_length is None:
|
||||
if seq_length is not None:
|
||||
megatron_lm_plugin.seq_length = seq_length
|
||||
elif megatron_lm_plugin.decoder_seq_length is not None:
|
||||
megatron_lm_plugin.seq_length = megatron_lm_plugin.decoder_seq_length
|
||||
elif batch_data is not None:
|
||||
megatron_lm_plugin.seq_length = batch_data["input_ids"].shape[1]
|
||||
else:
|
||||
megatron_lm_plugin.seq_length = max_position_embeddings
|
||||
|
||||
megatron_lm_plugin.megatron_lm_default_args["return_logits"] = megatron_lm_plugin.return_logits
|
||||
megatron_lm_plugin.megatron_lm_default_args["tokenizer_type"] = "Llama2Tokenizer"
|
||||
megatron_lm_plugin.megatron_lm_default_args["model_type_name"] = model_type_name
|
||||
megatron_lm_plugin.megatron_lm_default_args["num_layers"] = num_layers
|
||||
megatron_lm_plugin.megatron_lm_default_args["pretraining_flag"] = pretraining_flag
|
||||
megatron_lm_plugin.megatron_lm_default_args["hidden_size"] = hidden_size
|
||||
megatron_lm_plugin.megatron_lm_default_args["num_attention_heads"] = num_attention_heads
|
||||
megatron_lm_plugin.megatron_lm_default_args["orig_vocab_size"] = orig_vocab_size
|
||||
megatron_lm_plugin.megatron_lm_default_args["max_position_embeddings"] = max_position_embeddings
|
||||
megatron_lm_plugin.megatron_lm_default_args["seq_length"] = megatron_lm_plugin.seq_length
|
||||
megatron_lm_plugin.megatron_lm_default_args["model_return_dict"] = model.config.return_dict
|
||||
|
||||
```
|
||||
|
||||
### Using Other Frameworks to Pre-train Models
|
||||
|
||||
PreTrainer provides multi-framework distributed capabilities based on Accelerate. In addition to Megatron, it also supports the DeepSpeed and FSDP distributed frameworks. The following uses DeepSpeed as an example.
After configuring the pre-training parameters for the JSON dataset, you can start DeepSpeed model pre-training.
|
||||
|
||||
- For details about the configuration file for Accelerate and DeepSpeed interconnection, see [accelerate_config/accelerate_deepspeed_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_deepspeed_config.yaml).
|
||||
- For details about how to use the DeepSpeed framework to train the JSON dataset, see [train_with_deepspeed.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_deepspeed.py).
|
||||
- For details about the configuration file of JSON pre-training dataset, see [llama2_config/llama2-deepspeed.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-deepspeed.yaml).
|
||||
|
||||
```yaml
|
||||
num_training_steps: 1000
|
||||
micro_batch_size: 1
|
||||
dp: 8
|
||||
gradient_accumulation_steps: 8
|
||||
seq_length: 4096
|
||||
megatron_dataset_flag: False
|
||||
data_path: '/path/to/alpaca_en/alpaca_data_en_52k.json'
|
||||
save_dir: './saves'
|
||||
save_interval: 10000
|
||||
eval_interval: 10000
|
||||
openmind_model_path: '/path/to/llama2-7b-hf'
|
||||
dtype: 'bf16'
|
||||
|
||||
dataloader_config:
|
||||
return_tensors: 'pt'
|
||||
padding: 'max_length'
|
||||
pad_to_multiple_of: 4096
|
||||
max_length: 4096
|
||||
|
||||
### The value of **seq_length**, **max_length**, and **padding** must be less than or equal to the value of **max_position_embeddings** in the model weight configuration file **config.json**.
|
||||
```
|
||||
|
||||
```shell
|
||||
accelerate launch --config_file accelerate_config/accelerate_deepspeed_config.yaml train_with_deepspeed.py --pretrain_config_file llama2_config/llama2-deepspeed.yaml
|
||||
```
|
||||
|
||||
## PreTrainingArguments Description
|
||||
|
||||
| **Name** | **Description** | **Type**| **Default Value**| Mandatory/Optional |
|
||||
|-----------------------------|-----------------------|--------|---------|---------|
|
||||
| num_training_steps | Total number of steps for training a model. | int | - | Mandatory |
|
||||
| micro_batch_size | Batch size of each model instance. | int | - | Mandatory |
|
||||
| dp                          | Degree of data parallelism. | int | - | Mandatory |
|
||||
| gradient_accumulation_steps | Number of gradient steps to be accumulated before model parameters are updated. | int | 1 | Optional |
|
||||
| seq_length | Maximum length of the sequence to be processed. | int | None | Optional |
|
||||
| megatron_dataset_flag       | Whether the dataset is in Megatron format. | bool | None | Optional |
|
||||
| data_path | Training dataset path. | str | None | Optional |
|
||||
| save_dir | Output directory to which the checkpoint is to be saved. | str | None | Optional |
|
||||
| save_interval | Iteration interval for saving checkpoints. | int | None | Optional |
|
||||
| eval_interval | Iteration interval for evaluation. | int | None | Optional |
|
||||
| openmind_model_path | Path of the openMind model to be trained. | str | None | Optional |
|
||||
| dtype | Dtype mode of the running model. | str | bf16 | Optional |
|
||||
| plugin_args | [Accelerate plugin parameters](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMPlugin) | dict | None | Optional |
|
||||
| dataloader_config | [Dataloader configuration parameters](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMDummyDataLoader) | dict | None | Optional |
|
||||
| report_to | Location to which Accelerate logs are reported. | str | None | Optional |
|
||||
| project_name                | Project name. | str | "accelerate-megatron" | Optional |
|
||||
|
||||
## PreTrainer Description
|
||||
|
||||
The PreTrainer API creates either a Megatron pre-trainer or another pre-trainer, depending on whether Accelerate uses the Megatron-LM distributed acceleration library (specifically, whether the environment variable `ACCELERATE_USE_MEGATRON_LM` equals `"true"`).
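
A sketch of the check that drives this selection; `accelerate launch` sets the variable when `distributed_type` is `MEGATRON_LM` in the Accelerate configuration (the exact internal logic of PreTrainer may differ):

```python
import os

# The Megatron pre-trainer is selected when Accelerate exports this variable as "true".
use_megatron_pretrainer = os.environ.get("ACCELERATE_USE_MEGATRON_LM", "false") == "true"
print("Megatron pre-trainer selected:", use_megatron_pretrainer)
```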
|
||||
|
||||
### Megatron Pre-trainer
|
||||
|
||||
| No.| Constraint Description |
|
||||
| ---- |-----------------------------------------------------------------------|
|
||||
| 1 | The Megatron dependencies need to be installed. |
|
||||
| 2 | The openmind_accelerate dependencies need to be installed. |
|
||||
| 3    | Megatron manages gradient accumulation itself. Therefore, the `gradient_accumulation_steps` parameter of Accelerate must be set to **1**.|
|
||||
| 4 | `train_dataloader` needs to be provided during initialization or `data_path` needs to be provided in **PreTrainingArguments**. |
|
||||
| 5 | `model` needs to be provided during initialization or `openmind_model_path` needs to be provided in **PreTrainingArguments**. |
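
A minimal initialization sketch that satisfies constraints 4 and 5 by supplying `data_path` and `openmind_model_path` through `PreTrainingArguments` (paths are placeholders; the script must still be started with `accelerate launch` using a Megatron-LM Accelerate configuration):

```python
from openmind import PreTrainer, PreTrainingArguments

pretrain_args = PreTrainingArguments(
    num_training_steps=1000,
    micro_batch_size=4,
    dp=1,
    megatron_dataset_flag=True,
    data_path="/path/to/megatron_dataset_prefix",  # constraint 4
    openmind_model_path="/path/to/llama2-7b-hf",   # constraint 5
)

pretrainer = PreTrainer(pretrain_args=pretrain_args)
pretrainer.train()
```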
|
||||
|
||||
### Other Pre-trainers
|
||||
|
||||
| No. | Constraint |
|
||||
| ---- |----------------------------------------------------------------|
|
||||
| 1 | `train_dataloader` needs to be provided during initialization. |
|
||||
| 2 | `optimizer` needs to be provided during initialization. |
|
||||
| 3 | `lr_scheduler` needs to be provided during initialization. |
|
||||
| 4 | `model` needs to be provided during initialization or `openmind_model_path` needs to be provided in **PreTrainingArguments**.|
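
A construction sketch that satisfies the four constraints with illustrative stand-in objects; it is not a working training setup, and in practice you would use your real model, optimizer, scheduler, and dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

from openmind import PreTrainer, PreTrainingArguments

model = torch.nn.Linear(16, 16)                                                  # constraint 4 (or openmind_model_path)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)                       # constraint 2
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100)         # constraint 3
train_dataloader = DataLoader(TensorDataset(torch.zeros(8, 16)), batch_size=2)   # constraint 1

pretrain_args = PreTrainingArguments(num_training_steps=10, micro_batch_size=2, dp=1)

pretrainer = PreTrainer(
    pretrain_args=pretrain_args,
    model=model,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    train_dataloader=train_dataloader,
)
```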
|
||||
|
||||
*Thanks to the community contributors for the Llama 2 model and the alpaca_en dataset.*
|
@ -4,8 +4,6 @@ openMind Library is an open-source deep learning development kit. It supports mo
|
||||
|
||||
## openMind Library Features
|
||||
|
||||
+ To cope with the challenges of distributed training of foundation models, openMind Library provides pre-training APIs and supports acceleration libraries such as MindSpeed and Accelerate to help you quickly and smoothly train foundation models. For details, see [Model Pre-training](basic_tutorial/pretrainer.md).
|
||||
|
||||
+ openMind Library encapsulates APIs such as the Transformers and MindFormers AutoClass, Pipeline, and Trainer, enhances their functions, and can automatically download and load models from the Modelers community. In addition, Ascend NPU affinity features are added, which effectively improve the performance of model training and inference on Ascend NPUs. For details, see [Model Fine-Tuning](basic_tutorial/finetune/overview.md) and [Model Inference](basic_tutorial/pipeline.md).
|
||||
|
||||
+ openMind Library provides simple and easy-to-use command-line interfaces (CLIs) for quickly uploading, downloading, and deploying models and for running inference and chat with low code. For details, see the [command line interface](basic_tutorial/cli.md).
|
||||
|
@ -50,13 +50,6 @@
|
||||
"en": "Data Load"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "pretrainer",
|
||||
"label": {
|
||||
"zh": "模型预训练",
|
||||
"en": "Model Pre-training"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "train",
|
||||
"label": {
|
||||
@ -343,13 +336,6 @@
|
||||
"en": "Pipelines"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "pretrainer_api",
|
||||
"label": {
|
||||
"zh": "PreTrainer",
|
||||
"en": "PreTrainer"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "trainer_api",
|
||||
"label": {
|
||||
|
@ -79,6 +79,75 @@ You are a helpful assistant.<|im_end|>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<!-- Qwen3 -->
|
||||
<tr>
|
||||
<td rowspan="11">Qwen3</td>
|
||||
<td>Qwen3-32B-Chat</td>
|
||||
<td>Models_Ecosystem/Qwen3-32B</td>
|
||||
<td>Qwen/Qwen3-32B</td>
|
||||
<td rowspan="11">qwen</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-14B-Chat</td>
|
||||
<td>Models_Ecosystem/Qwen3-14B</td>
|
||||
<td>Qwen/Qwen3-14B</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-14B</td>
|
||||
<td>Models_Ecosystem/Qwen3-14B-Base</td>
|
||||
<td>Qwen/Qwen3-14B-Base</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-8B-Chat</td>
|
||||
<td>Models_Ecosystem/Qwen3-8B</td>
|
||||
<td>Qwen/Qwen3-8B</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-8B</td>
|
||||
<td>Models_Ecosystem/Qwen3-8B-Base</td>
|
||||
<td>Qwen/Qwen3-8B-Base</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-4B-Chat</td>
|
||||
<td>Models_Ecosystem/Qwen3-4B</td>
|
||||
<td>Qwen/Qwen3-4B</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-4B</td>
|
||||
<td>Models_Ecosystem/Qwen3-4B-Base</td>
|
||||
<td>Qwen/Qwen3-4B-Base</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-1.7B-Chat</td>
|
||||
<td>Models_Ecosystem/Qwen3-1.7B</td>
|
||||
<td>Qwen/Qwen3-1.7B</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-1.7B</td>
|
||||
<td>Models_Ecosystem/Qwen3-1.7B-Base</td>
|
||||
<td>Qwen/Qwen3-1.7B-Base</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-0.6B-Chat</td>
|
||||
<td>Models_Ecosystem/Qwen3-0.6B</td>
|
||||
<td>Qwen/Qwen3-0.6B</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Qwen3-0.6B</td>
|
||||
<td>Models_Ecosystem/Qwen3-0.6B-Base</td>
|
||||
<td>Qwen/Qwen3-0.6B-Base</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<!-- Qwen2.5 -->
|
||||
<tr>
|
||||
<td rowspan="3">Qwen2.5</td>
|
||||
@ -100,6 +169,15 @@ You are a helpful assistant.<|im_end|>
|
||||
<td>Qwen/Qwen2.5-32B</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<!-- Qwen2.5-VL -->
|
||||
<tr>
|
||||
<td>Qwen2.5-VL</td>
|
||||
<td>Qwen2.5-VL-7B-Instruct</td>
|
||||
<td>PyTorch-NPU/Qwen2.5-VL-7B-Instruct</td>
|
||||
<td>Qwen/Qwen2.5-VL-7B-Instruct</td>
|
||||
<td>qwen2_vl</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<!-- Qwen2 -->
|
||||
<tr>
|
||||
<td rowspan="3">Qwen2</td>
|
||||
@ -256,15 +334,6 @@ You are a helpful assistant.<|im_end|>
|
||||
<td>deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<!-- Qwen2.5-VL -->
|
||||
<tr>
|
||||
<td>Qwen2.5-VL</td>
|
||||
<td>Qwen2.5-VL-7B-Instruct</td>
|
||||
<td>PyTorch-NPU/Qwen2.5-VL-7B-Instruct</td>
|
||||
<td>Qwen/Qwen2.5-VL-7B-Instruct</td>
|
||||
<td>qwen2_vl</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
|
||||
</tbody>
|
||||
</table>
|
||||
|
@ -4,8 +4,6 @@ openMind Library是一个深度学习开发套件,通过简单易用的API支
|
||||
|
||||
## openMind Library Features

+ To cope with the challenges of distributed training of foundation models, openMind Library provides pre-training APIs and supports acceleration libraries such as MindSpeed and Accelerate, helping developers train foundation models quickly and smoothly. For details, see [Model Pre-training](basic_tutorial/pretrainer.md).

+ Based on the [transformers library](https://github.com/huggingface/transformers), openMind Library integrates the capabilities of mainstream third-party tools under the PyTorch framework and provides a one-click, encapsulated fine-tuning CLI solution covering the whole workflow from data processing and weight loading to parameter-efficient training, quantization adaptation, training, and tracking. For more details, see [Model Training](basic_tutorial/train/overview.md).

+ openMind Library encapsulates and enhances interfaces such as the AutoClass, Pipeline, and Trainer of Transformers and MindFormers and provides the corresponding SDK. It can also automatically download and load models from the Modelers community, and adds Ascend NPU affinity features that effectively improve model training and inference performance on Ascend NPUs. For details, see [Model Training](basic_tutorial/train/overview.md) and [Model Inference](basic_tutorial/pipeline.md).
|
||||
|
@ -51,8 +51,6 @@ if TYPE_CHECKING:
|
||||
from .archived.trainers import (
|
||||
Trainer,
|
||||
TrainingArguments,
|
||||
PreTrainer,
|
||||
PreTrainingArguments,
|
||||
)
|
||||
from .archived.pipelines import pipeline
|
||||
from .omdatasets import OmDataset
|
||||
|
@ -18,16 +18,12 @@ from openmind.utils import _LazyModule
|
||||
if TYPE_CHECKING:
|
||||
from .trainer import Trainer
|
||||
from .training_args import TrainingArguments
|
||||
from .pretrainer import PreTrainer
|
||||
from .pretraining_args import PreTrainingArguments
|
||||
else:
|
||||
import sys
|
||||
|
||||
_import_structure = {
|
||||
"trainer": ["Trainer"],
|
||||
"training_args": ["TrainingArguments"],
|
||||
"pretrainer": ["PreTrainer"],
|
||||
"pretraining_args": ["PreTrainingArguments"],
|
||||
}
|
||||
|
||||
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
|
||||
|
@ -1,452 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.

import dataclasses
import importlib
import importlib.util
import os
import time
import warnings

from accelerate import Accelerator, init_empty_weights


try:
    import torch
except ImportError as e:
    raise ImportError("Please install torch package before using this PreTrainer.") from e
import torch.utils.data
from transformers import AutoConfig, AutoModelForCausalLM

from .pretrainer_utils import print_in_last_rank, print_in_main_process
from .pretraining_args import PreTrainingArguments


warnings.warn(
    "The class 'PreTrainer' is deprecated and will be removed in version 1.1.0. ",
    FutureWarning,
)


class _PreTrainerCommon:
    def __init__(
        self,
        pretrain_args: PreTrainingArguments,
        accelerator: Accelerator = None,
        model: torch.nn.Module = None,
        optimizer=None,
        lr_scheduler=None,
        train_dataloader: torch.utils.data.DataLoader = None,
        eval_dataloader: torch.utils.data.DataLoader = None,
        *args,
        **kwargs,
    ):
        self.model = model
        self.pretrain_args = pretrain_args
        self.train_dataloader = train_dataloader
        self.optimizer = optimizer
        self.lr_scheduler = lr_scheduler
        self.accelerator = accelerator
        self.eval_dataloader = eval_dataloader
        self.completed_steps = 0

        self._post_init()

    def train(self):
        self._pre_training()

        batch_loss_sum = 0
        start_time = time.time()

        while self.completed_steps < self.pretrain_args.num_training_steps:
            for batch in self.train_dataloader:
                outputs = self._train_step(batch)
                loss_ = outputs.loss.detach().float()
                batch_loss_sum += loss_.item()
                if self.accelerator.sync_gradients:
                    self.completed_steps += 1
                else:
                    continue  # for accelerator's gradient_accumulation

                lr = self._get_lr()
                batch_loss_avg = self._get_batch_loss_avg(batch_loss_sum=batch_loss_sum)
                elapsed_time = (time.time() - start_time) * 1000  # ms
                self._train_step_log(step=self.completed_steps, loss=batch_loss_avg, lr=lr, elapsed_time=elapsed_time)
                batch_loss_sum = 0

                if (
                    self.pretrain_args.save_interval
                    and self.completed_steps % self.pretrain_args.save_interval == 0
                    and self.pretrain_args.save_dir
                ):
                    self._save_state(save_dir=self.pretrain_args.save_dir)

                if (
                    self.pretrain_args.eval_interval
                    and self.completed_steps % self.pretrain_args.eval_interval == 0
                    and self.eval_dataloader is not None
                ):
                    self._eval(eval_dataloader=self.eval_dataloader, completed_steps=self.completed_steps)

                start_time = time.time()

                if self.completed_steps >= self.pretrain_args.num_training_steps:
                    break

        self.accelerator.end_training()
        self.accelerator.wait_for_everyone()

        self._post_training()

    def _post_init(self):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _init_trackers(self):
        experiment_config = {}
        experiment_config.update(dataclasses.asdict(self.pretrain_args))
        self.accelerator.init_trackers(self.pretrain_args.project_name, experiment_config)

    def _get_gradient_accumulation_steps(self):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _get_batch_loss_avg(self, batch_loss_sum):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _get_lr(self):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _train_step(self, batch):
        self.model.train()
        with self.accelerator.accumulate(self.model):
            outputs = self.model(**batch)
            loss = outputs.loss
            self.accelerator.backward(loss)
            self.optimizer.step()
            self.lr_scheduler.step()
            self.optimizer.zero_grad()
        return outputs

    def _train_step_log(self, loss, lr, elapsed_time, step):
        log_str = (
            f"step: {step} | elapsed time per iteration (ms): {elapsed_time:.1f} | learning rate: {lr:.3E} | "
            f"lm loss: {loss:.6E}"
        )
        print_in_last_rank(log_str)
        # tracker
        self.accelerator.log(
            {
                "train_loss": loss,
                "learning_rate": lr,
            },
            step=step,
        )

    def _print_training_info(self):
        print_in_main_process("***** Running training *****")
        print_in_main_process(
            f" Num examples = {self.pretrain_args.num_training_steps * self.pretrain_args.batch_size}"
        )
        print_in_main_process(f" Instantaneous batch size per device = {self.pretrain_args.micro_batch_size}")
        print_in_main_process(
            f" Total train batch size (w. parallel, distributed & accumulation) = {self.pretrain_args.batch_size}"
        )
        print_in_main_process(f" Gradient Accumulation steps = {self._get_gradient_accumulation_steps()}")
        print_in_main_process(f" Total steps = {self.pretrain_args.num_training_steps}")

    def _pre_training(self):
        self._print_training_info()
        print_in_main_process(f"[before the start of training step] datetime: {time.strftime('%Y-%m-%d %H:%M:%S')}")
        self.completed_steps = 0

    def _post_training(self):
        print_in_main_process(f"[after training is done] datetime: {time.strftime('%Y-%m-%d %H:%M:%S')}")
        self._save(save_dir=self.pretrain_args.save_dir)

    def _get_eval_loss(self, loss):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _eval(self, eval_dataloader, completed_steps=None):
        if completed_steps is not None:
            self.completed_steps = completed_steps

        losses = []
        for _, batch in enumerate(eval_dataloader):
            outputs = self._eval_step(batch)
            loss = outputs.loss
            losses.append(self._get_eval_loss(loss))

        self._eval_log(losses=losses)

    def _eval_step(self, batch):
        self.model.eval()
        with torch.no_grad():
            outputs = self.model(**batch)
        return outputs

    def _handle_eval_losses(self, losses):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _eval_log(self, losses):
        losses = self._handle_eval_losses(losses)
        eval_loss = torch.mean(losses)
        print_in_last_rank(f"validation at step: {self.completed_steps} | eval_loss: {eval_loss}")
        self.accelerator.log(
            {
                "eval_loss": eval_loss,
            },
            step=self.completed_steps,
        )

    def _save_state(self, save_dir):
        self.accelerator.save_state(save_dir)

    def _save(self, save_dir):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _read_model(self):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _prepare(self):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")

    def _make_accelerator(self):
        raise NotImplementedError("_PreTrainerCommon : Not implemented!")


class _PreTrainerMegatron(_PreTrainerCommon):
    def _make_megatron_dataloader(self):
        from accelerate.utils import MegatronLMDummyDataLoader

        data_path = self.pretrain_args.data_path
        megatron_dataloader_config = {
            "data_path": data_path if isinstance(data_path, list) else [data_path],
            "seq_length": self.pretrain_args.seq_length,
            "micro_batch_size": self.pretrain_args.micro_batch_size,
            "eval_interval": self.pretrain_args.eval_interval,
        }
        if self.pretrain_args.dataloader_config:
            for key, value in self.pretrain_args.dataloader_config.items():
                if key in megatron_dataloader_config.keys():
                    print_in_main_process(
                        f"PreTrainerMegatron dataloader overriding arguments for "
                        f"{key}:{megatron_dataloader_config[key]} with {key}:{value}"
                    )
                megatron_dataloader_config[key] = value
        megatron_dataloader = MegatronLMDummyDataLoader(**megatron_dataloader_config)
        self.train_dataloader = megatron_dataloader
        self.accelerator.state.megatron_lm_plugin.megatron_dataset_flag = True

    def _get_megatron_lm_plugin(self):
        from accelerate.utils import MegatronLMPlugin

        plugin_args = {
            "train_iters": self.pretrain_args.num_training_steps,
            "seq_length": self.pretrain_args.seq_length,
            "num_micro_batches": self.pretrain_args.gradient_accumulation_steps,
            "megatron_dataset_flag": self.pretrain_args.megatron_dataset_flag,
            "eval_interval": self.pretrain_args.eval_interval,
        }
        if self.pretrain_args.plugin_args:
            for key, value in self.pretrain_args.plugin_args.items():
                if key in plugin_args.keys():
                    msg = (
                        f"WARNING: PreTrainerMegatron plugin overriding arguments for "
                        f"{key}:{plugin_args[key]} with {key}:{value}"
                    )
                    print_in_main_process(msg)
                plugin_args[key] = value

        return MegatronLMPlugin(**plugin_args)

    def _make_accelerator(self):
        accelerate_kwargs = {
            "log_with": self.pretrain_args.report_to,
            "project_dir": self.pretrain_args.save_dir,
            "mixed_precision": self.pretrain_args.get_mixed_precision(),
        }
        megatron_lm_plugin = self._get_megatron_lm_plugin()
        accelerate_kwargs["megatron_lm_plugin"] = megatron_lm_plugin
        self.accelerator = Accelerator(**accelerate_kwargs)

    def _post_init(self):
        if importlib.util.find_spec("megatron") is None or importlib.util.find_spec("megatron.data") is None:
            raise EnvironmentError("You must use '--no-use-pep517' to pip install nvidia's megatron from source.")
        if importlib.util.find_spec("openmind_accelerate") is None:
            raise EnvironmentError("You must pip install openmind_accelerate.")
        import openmind_accelerate  # noqa:F401

        if self.accelerator is None:
            self._make_accelerator()

        if self.accelerator.gradient_accumulation_steps != 1:
            raise ValueError(
                "When using Megatron, gradient accumulation is done in Megatron, "
                "so gradient_accumulation_steps in Accelerator needs to be set to 1."
            )

        if self.train_dataloader is None:
            if not self.pretrain_args.data_path:
                raise ValueError("`PreTrainer` requires either a `train_dataloader` or `args.data_path` argument")
            self._make_megatron_dataloader()

        self.accelerator.state.megatron_lm_plugin.megatron_lm_default_args["train_iters"] = (
            self.pretrain_args.num_training_steps
        )

        if self.model is None:
            if not self.pretrain_args.openmind_model_path:
                raise ValueError("`PreTrainer` requires either a `model` or `args.openmind_model_path` argument")
            self._read_model()

        self._prepare()
        self._init_trackers()

    def _pre_training(self):
        from megatron import get_args

        super()._pre_training()
        args = get_args()
        self.model.iteration = args.iteration
        self.completed_steps = args.iteration

    def _eval(self, eval_dataloader, completed_steps=None):
        from megatron import get_args

        if completed_steps is not None:
            self.completed_steps = completed_steps

        args = get_args()
        losses = []
        iteration = 0
        for _, batch in enumerate(eval_dataloader):
            outputs = self._eval_step(batch)
            loss = outputs.loss
            losses.append(self._get_eval_loss(loss))
            iteration += 1
            if iteration >= args.eval_iters:
                break
        self._eval_log(losses=losses)

    def _get_gradient_accumulation_steps(self):
        return self.accelerator.state.megatron_lm_plugin.num_micro_batches

    def _get_batch_loss_avg(self, batch_loss_sum):
        return batch_loss_sum

    def _get_lr(self):
        return self.lr_scheduler.get_lr()

    def _get_eval_loss(self, loss):
        return loss

    def _handle_eval_losses(self, losses):
        return torch.tensor(losses)

    def _save(self, save_dir):
        self.accelerator.save_state(save_dir)

    def _read_model(self):
        model_config = AutoConfig.from_pretrained(self.pretrain_args.openmind_model_path)
        with init_empty_weights():
            self.model = AutoModelForCausalLM.from_config(model_config)
        self.model.config.use_cache = False

    def _prepare(self):
        from accelerate.utils import MegatronLMOptimizerWrapper, MegatronLMSchedulerWrapper

        self.model, self.train_dataloader, self.eval_dataloader = self.accelerator.prepare(
            self.model, self.train_dataloader, self.train_dataloader
        )
        self.optimizer = MegatronLMOptimizerWrapper(self.model.optimizer)
        self.lr_scheduler = MegatronLMSchedulerWrapper(self.model.scheduler, self.model.optimizer)


class _PreTrainerOther(_PreTrainerCommon):
    def _make_accelerator(self):
        accelerate_kwargs = {
            "log_with": self.pretrain_args.report_to,
            "project_dir": self.pretrain_args.save_dir,
            "mixed_precision": self.pretrain_args.get_mixed_precision(),
        }
        self.accelerator = Accelerator(**accelerate_kwargs)

    def _post_init(self):
        if self.accelerator is None:
            self._make_accelerator()

        if self.train_dataloader is None:
            raise ValueError("When not using Megatron, `PreTrainer` requires `train_dataloader`")
        if self.optimizer is None:
            raise ValueError("When not using Megatron, `PreTrainer` requires `optimizer`")
        if self.lr_scheduler is None:
            raise ValueError("When not using Megatron, `PreTrainer` requires `lr_scheduler`")

        if self.model is None:
            if not self.pretrain_args.openmind_model_path:
                raise ValueError("`PreTrainer` requires either a `model` or `args.openmind_model_path` argument")
            self._read_model()

        self._prepare()
        self._init_trackers()

    def _get_gradient_accumulation_steps(self):
        return self.accelerator.gradient_accumulation_steps

    def _get_batch_loss_avg(self, batch_loss_sum):
        return batch_loss_sum / self._get_gradient_accumulation_steps()

    def _get_lr(self):
        return self.lr_scheduler.get_last_lr()[0]

    def _get_eval_loss(self, loss):
        return self.accelerator.gather_for_metrics(loss.repeat(self.pretrain_args.batch_size))

    def _handle_eval_losses(self, losses):
        return torch.cat(losses)

    def _save(self, save_dir):
        unwrapped_model = self.accelerator.unwrap_model(self.model)
        unwrapped_model.save_pretrained(
            save_dir, is_main_process=self.accelerator.is_main_process, save_function=self.accelerator.save
        )

    def _read_model(self):
        self.model = AutoModelForCausalLM.from_pretrained(
            self.pretrain_args.openmind_model_path,
            torch_dtype=self.pretrain_args.get_torch_dtype(),
        )
        self.model.gradient_checkpointing_enable()
        self.model.config.use_cache = False

    def _prepare(self):
        if self.eval_dataloader:
            (
                self.model,
                self.train_dataloader,
                self.eval_dataloader,
                self.optimizer,
                self.lr_scheduler,
            ) = self.accelerator.prepare(
                self.model, self.train_dataloader, self.eval_dataloader, self.optimizer, self.lr_scheduler
            )
        else:
            self.model, self.train_dataloader, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
                self.model, self.train_dataloader, self.optimizer, self.lr_scheduler
            )


class PreTrainer(_PreTrainerCommon):
    def __new__(cls, *args, **kwargs):
        if os.environ.get("ACCELERATE_USE_MEGATRON_LM", "false") == "true":
            return _PreTrainerMegatron(*args, **kwargs)
        return _PreTrainerOther(*args, **kwargs)
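For reference, `PreTrainer.__new__` above selects its backend from the `ACCELERATE_USE_MEGATRON_LM` environment variable. Below is a small, hedged sketch of switching to the Megatron-LM path; the paths and hyperparameters are placeholders, the Megatron path additionally requires `megatron` and `openmind_accelerate` to be installed (as the `_post_init` checks enforce), and such jobs are normally launched through `accelerate launch` on the training nodes.

```python
import os

# Setting this before constructing PreTrainer routes construction to
# _PreTrainerMegatron; anything else (or unset) falls back to _PreTrainerOther.
os.environ["ACCELERATE_USE_MEGATRON_LM"] = "true"

from openmind import PreTrainer, PreTrainingArguments

# Placeholder paths and hyperparameters for illustration only.
args = PreTrainingArguments(
    num_training_steps=1000,
    micro_batch_size=4,
    dp=1,
    seq_length=4096,
    megatron_dataset_flag=True,
    data_path="my_corpus_text_document",       # Megatron-preprocessed dataset prefix
    openmind_model_path="path/to/base_model",  # config is read; weights start empty
    save_dir="./megatron_ckpt",
)

# On the Megatron path the dataloader, optimizer, and scheduler are created
# internally, so only the arguments object is required here.
trainer = PreTrainer(pretrain_args=args)
trainer.train()
```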
@ -1,39 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
import logging
import os

import torch

from openmind.utils.logging import get_logger

openmind_logger = get_logger(__name__)
openmind_logger.setLevel(logging.INFO)


def print_in_main_process(msg):
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    if local_rank in [0, -1]:
        openmind_logger.info(msg)


def is_last_rank():
    return torch.distributed.get_rank() == (torch.distributed.get_world_size() - 1)


def print_in_last_rank(msg):
    if torch.distributed.is_initialized():
        if is_last_rank():
            openmind_logger.info(msg)
    else:
        openmind_logger.info(msg)
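A quick, hedged illustration of the rank-aware helpers above: under a multi-process `torchrun`/`accelerate launch` job the first call logs only on local rank 0 and the second only on the highest global rank, while a single-process run logs both. The import path mirrors the one used in the test file later in this diff.

```python
# Hedged usage sketch for the logging helpers above; the messages are placeholders.
from openmind.archived.trainers.pretrainer_utils import (
    print_in_last_rank,
    print_in_main_process,
)

print_in_main_process("configuration loaded")  # local rank 0 (or a single process)
print_in_last_rank("step 10 | lm loss: 2.3")   # last rank (or a single process)
```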
@ -1,115 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.

import dataclasses
from dataclasses import dataclass, field
import re
import warnings

import torch
import yaml

from .pretrainer_utils import print_in_main_process


warnings.warn(
    "The class 'PreTrainingArguments' is deprecated and will be removed in version 1.1.0. ",
    FutureWarning,
)


_dtype_map = {"bf16": torch.bfloat16, "fp16": torch.float16, "fp32": torch.float32}


@dataclass
class PreTrainingArguments:
    num_training_steps: int = field(metadata={"help": "Total number of steps to train the model."})
    micro_batch_size: int = field(metadata={"help": "Batch size per model instance."})
    dp: int = field(metadata={"help": "Degree of Parallelism."})
    gradient_accumulation_steps: int = field(
        default=1, metadata={"help": "The number of gradient steps to accumulate before updating the model parameters."}
    )
    seq_length: int = field(default=None, metadata={"help": "Maximum sequence length to process."})
    megatron_dataset_flag: bool = field(
        default=None, metadata={"help": "Flags for whether or not to use a Megatron type dataset."}
    )
    data_path: str = field(default=None, metadata={"help": "Path to the training dataset."})
    save_dir: str = field(default=None, metadata={"help": "Output directory to save checkpoints to."})
    save_interval: int = field(default=None, metadata={"help": "Number of iterations between checkpoint saves."})
    eval_interval: int = field(
        default=None, metadata={"help": "Interval between running evaluation on validation set."}
    )
    openmind_model_path: str = field(default=None, metadata={"help": "The path of the Openmind model to be trained."})
    dtype: str = field(default="bf16", metadata={"help": "The dtype mode that the model is running on."})
    plugin_args: dict = field(default=None, metadata={"help": "Parameters related to accelerate plugins."})
    dataloader_config: dict = field(default=None, metadata={"help": "The parameters of dataloader."})
    report_to: str = field(default=None, metadata={"help": "Whom will accelerate report the log to."})
    project_name: str = field(default="accelerate-megatron", metadata={"help": "The name of the project"})

    @staticmethod
    def from_yaml(config_path: str):
        with open(config_path, "r") as file:
            config_data = yaml.safe_load(file)
        return PreTrainingArguments(**config_data)

    def __post_init__(self):
        self.batch_size = self.micro_batch_size * self.gradient_accumulation_steps * self.dp
        if self.data_path is not None and self.megatron_dataset_flag is None:
            raise ValueError(
                "Since you filled in data_path in PreTrainArguments, you have to specify the "
                "megatron_dataset_flag parameter at the same time."
            )

        self.dtype = self.dtype.lower()
        if self.dtype not in _dtype_map:
            raise ValueError(f"Unknown dtype:{self.dtype}. Supported dtypes:{','.join(_dtype_map.keys())}")

        for f in dataclasses.fields(self):
            value = getattr(self, f.name)
            if value:
                if f.type is str:
                    if re.match(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)$", value):
                        setattr(self, f.name, float(value))
                        print_in_main_process(
                            f"WARNING: PreTrainingArguments transferring the type of {f.name} from str to float!"
                        )
                if f.type is dict:
                    self._scientific_str_to_float(value)

    def get_mixed_precision(self):
        if self.dtype == "fp32":
            return "no"
        return self.dtype

    def get_torch_dtype(self):
        return _dtype_map.get(self.dtype)

    def get_distributed_train_args(self):
        return self.plugin_args.copy()

    def update_distributed_train_args(self, extra_args: dict):
        self.plugin_args.update(extra_args)

    def get_dataloader_config(self):
        return self.dataloader_config.copy()

    def _scientific_str_to_float(self, config_dict: dict):
        for key, value in config_dict.items():
            if isinstance(value, str):
                if re.match(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)$", value):
                    config_dict[key] = float(value)
                    print_in_main_process(
                        f"WARNING: PreTrainingArguments transferring the type of {key} from str to float!"
                    )
            if isinstance(value, dict):
                self._scientific_str_to_float(value)
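The `from_yaml` and `_scientific_str_to_float` logic above can be exercised end to end. The sketch below (file name and values are illustrative, not taken from this document) writes a small YAML config, loads it, and shows that scientific-notation strings nested inside `plugin_args` come back as floats after `__post_init__` runs.

```python
# Hedged sketch: exercise PreTrainingArguments.from_yaml with a throwaway YAML
# file. Field names mirror the dataclass above; the values are made up.
import tempfile

from openmind import PreTrainingArguments

yaml_text = """\
num_training_steps: 1000
micro_batch_size: 4
dp: 1
gradient_accumulation_steps: 8
seq_length: 2048
dtype: bf16
plugin_args:
  tp_degree: 4
  other_megatron_args:
    lr: 1e-5      # PyYAML loads this as the string "1e-5"
    min_lr: 1e-6
"""

with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write(yaml_text)
    config_path = f.name

args = PreTrainingArguments.from_yaml(config_path)

# __post_init__ derives batch_size and converts scientific-notation strings.
print(args.batch_size)                                      # 4 * 8 * 1 = 32
print(type(args.plugin_args["other_megatron_args"]["lr"]))  # <class 'float'>
```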
@ -1,238 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
import dataclasses
from unittest import TestCase, skip
from unittest.mock import MagicMock

import pytest

from tests.utils_for_test import require_torch


class TestPreTrainerCommon(TestCase):
    @pytest.fixture(scope="function", autouse=True)
    def global_setup(self):
        from openmind import PreTrainer, PreTrainingArguments

        self.pretrain_args = PreTrainingArguments(
            num_training_steps=10,
            micro_batch_size=4,
            dp=1,
            gradient_accumulation_steps=8,
            seq_length=4096,
            megatron_dataset_flag=True,
            data_path="llama2-mt_text_document",
            save_dir="llama-2-7b-hf_save",
            save_interval=10000,
            eval_interval=10000,
            openmind_model_path="hf",
            dtype="bf16",
            plugin_args={
                "tp_degree": 4,
                "pp_degree": 1,
                "num_micro_batches": 8,
                "gradient_clipping": 1.0,
                "use_distributed_optimizer": False,
                "sequence_parallelism": False,
                "other_megatron_args": {
                    "tokenizer_model": "tokenizer.model",
                    "tokenizer_type": "Llama2Tokenizer",
                    "finetune": False,
                    "recompute_granularity": "full",
                    "recompute_method": "block",
                    "recompute_num_layers": 32,
                    "optimizer": "adam",
                    "lr": 1e-5,
                    "min_lr": 1e-6,
                    "adam_beta2": 0.95,
                    "add_bias_linear": False,
                    "async_tensor_model_parallel_allreduce": True,
                    "attention_dropout": 0.0,
                    "attention_softmax_in_fp32": True,
                    "bias_gelu_fusion": False,
                    "ffn_hidden_size": 11008,
                    "hidden_dropout": 0.0,
                    "init_method_std": 0.01,
                    "initial_loss_scale": 65536.0,
                    "lr_decay_style": "cosine",
                    "lr_warmup_fraction": 0.01,
                    "masked_softmax_fusion": False,
                    "normalization": "RMSNorm",
                    "sequence_parallel": True,
                    "split": "100,0,0",
                    "swiglu": True,
                    "untie_embeddings_and_output_weights": True,
                    "use_flash_attn": True,
                    "weight_decay": 0.1,
                    "no_load_optim": True,
                    "no_load_rng": True,
                    "eval_iters": 10000,
                    "position_embedding_type": "rope",
                },
            },
            dataloader_config={
                "data_path": ["llama2-mt_text_document"],
                "seq_length": 4096,
                "micro_batch_size": 4,
                "split": "100,0,0",
                "eval_iters": 10000,
                "tokenizer_model": "tokenizer.model",
                "tokenizer_type": "Llama2Tokenizer",
            },
        )
        self.pretrainer = PreTrainer
        self.obj = MagicMock()
        self.obj.pretrain_args = self.pretrain_args
        self.obj.accelerate = MagicMock()

    def test_init_trackers(self):
        self.obj.accelerator.init_trackers = MagicMock()
        self.pretrainer._init_trackers(self.obj)
        self.obj.accelerator.init_trackers.assert_called_once_with(
            self.obj.pretrain_args.project_name, dataclasses.asdict(self.obj.pretrain_args)
        )

    def test_get_gradient_accumulation_steps(self):
        with self.assertRaises(NotImplementedError):
            self.pretrainer._get_gradient_accumulation_steps(self.obj)

    def test_get_batch_loss_avg(self):
        batch_loss_sum = 100.0
        with self.assertRaises(NotImplementedError):
            self.pretrainer._get_batch_loss_avg(self.obj, batch_loss_sum)

    def test_get_lr(self):
        with self.assertRaises(NotImplementedError):
            self.pretrainer._get_lr(self.obj)

    def test_train(self):
        self.obj.completed_steps = 0
        self.obj.train_dataloader = [MagicMock()]
        self.obj.eval_dataloader = MagicMock()
        self.obj._pre_training = MagicMock()
        self.obj._train_step = MagicMock()
        self.obj._get_lr = MagicMock(return_value=0.001)
        self.obj._get_batch_loss_avg = MagicMock(return_value=0.5)
        self.obj._train_step_log = MagicMock()
        self.obj._save_state = MagicMock()
        self.obj._eval = MagicMock()
        self.obj._post_training = MagicMock()
        self.obj.accelerate.sync_gradients = True
        self.pretrainer.train(self.obj)

        self.assertTrue(self.obj._pre_training.called)
        self.assertTrue(self.obj.accelerator.end_training.called)
        self.assertTrue(self.obj.accelerator.wait_for_everyone.called)
        self.assertTrue(self.obj._post_training.called)

    def test_train_step(self):
        self.obj.model = MagicMock()
        self.obj.optimizer = MagicMock()
        self.obj.lr_scheduler = MagicMock()
        batch = {"input": "data"}

        outputs = self.pretrainer._train_step(self.obj, batch)

        self.obj.model.train.assert_called_once()
        self.obj.accelerator.accumulate.assert_called_once_with(self.obj.model)
        self.obj.model.assert_called_once_with(**batch)
        self.obj.accelerator.backward.assert_called_once_with(outputs.loss)
        self.obj.optimizer.step.assert_called_once()
        self.obj.lr_scheduler.step.assert_called_once()
        self.obj.optimizer.zero_grad.assert_called_once()
        self.assertEqual(outputs, self.obj.model.return_value)

    def test_train_step_log(self):
        loss = 0.123
        lr = 0.001
        elapsed_time = 10.5
        step = 100
        self.pretrainer._train_step_log(self.obj, loss, lr, elapsed_time, step)
        self.obj.accelerator.log.assert_called_with({"train_loss": loss, "learning_rate": lr}, step=step)

    def test_pre_training(self):
        self.obj._print_training_info = MagicMock()
        self.pretrainer._pre_training(self.obj)
        self.obj._print_training_info.assert_called_once()
        self.assertEqual(self.obj.completed_steps, 0)

    def test_post_training(self):
        self.obj._save = MagicMock()
        self.pretrainer._post_training(self.obj)
        self.obj._save.assert_called_once_with(save_dir=self.obj.pretrain_args.save_dir)

    def test_get_eval_loss(self):
        loss = 1.0
        with self.assertRaises(NotImplementedError):
            self.pretrainer._get_eval_loss(self.obj, loss)

    @skip
    def test_eval(self):
        eval_dataloader = [(1, "batch1"), (2, "batch2")]
        completed_steps = 100

        self.obj._eval_log = MagicMock()
        self.pretrainer._eval(self.obj, eval_dataloader, completed_steps)
        self.obj._eval_log.assert_called_once()

    @require_torch
    def test_eval_step(self):
        import torch

        batch = {"input_ids": torch.tensor([[1, 2, 3]]), "attention_mask": torch.tensor([[1, 1, 1]])}
        self.obj.model = MagicMock()

        outputs = self.pretrainer._eval_step(self.obj, batch)
        self.assertTrue(self.obj.model.eval.called)
        self.assertIsNotNone(outputs)

    def test_handle_eval_losses(self):
        losses = [0.1, 0.2]
        with self.assertRaises(NotImplementedError):
            self.pretrainer._handle_eval_losses(self.obj, losses)

    @require_torch
    def test_eval_log(self):
        import torch

        losses = torch.tensor([0.5, 0.3])
        self.obj._handle_eval_losses = MagicMock(return_value=losses)
        self.obj.accelerator.log = MagicMock()
        self.pretrainer._eval_log(self.obj, losses)

        self.assertTrue(self.obj._handle_eval_losses.called)
        self.assertEqual(self.obj.accelerator.log.call_count, 1)

    def test_save_state(self):
        save_dir = "/path/to/save"
        self.obj.accelerator.save_state = MagicMock()
        self.pretrainer._save_state(self.obj, save_dir)

        self.obj.accelerator.save_state.assert_called_once_with(save_dir)

    def test_save(self):
        save_dir = "/path/to/save"
        with self.assertRaises(NotImplementedError):
            self.pretrainer._save(self.obj, save_dir)

    def test_read_model(self):
        with self.assertRaises(NotImplementedError):
            self.pretrainer._read_model(self.obj)

    def test_prepare(self):
        with self.assertRaises(NotImplementedError):
            self.pretrainer._prepare(self.obj)

    def test_make_accelerator(self):
        with self.assertRaises(NotImplementedError):
            self.pretrainer._make_accelerator(self.obj)
@ -1,40 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.

import os
import logging
from unittest.mock import patch

from openmind.utils.logging import get_logger
from tests.utils_for_test import require_torch

openmind_logger = get_logger(__name__)
openmind_logger.setLevel(logging.INFO)


def test_print_in_main_process_with_local_rank_0(caplog):
    caplog.set_level(logging.INFO)
    with patch.dict(os.environ, {"LOCAL_RANK": "0"}):
        openmind_logger.info("Test message.")
        log_msg = [record.message for record in caplog.records]
        assert "Test message." in log_msg


@require_torch
def test_print_in_main_process_with_local_rank_1(capsys):
    from openmind.archived.trainers.pretrainer_utils import print_in_main_process

    with patch.dict(os.environ, {"LOCAL_RANK": "1"}):
        print_in_main_process("Test message.")
        captured = capsys.readouterr()
        assert captured.out == ""
@ -1,68 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.

from unittest import TestCase

import pytest

from tests.utils_for_test import require_torch


@require_torch
class TestPreTrainingArguments(TestCase):
    @pytest.fixture(scope="function", autouse=True)
    def global_setup(self):
        from openmind import PreTrainingArguments

        self.pretrain_args = PreTrainingArguments(
            num_training_steps=1000,
            micro_batch_size=4,
            dp=1,
            gradient_accumulation_steps=8,
            seq_length=2048,
            megatron_dataset_flag=True,
            data_path="DATA_PATH",
            save_dir="SAVE_PATH",
            save_interval=10000,
            eval_interval=0,
            openmind_model_path="BASE_MODEL",
            plugin_args={"lr": 1.23e-4},
            dataloader_config={"batch": 20},
        )

    def test_from_yaml(self):
        config_path = "CONFIG_PATH"
        try:
            self.pretrain_args.from_yaml(config_path)
        except Exception as exception:
            self.assertIsInstance(exception, FileNotFoundError)

    def test_get_torch_dtype(self):
        import torch

        self.assertEqual(self.pretrain_args.get_torch_dtype(), torch.bfloat16)

    def test_get_distributed_train_args(self):
        self.assertEqual(self.pretrain_args.get_distributed_train_args()["lr"], 1.23e-4)

    def test_update_distributed_train_args(self):
        self.pretrain_args.update_distributed_train_args({"tp_degree": 4})
        self.assertEqual(self.pretrain_args.plugin_args["lr"], 1.23e-4)
        self.assertEqual(self.pretrain_args.plugin_args["tp_degree"], 4)

    def test_get_dataloader_config(self):
        self.assertEqual(self.pretrain_args.get_dataloader_config()["batch"], 20)

    def test_scientific_str_to_float(self):
        self.pretrain_args._scientific_str_to_float(self.pretrain_args.plugin_args)
        self.assertEqual(self.pretrain_args.plugin_args["lr"], 0.000123)