!225 Remove PreTrainer-related content

Merge pull request !225 from humphrey007/master
This commit is contained in:
humphrey007
2025-06-05 07:47:45 +00:00
committed by i-robot
parent 46bffba06d
commit 0adb659e5c
16 changed files with 78 additions and 2133 deletions

View File

@ -1,124 +0,0 @@
# PreTrainer Module APIs
## openmind.PreTrainer Class
The `PreTrainer` class provides common functions for pre-training process management.
**Parameters**
| Parameter | Type | Description | Default Value |
| ---------------- | ------------------------------------------- |---------------|------|
| pretrain_args | PreTrainingArguments | Pre-training arguments | - |
| accelerator | Accelerator | Accelerate instance| None |
| model | torch.nn.Module | Torch model | None |
| optimizer | accelerate.utils.MegatronLMOptimizerWrapper | Optimizer | None |
| lr_scheduler | accelerate.utils.MegatronLMSchedulerWrapper | Scheduler | None |
| train_dataloader | torch.utils.data.DataLoader | Training data loader | None |
| eval_dataloader | torch.utils.data.DataLoader | Evaluation data loader | None |
### train
Starts pre-training.
**Prototype**
```python
def train()
```
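The following minimal sketch shows the intended call pattern; the arguments and the dataloader are placeholders and assume a prepared training setup (see the model pre-training tutorial for a complete example).
```python
from openmind import PreTrainer, PreTrainingArguments

# Placeholder arguments; see the PreTrainingArguments class below for details.
pretrain_args = PreTrainingArguments(num_training_steps=1000, micro_batch_size=4, dp=1)

# Assumed to be built by your own data pipeline (a torch.utils.data.DataLoader).
train_dataloader = ...

pretrainer = PreTrainer(pretrain_args=pretrain_args, train_dataloader=train_dataloader)
pretrainer.train()
```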
## openmind.PreTrainingArguments Class
The `PreTrainingArguments` class configures parameters of a training job, including hyperparameters required during training, model save path, and learning rate.
**Parameters**
| Parameter | Type| Description | Default Value for PyTorch |
| --------------------------- | ---- |-------------------|-----------------------|
| num_training_steps | int | Number of training steps | - |
| micro_batch_size | int | Size of a micro batch | - |
| dp | int | Degree of data parallelism | - |
| gradient_accumulation_steps | int | Number of gradient accumulation steps | 1 |
| seq_length | int | Maximum length of a sequence | None |
| megatron_dataset_flag | bool | Whether the dataset is Megatron-formatted | None |
| data_path | str | Dataset path | None |
| save_dir | str | Model saving path | None |
| save_interval | int | Model saving interval | None |
| eval_interval | int | Model evaluation interval | None |
| openmind_model_path | str | Model path | None |
| dtype | str | Runtime data type | bf16 |
| plugin_args | dict | [Accelerate plugin parameter](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMPlugin) | None |
| dataloader_config | dict | [Loader configuration parameter](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMDummyDataLoader) | None |
| report_to | str | Accelerate log output object| None |
| project_name | str | Project name | "accelerate-megatron" |
### from_yaml
Loads configurations from the YAML configuration file.
**Prototype**
```python
def from_yaml(config_path: str)
```
**Parameters**
| Parameter | Description | Supported Type|
| ----------- |-------------| -------- |
| config_path | Path of the YAML configuration file| str |
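Example (the path is a placeholder for a local YAML file containing the fields listed above):
```python
from openmind import PreTrainingArguments

# Placeholder path; replace it with a local YAML configuration file.
pretrain_args = PreTrainingArguments.from_yaml("llama2_config/llama2-megatron-json-dataset.yaml")
```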
### get_mixed_precision
Obtains the mixed precision type.
**Prototype**
```python
def get_mixed_precision()
```
### get_torch_dtype
Obtains the runtime data type.
**Prototype**
```python
def get_torch_dtype()
```
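A small sketch of the two getters above; the commented values are expectations based on `dtype: bf16`, not guaranteed return formats.
```python
from openmind import PreTrainingArguments

args = PreTrainingArguments(num_training_steps=10, micro_batch_size=1, dp=1, dtype="bf16")
print(args.get_mixed_precision())  # expected: the mixed-precision mode derived from dtype (bf16)
print(args.get_torch_dtype())      # expected: the corresponding torch dtype (e.g. torch.bfloat16)
```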
### get_distributed_train_args
Obtains distributed pre-training parameters.
**Prototype**
```python
def get_distributed_train_args()
```
### update_distributed_train_args
Updates distributed pre-training parameters.
**Prototype**
```python
def update_distributed_train_args(extra_args: dict)
```
**Parameters**
| Parameter | Description | Supported Type|
| ---------- |-------------| -------- |
| extra_args | Additional parameter for distributed pre-training| dict |
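For example, the pre-training tutorial uses `extra_args` to register custom Megatron functions; the sketch below registers a single hypothetical `get_batch` implementation.
```python
from openmind import PreTrainingArguments

def my_get_batch(data_iterator):
    """Hypothetical stand-in for a Megatron-style get_batch implementation."""
    raise NotImplementedError

# Placeholder path; replace it with a local YAML configuration file.
pretrain_args = PreTrainingArguments.from_yaml("llama2_config/llama2-megatron-json-dataset.yaml")
pretrain_args.update_distributed_train_args(
    extra_args={"custom_get_batch_function": my_get_batch}
)
```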
### get_dataloader_config
Obtains the configuration parameters of the data loader.
**Prototype**
```python
def get_dataloader_config()
```

View File

@ -1,450 +0,0 @@
# Model Pre-training
## Basic Concepts
**Pre-training** is a training strategy for deep learning models that is usually performed on a large-scale dataset. The goal of pre-training is to train the model on a related, large task so that it learns general features and representations. However, as model parameters and the amount of training data required grow rapidly, the resources of a single machine can no longer meet training requirements, which is why distributed training was introduced.
**Distributed training** splits a deep learning task into multiple subtasks that are trained in parallel on multiple computing devices. It greatly increases the training speed of large models and substantially reduces the overall training time.
The PreTrainer described in this document builds on Accelerate to provide the distributed capabilities of multiple frameworks (Megatron, DeepSpeed, and FSDP) and offers common functions for pre-training process management.
## Environment Setup
```shell
torch: 2.1.0
transformers: 4.45.2
accelerate: 0.28.0
deepspeed: 0.15.2
megatron_core: 0.4.0rc0
```
### Installing the Megatron-LM Distributed Framework
To use the Megatron-LM distributed framework, perform the following steps:
1. Install Megatron. For details, see the [Megatron installation method of MindSpeed](https://gitee.com/ascend/MindSpeed#3-obtain-megatron-lm-and-specify-commit-id).
```shell
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout bcce6f54e075e3c3374ea67adefe54f3f2da2b07
pip install --no-use-pep517 -e . # "--no-use-pep517 -e" can install all Megatron files.
```
2. Install MindSpeed.
```shell
git clone https://gitee.com/ascend/MindSpeed.git
cd MindSpeed
git checkout origin/1.0.RC1
pip install -r requirements.txt
pip install -e .
```
3. Use pip to install the openmind_accelerate plugin of the Modelers community.
```shell
#AArch64 platform
pip install openmind-accelerate
#x86 platform
pip install openmind-accelerate --extra-index-url https://download.pytorch.org/whl/cpu
```
4. Install Accelerate and DeepSpeed.
```shell
pip install deepspeed==0.15.2
pip install accelerate==0.28.0
```
### openMind Library Environment Setup
```shell
#Installation in the AArch64 environment
pip install openmind[pt]
#Installation in the x86 environment
pip install openmind[pt] --extra-index-url https://download.pytorch.org/whl/cpu
```
For details about how to install the openMind Library dependency environment, see [openMind Library Installation Guide](../install.md).
After the installation is complete, use `pip list` to check the version dependencies. If the Accelerate or Transformers version was changed while installing the packages above, reinstall the versions specified above.
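The check can also be scripted; the following is only a convenience sketch using the Python standard library and is not part of openMind.
```python
# Sketch: verify that the key packages still match the versions listed above.
import importlib.metadata as metadata

expected = {"torch": "2.1.0", "transformers": "4.45.2", "accelerate": "0.28.0", "deepspeed": "0.15.2"}
for package, version in expected.items():
    installed = metadata.version(package)
    status = "OK" if installed == version else f"expected {version}"
    print(f"{package}: {installed} ({status})")
```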
## Quick Start
[Sample configuration files and startup scripts](https://modelers.cn/models/AI-Research/accelerate_examples/tree/main/examples) are provided for easy access.
### PreTrainer Use Procedure
#### Preparing the Dataset
Prepare your own pre-training dataset, for example, the [alpaca_en](https://modelers.cn/datasets/HaM/alpaca_en/tree/main) dataset.
If you need to use the Megatron-LM distributed framework, see [Megatron Data Processing](https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#data-preprocessing).
#### Preparing a Model
Prepare a model file, for example, [Llama 2](https://modelers.cn/models/AI_Connect/llama2_7b/tree/main).
If you want to use the Megatron-LM distributed framework, you only need to prepare the **config.json** and **tokenizer** files.
#### Preparing Pre-training Parameters
The pre-training parameters can be automatically generated by loading the [llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml) file. See the [sample configuration file](#llama2_megatron) for a fine-tuning dataset in JSON format.
#### Startup
- For details about the Accelerate configuration file, see [accelerate_config/accelerate_megatron_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_megatron_config.yaml).
```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MEGATRON_LM
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: [ ]
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
- For details about the model configuration file, see [llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml).
<a id="llama2_megatron"></a>
```yaml
num_training_steps: 1000
micro_batch_size: &micro_batch_size 4
dp: 1
gradient_accumulation_steps: &gradient_accumulation_steps 8
### The value of **seq_length** must be less than or equal to the value of **max_position_embeddings** in the model weight configuration file **config.json**.
seq_length: &seq_length 4096
megatron_dataset_flag: False
### data_path: Enter the path of the local fine-tuning dataset.
data_path: &data_path '/path/to/alpaca_en/alpaca_data_en_52k.json'
### Path for saving the fine-tuning model weight
save_dir: './saves'
save_interval: 10000
eval_interval: 10000
### openmind_model_path: Enter the path of the local model weight folder.
openmind_model_path: '/path/to/llama2-7b-hf'
dtype: 'bf16'
plugin_args:
tp_degree: 8
pp_degree: 1
num_micro_batches: *gradient_accumulation_steps
gradient_clipping: 1.0
use_distributed_optimizer: False
sequence_parallelism: False
other_megatron_args:
### tokenizer_model: path of the tokenizer.model file in the local model weight file.
tokenizer_model: &tokenizer_model '/path/to/llama2-7b-hf/tokenizer.model'
tokenizer_type: &tokenizer_type 'Llama2Tokenizer'
finetune: False
recompute_granularity: "full"
recompute_method: "block"
recompute_num_layers: 32
optimizer: "adam"
lr: 1e-5
min_lr: 1e-6
adam_beta2: 0.95
add_bias_linear: False
async_tensor_model_parallel_allreduce: False
attention_dropout: 0.0
attention_softmax_in_fp32: False
bias_gelu_fusion: False
ffn_hidden_size: 11008
hidden_dropout: 0.0
init_method_std: 0.01
initial_loss_scale: 65536.0
lr_decay_style: "cosine"
lr_warmup_fraction: 0.01
masked_softmax_fusion: False
normalization: "RMSNorm"
split: &split "100,0,0"
swiglu: True
untie_embeddings_and_output_weights: True
use_flash_attn: False
weight_decay: 0.1
no_load_optim: True
no_load_rng: True
eval_iters: &eval_iters 10
position_embedding_type: "rope"
dataloader_config:
return_tensors: 'pt'
padding: 'max_length'
pad_to_multiple_of: *seq_length
max_length: *seq_length
```
- For details about the pre-training program file, see [train_with_megatron_json_dataset.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_megatron_json_dataset.py). This Python script cannot be run directly. To run it, clone the following repository to obtain the utils code and copy **accelerate_examples/examples/utils** to the same directory as the script.
```shell
git clone https://modelers.cn/AI-Research/accelerate_examples.git
cp -r accelerate_examples/examples/utils ./ # Replace the destination path with the directory containing train_with_megatron_json_dataset.py.
```
```python
import os
import openmind_accelerate
from openmind import PreTrainingArguments, PreTrainer
from utils.config import get_pretrain_config_file
from utils.accelerator import make_accelerator
from utils.data import make_train_and_eval_dataloader
from utils.tokenizer import get_tokenizer
pretrain_args = PreTrainingArguments.from_yaml(get_pretrain_config_file())
os.makedirs(pretrain_args.save_dir, exist_ok=True)
accelerator = make_accelerator(pretrain_args=pretrain_args)
tokenizer = get_tokenizer(tokenizer_path=pretrain_args.openmind_model_path, use_fast=False)
transformer_dataloader_config = pretrain_args.get_dataloader_config()
train_dataloader, eval_dataloader = make_train_and_eval_dataloader(
dataloader_config=transformer_dataloader_config,
micro_batch_size=pretrain_args.micro_batch_size,
data_files=pretrain_args.data_path,
max_length=pretrain_args.seq_length,
tokenizer=tokenizer,
accelerator=accelerator
)
pretrainer = PreTrainer(pretrain_args=pretrain_args,
train_dataloader=train_dataloader,
eval_dataloader=eval_dataloader,
)
pretrainer.train()
```
After completing the environment setup and preparing the configuration files, run the following command to start training. Ensure that the paths of the training script and configuration files point to their actual local locations.
```shell
accelerate launch --config_file accelerate_config/accelerate_megatron_config.yaml train_with_megatron_json_dataset.py --pretrain_config_file llama2_config/llama2-megatron-json-dataset.yaml
```
## Advanced Use
### Defining Pre-training Parameters
Before defining a PreTrainer, you need to create a PreTrainingArguments instance that contains all hyperparameters used by PreTrainer for training and evaluation. You can initialize the pre-training parameters either from a configuration file or by passing parameters directly.
#### Using the Configuration File
The pre-training parameters can be automatically generated by loading the YAML file. For more YAML examples, see [Samples Link](https://modelers.cn/models/AI-Research/accelerate_examples/tree/main/examples/llama2_config).
```python
from openmind import PreTrainingArguments
# Replace the path with a local path.
pretrain_args = PreTrainingArguments.from_yaml(
"openmind-accelerate/examples/llama2_config/llama2-megatron.yaml"
)
```
#### Directly Passing Parameters
Pre-training parameters can also be instantiated by passing parameters directly. The following shows how to initialize the pre-trainer for training a Megatron-format dataset with the Megatron framework.
For details about the parameters, see [PreTrainingArguments Description](#pretrainingarguments-description).
```python
from openmind import PreTrainingArguments
# Replace the path with a local path.
pretrain_args = PreTrainingArguments(
megatron_dataset_flag=True,
data_path="HaM/alpaca_en",
num_training_steps=1000,
micro_batch_size=4,
dp=1,
gradient_accumulation_steps=8,
seq_length=2048,
)
```
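With the arguments above, the effective global batch size works out as follows (a common data-parallel convention; confirm it against the framework you launch with):
```python
# Global batch size implied by the example arguments above.
micro_batch_size = 4
dp = 1
gradient_accumulation_steps = 8
global_batch_size = micro_batch_size * dp * gradient_accumulation_steps
print(global_batch_size)  # 4 * 1 * 8 = 32 samples per optimizer step
```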
### Pre-training a Model Using the Megatron Framework
After configuring the pre-training parameters, you can start the Megatron model pre-training.
- For details about the configuration file for Accelerate and Megatron interconnection, see [accelerate_config/accelerate_megatron_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_megatron_config.yaml).
- For details about how to use the Megatron framework to train the JSON dataset, see [train_with_megatron_json_dataset.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_megatron_json_dataset.py).
- For details about the configuration file of JSON pre-training dataset, see [llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml).
You only need to pass the prepared `train_dataloader` (`eval_dataloader` is optional) to PreTrainer to pre-train the model with your custom dataloader.
```shell
accelerate launch --config_file accelerate_config/accelerate_megatron_config.yaml train_with_megatron_json_dataset.py --pretrain_config_file llama2_config/llama2-megatron-json-dataset.yaml
```
#### (Optional) Customizing the Processing Flow of the Megatron Framework
##### Customizing Functions
When using Megatron for pre-training, you can customize any of the `datasets_provider`, `model_provider`, `get_batch`, and `loss_function` functions as needed and assign them to the following attributes. For details about how to implement the custom functions, see the official sample [pretrain_gpt.py](https://github.com/NVIDIA/Megatron-LM/blob/main/pretrain_gpt.py).
- `custom_megatron_datasets_provider_function`: provides the training and validation datasets of Megatron.
- `custom_get_batch_function`: generates batch data.
- `custom_model_provider_function`: builds models.
- `custom_loss_function`: returns the loss function.
```python
import openmind_accelerate
from openmind import PreTrainingArguments
from pretrain_gpt import (
train_valid_test_datasets_provider,
get_batch as megatron_gpt_get_batch,
model_provider as megatron_gpt_model_provider,
loss_func as megatron_gpt_loss_func,
)
# Replace the path with a local path.
pretrain_args = PreTrainingArguments.from_yaml(
"openmind-accelerate/examples/llama2_config/llama2-megatron-json-dataset.yaml"
)
train_valid_test_datasets_provider.is_distributed = True
pretrain_args.update_distributed_train_args(
extra_args={
"custom_megatron_datasets_provider_function": train_valid_test_datasets_provider,
"custom_get_batch_function": megatron_gpt_get_batch,
"custom_model_provider_function": megatron_gpt_model_provider,
"custom_loss_function": megatron_gpt_loss_func,
}
)
```
##### Customizing the Model Configuration Parsing Function
You can customize the function that parses the model configuration file, following the format Accelerate uses for parsing model configurations. The following is PreTrainer's built-in parsing function for the Llama model configuration file, which you can adapt as needed.
```python
import openmind_accelerate
from accelerate.utils import add_model_config_to_megatron_parser
@add_model_config_to_megatron_parser("llama")
def parse_llama_config(megatron_lm_plugin, model, batch_data):
model_type_name = "gpt"
num_layers = model.config.num_hidden_layers
pretraining_flag = True
hidden_size = model.config.hidden_size
num_attention_heads = model.config.num_attention_heads
orig_vocab_size = model.config.vocab_size
max_position_embeddings = getattr(model.config, "max_position_embeddings")
seq_length = getattr(model.config, "max_sequence_length", None)
if megatron_lm_plugin.seq_length is None:
if seq_length is not None:
megatron_lm_plugin.seq_length = seq_length
elif megatron_lm_plugin.decoder_seq_length is not None:
megatron_lm_plugin.seq_length = megatron_lm_plugin.decoder_seq_length
elif batch_data is not None:
megatron_lm_plugin.seq_length = batch_data["input_ids"].shape[1]
else:
megatron_lm_plugin.seq_length = max_position_embeddings
megatron_lm_plugin.megatron_lm_default_args["return_logits"] = megatron_lm_plugin.return_logits
megatron_lm_plugin.megatron_lm_default_args["tokenizer_type"] = "Llama2Tokenizer"
megatron_lm_plugin.megatron_lm_default_args["model_type_name"] = model_type_name
megatron_lm_plugin.megatron_lm_default_args["num_layers"] = num_layers
megatron_lm_plugin.megatron_lm_default_args["pretraining_flag"] = pretraining_flag
megatron_lm_plugin.megatron_lm_default_args["hidden_size"] = hidden_size
megatron_lm_plugin.megatron_lm_default_args["num_attention_heads"] = num_attention_heads
megatron_lm_plugin.megatron_lm_default_args["orig_vocab_size"] = orig_vocab_size
megatron_lm_plugin.megatron_lm_default_args["max_position_embeddings"] = max_position_embeddings
megatron_lm_plugin.megatron_lm_default_args["seq_length"] = megatron_lm_plugin.seq_length
megatron_lm_plugin.megatron_lm_default_args["model_return_dict"] = model.config.return_dict
```
### Using Other Frameworks to Pre-train Models
PreTrainer builds its multi-framework distributed capability on Accelerate, so in addition to Megatron it also supports the DeepSpeed and FSDP distributed frameworks. The following uses DeepSpeed as an example.
After configuring the JSON pre-training parameters, you can start the DeepSpeed model pre-training.
- For details about the configuration file for Accelerate and DeepSpeed interconnection, see [accelerate_config/accelerate_deepspeed_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_deepspeed_config.yaml).
- For details about how to use the DeepSpeed framework to train the JSON dataset, see [train_with_deepspeed.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_deepspeed.py).
- For details about the configuration file of JSON pre-training dataset, see [llama2_config/llama2-deepspeed.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-deepspeed.yaml).
```yaml
num_training_steps: 1000
micro_batch_size: 1
dp: 8
gradient_accumulation_steps: 8
seq_length: 4096
megatron_dataset_flag: False
data_path: '/path/to/alpaca_en/alpaca_data_en_52k.json'
save_dir: './saves'
save_interval: 10000
eval_interval: 10000
openmind_model_path: '/path/to/llama2-7b-hf'
dtype: 'bf16'
dataloader_config:
return_tensors: 'pt'
padding: 'max_length'
pad_to_multiple_of: 4096
max_length: 4096
### The values of **seq_length**, **max_length**, and **padding** must be less than or equal to the value of **max_position_embeddings** in the model weight configuration file **config.json**.
```
```shell
accelerate launch --config_file accelerate_config/accelerate_deepspeed_config.yaml train_with_deepspeed.py --pretrain_config_file llama2_config/llama2-deepspeed.yaml
```
## PreTrainingArguments Description
| **Name** | **Description** | **Type**| **Default Value**| Mandatory/Optional |
|-----------------------------|-----------------------|--------|---------|---------|
| num_training_steps | Total number of steps for training a model. | int | - | Mandatory |
| micro_batch_size | Batch size of each model instance. | int | - | Mandatory |
| dp | Degree of data parallelism. | int | - | Mandatory |
| gradient_accumulation_steps | Number of gradient steps to be accumulated before model parameters are updated. | int | 1 | Optional |
| seq_length | Maximum length of the sequence to be processed. | int | None | Optional |
| megatron_dataset_flag | Whether the dataset is in Megatron format. | bool | None | Optional |
| data_path | Training dataset path. | str | None | Optional |
| save_dir | Output directory to which the checkpoint is to be saved. | str | None | Optional |
| save_interval | Iteration interval for saving checkpoints. | int | None | Optional |
| eval_interval | Iteration interval for evaluation. | int | None | Optional |
| openmind_model_path | Path of the openMind model to be trained. | str | None | Optional |
| dtype | Dtype mode of the running model. | str | bf16 | Optional |
| plugin_args | [Accelerate plugin parameters](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMPlugin) | dict | None | Optional |
| dataloader_config | [Dataloader configuration parameters](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMDummyDataLoader) | dict | None | Optional |
| report_to | Location to which Accelerate logs are reported. | str | None | Optional |
| project_name | Project name. | str | None | Optional |
## PreTrainer Description
The PreTrainer API creates either a Megatron pre-trainer or another pre-trainer depending on whether Accelerate uses the Megatron-LM distributed acceleration library (specifically, whether the environment variable `ACCELERATE_USE_MEGATRON_LM` equals `"true"`).
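The selection can be illustrated with the following sketch; it mirrors the rule described above rather than the exact internal implementation.
```python
import os

# Accelerate exports ACCELERATE_USE_MEGATRON_LM when distributed_type is MEGATRON_LM.
if os.environ.get("ACCELERATE_USE_MEGATRON_LM", "false") == "true":
    print("A Megatron pre-trainer will be created.")
else:
    print("A non-Megatron pre-trainer (e.g. DeepSpeed or FSDP) will be created.")
```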
### Megatron Pre-trainer
| No.| Constraint Description |
| ---- |-----------------------------------------------------------------------|
| 1 | The Megatron dependencies need to be installed. |
| 2 | The openmind_accelerate dependencies need to be installed. |
| 3 | Megatron manages accumulated gradients. Therefore, the `gradient_accumulation_steps` parameter of Accelerate must be set to **1**.|
| 4 | `train_dataloader` needs to be provided during initialization or `data_path` needs to be provided in **PreTrainingArguments**. |
| 5 | `model` needs to be provided during initialization or `openmind_model_path` needs to be provided in **PreTrainingArguments**. |
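Under these constraints, a Megatron pre-trainer can be initialized from `PreTrainingArguments` alone, as in the following sketch (all paths are placeholders):
```python
from openmind import PreTrainer, PreTrainingArguments

# Constraints 4 and 5 are satisfied through data_path and openmind_model_path,
# so no dataloader or model object is passed explicitly.
pretrain_args = PreTrainingArguments(
    num_training_steps=1000,
    micro_batch_size=4,
    dp=1,
    seq_length=2048,
    megatron_dataset_flag=True,
    data_path="/path/to/megatron_dataset_prefix",
    openmind_model_path="/path/to/llama2-7b-hf",
)
pretrainer = PreTrainer(pretrain_args=pretrain_args)
pretrainer.train()
```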
### Other Pre-trainers
| No. | Constraint |
| ---- |----------------------------------------------------------------|
| 1 | `train_dataloader` needs to be provided during initialization. |
| 2 | `optimizer` needs to be provided during initialization. |
| 3 | `lr_scheduler` needs to be provided during initialization. |
| 4 | `model` needs to be provided during initialization or `openmind_model_path` needs to be provided in **PreTrainingArguments**.|
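For these pre-trainers, the model, optimizer, scheduler, and training dataloader must all be supplied by the caller. The sketch below only illustrates the call signature; the four placeholder objects are assumed to come from your own training setup.
```python
from openmind import PreTrainer, PreTrainingArguments

pretrain_args = PreTrainingArguments(num_training_steps=1000, micro_batch_size=1, dp=8)

model = ...             # torch.nn.Module
optimizer = ...         # e.g. torch.optim.AdamW(model.parameters(), lr=1e-5)
lr_scheduler = ...      # e.g. a torch.optim.lr_scheduler instance
train_dataloader = ...  # torch.utils.data.DataLoader

pretrainer = PreTrainer(
    pretrain_args=pretrain_args,
    model=model,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    train_dataloader=train_dataloader,
)
pretrainer.train()
```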
*Thanks to the community contributors for the Llama 2 model and the alpaca_en dataset.*

View File

@ -4,8 +4,6 @@ openMind Library is an open-source deep learning development kit. It supports mo
## openMind Library Features
+ To cope with the challenges of distributed training of foundation models, openMind Library provides pre-training APIs and acceleration libraries such as MindSpeed and Accelerate to help you quickly and smoothly train foundation models. For details, see [model pre-training](basic_tutorial/pretrainer.md).
+ openMind Library encapsulates APIs such as Transformers, MindFormers AutoClass, Pipeline, and Trainer, enhances functions, and provides the capability of automatic download and load of models from the Modelers community. In addition, the Ascend NPU affinity feature is added, effectively improves the performance of model training and inference on Ascend NPUs. For details, see [Model Fine-Tuning](basic_tutorial/finetune/overview.md) and [Model Inference](basic_tutorial/pipeline.md).
+ openMind Library provides simple and easy-to-use command-line interfaces (CLIs) for quickly uploading, downloading, inferring, dialog, and deploying models with low code. For details, see the [command line interface](basic_tutorial/cli.md).

View File

@ -50,13 +50,6 @@
"en": "Data Load" "en": "Data Load"
} }
}, },
{
"id": "pretrainer",
"label": {
"zh": "模型预训练",
"en": "Model Pre-training"
}
},
{
"id": "train",
"label": {
@ -343,13 +336,6 @@
"en": "Pipelines" "en": "Pipelines"
} }
}, },
{
"id": "pretrainer_api",
"label": {
"zh": "PreTrainer",
"en": "PreTrainer"
}
},
{
"id": "trainer_api",
"label": {

View File

@ -1,124 +0,0 @@
# PreTrainer 模块接口
## openmind.PreTrainer类
`PreTrainer`类提供了通用的预训练流程管理功能。
**参数列表**
| 参数名 | 类型 | 描述 | 默认值 |
| ---------------- | ------------------------------------------- |---------------|------|
| pretrain_args | PreTrainingArguments | 预训练参数。 | - |
| accelerator | Accelerator | accelerate实例。 | None |
| model | torch.nn.Module | torch模型。 | None |
| optimizer | accelerate.utils.MegatronLMOptimizerWrapper | 优化器。 | None |
| lr_scheduler | accelerate.utils.MegatronLMSchedulerWrapper | 调度器。 | None |
| train_dataloader | torch.utils.data.DataLoader | 训练数据加载器。 | None |
| eval_dataloader | torch.utils.data.DataLoader | 评估数据加载器。 | None |
### train
预训练启动。
**接口原型**
```python
def train()
```
## openmind.PreTrainingArguments类
`PreTrainingArguments`类用于配置训练任务的参数,包括训练过程中所需的超参数、模型保存路径和学习率等。
**参数列表**
| 参数名 | 类型 | 描述 | PyTorch默认值 |
| --------------------------- | ---- |-------------------|-----------------------|
| num_training_steps | int | 训练步数。 | - |
| micro_batch_size | int | 微批大小。 | - |
| dp | int | 数据并行度。 | - |
| gradient_accumulation_steps | int | 梯度累计步数。 | 1 |
| seq_length | int | 最大处理序列长度。 | None |
| megatron_dataset_flag | bool | 是否为megatron格式数据集。 | None |
| data_path | str | 数据集路径。 | None |
| save_dir | str | 模型保存路径。 | None |
| save_interval | int | 模型保存间隔。 | None |
| eval_interval | int | 模型评估间隔。 | None |
| openmind_model_path | str | 模型路径。 | None |
| dtype | str | 运行时数据类型。 | bf16 |
| plugin_args | dict | [Accelerate插件参数。](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMPlugin) | None |
| dataloader_config | dict | [加载器配置参数。](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMDummyDataLoader) | None |
| report_to | str | accelerate日志输出对象。 | None |
| project_name | str | 项目名称。 | "accelerate-megatron" |
### from_yaml
从yaml配置文件加载配置。
**接口原型**
```python
def from_yaml(config_path: str)
```
**参数列表**
| 参数名 | 描述 | 支持类型 |
| ----------- |-------------| -------- |
| config_path | yaml配置文件路径。 | str |
### get_mixed_precision
获取混合精度类型。
**接口原型**
```python
def get_mixed_precision()
```
### get_torch_dtype
获取运行时数据类型。
**接口原型**
```python
def get_torch_dtype()
```
### get_distributed_train_args
获取分布式预训练参数。
**接口原型**
```python
def get_distributed_train_args()
```
### update_distributed_train_args
更新分布式预训练参数。
**接口原型**
```python
def update_distributed_train_args(extra_args: dict)
```
**参数列表**
| 参数名 | 描述 | 支持类型 |
| ---------- |-------------| -------- |
| extra_args | 分布式预训练额外参数。 | dict |
### get_dataloader_config
获取数据加载器配置参数。
**接口原型**
```python
def get_dataloader_config()
```

View File

@ -1,450 +0,0 @@
# 模型预训练
## 基础概念
**预训练**是一种深度学习模型训练的策略,通常在大规模的数据集上进行。预训练的目标是通过在一个相关但较大的任务上训练模型,使得模型学习到通用的特征表示。但是随着大模型参数和所需训练数据量的急剧增长,单个机器的资源上限已无法满足训练要求,于是就引出了分布式训练的概念。
**分布式训练**指的是将深度学习模型任务分解为多个子任务,并在多个计算设备上并行的进行训练。分布式训练极大地提升了大模型的训练速度,可以大幅降低模型训练的总体时间。
本文档中的PreTrainer是基于Accelerate实现了多框架Megatron、DeepSpeed以及FSDP的分布式能力并提供了通用的预训练流程管理功能。
## 环境准备
```shell
torch: 2.1.0
transformers: 4.45.2
accelerate: 0.28.0
deepspeed: 0.15.2
megatron_core: 0.4.0rc0
```
### 安装Megatron-LM分布式框架
若用户需要使用Megatron-LM分布式框架则还需执行以下步骤。
1. 安装Megatron[参考MindSpeed的Megatron安装方式](https://gitee.com/ascend/MindSpeed#3-获取-megatron-lm-并指定-commit-id)
```shell
git clone https://github.com/NVIDIA/Megatron-LM.git
cd Megatron-LM
git checkout bcce6f54e075e3c3374ea67adefe54f3f2da2b07
pip install --no-use-pep517 -e . # 使用"--no-use-pep517 -e"安装megatron全部文件
```
2. 安装MindSpeed
```shell
git clone https://gitee.com/ascend/MindSpeed.git
cd MindSpeed
git checkout origin/1.0.RC1
pip install -r requirements.txt
pip install -e .
```
3. 使用pip安装魔乐社区openmind_accelerate插件
```shell
#aarch64平台
pip install openmind-accelerate
#x86平台
pip install openmind-accelerate --extra-index-url https://download.pytorch.org/whl/cpu
```
4. 安装accelerate与deepspeed
```shell
pip install deepspeed==0.15.2
pip install accelerate==0.28.0
```
### openMind Library环境准备
```shell
#aarch64环境下安装
pip install openmind[pt]
#x86环境下安装
pip install openmind[pt] --extra-index-url https://download.pytorch.org/whl/cpu
```
openMind Library依赖环境安装请参考[openMind Library安装指南](../install.md)。
安装完成后请使用`pip list`检查版本依赖,如果在安装上述依赖的时候accelerate或transformers版本被刷新,请重新刷回指定版本。
## 快速使用
我们提供了[样例配置文件和启动脚本](https://modelers.cn/models/AI-Research/accelerate_examples/tree/main/examples),方便用户一键使用。
### PreTrainer的使用步骤如下所示
#### 准备数据
用户需要准备好自己的预训练数据,例如[alpaca_en](https://modelers.cn/datasets/HaM/alpaca_en/tree/main)数据。
如果用户需要使用Megatron-LM分布式框架可参考[Megatron的数据处理方法](https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#data-preprocessing) 进行处理。
#### 准备模型
用户需要准备好模型文件,例如[llama2模型](https://modelers.cn/models/AI_Connect/llama2_7b/tree/main)。
如果用户需要使用Megatron-LM分布式框架则只需要准备config.json和tokenizer相关文件即可。
#### 准备预训练参数
预训练参数可以通过加载 [llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml) 文件自动生成,用户可参考[此处](#llama2_megatron)基于json格式微调数据集的样例配置文件
#### 启动
- Accelerate配置文件可参考[accelerate_config/accelerate_megatron_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_megatron_config.yaml)
```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MEGATRON_LM
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: [ ]
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
- 模型配置文件可参考:[llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml)
<a id="llama2_megatron"></a>
```yaml
num_training_steps: 1000
micro_batch_size: &micro_batch_size 4
dp: 1
gradient_accumulation_steps: &gradient_accumulation_steps 8
### seq_length需要小于或等于模型权重配置文件config.json中"max_position_embeddings"字段的值
seq_length: &seq_length 4096
megatron_dataset_flag: False
### data_path请传入本地微调数据集所在路径
data_path: &data_path '/path/to/alpaca_en/alpaca_data_en_52k.json'
### 微调模型权重保存路径
save_dir: './saves'
save_interval: 10000
eval_interval: 10000
### openmind_model_path请传入本地模型权重文件夹所在路径
openmind_model_path: '/path/to/llama2-7b-hf'
dtype: 'bf16'
plugin_args:
tp_degree: 8
pp_degree: 1
num_micro_batches: *gradient_accumulation_steps
gradient_clipping: 1.0
use_distributed_optimizer: False
sequence_parallelism: False
other_megatron_args:
### tokenizer_model请传入本地模型权重文件中tokenizer.model文件所在路径
tokenizer_model: &tokenizer_model '/path/to/llama2-7b-hf/tokenizer.model'
tokenizer_type: &tokenizer_type 'Llama2Tokenizer'
finetune: False
recompute_granularity: "full"
recompute_method: "block"
recompute_num_layers: 32
optimizer: "adam"
lr: 1e-5
min_lr: 1e-6
adam_beta2: 0.95
add_bias_linear: False
async_tensor_model_parallel_allreduce: False
attention_dropout: 0.0
attention_softmax_in_fp32: False
bias_gelu_fusion: False
ffn_hidden_size: 11008
hidden_dropout: 0.0
init_method_std: 0.01
initial_loss_scale: 65536.0
lr_decay_style: "cosine"
lr_warmup_fraction: 0.01
masked_softmax_fusion: False
normalization: "RMSNorm"
split: &split "100,0,0"
swiglu: True
untie_embeddings_and_output_weights: True
use_flash_attn: False
weight_decay: 0.1
no_load_optim: True
no_load_rng: True
eval_iters: &eval_iters 10
position_embedding_type: "rope"
dataloader_config:
return_tensors: 'pt'
padding: 'max_length'
pad_to_multiple_of: *seq_length
max_length: *seq_length
```
- 预训练程序文件可参考[train_with_megatron_json_dataset.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_megatron_json_dataset.py)。此python脚本不能直接运行,如需运行请自行下载如下仓库获取utils相关代码,然后将accelerate_examples/examples/utils复制到此脚本同目录下。
```shell
git clone https://modelers.cn/AI-Research/accelerate_examples.git
cp -r accelerate_examples/examples/utils ./ # 自行替换目的路径为train_with_megatron_json_dataset.py所在路径
```
```python
import os
import openmind_accelerate
from openmind import PreTrainingArguments, PreTrainer
from utils.config import get_pretrain_config_file
from utils.accelerator import make_accelerator
from utils.data import make_train_and_eval_dataloader
from utils.tokenizer import get_tokenizer
pretrain_args = PreTrainingArguments.from_yaml(get_pretrain_config_file())
os.makedirs(pretrain_args.save_dir, exist_ok=True)
accelerator = make_accelerator(pretrain_args=pretrain_args)
tokenizer = get_tokenizer(tokenizer_path=pretrain_args.openmind_model_path, use_fast=False)
transformer_dataloader_config = pretrain_args.get_dataloader_config()
train_dataloader, eval_dataloader = make_train_and_eval_dataloader(
dataloader_config=transformer_dataloader_config,
micro_batch_size=pretrain_args.micro_batch_size,
data_files=pretrain_args.data_path,
max_length=pretrain_args.seq_length,
tokenizer=tokenizer,
accelerator=accelerator
)
pretrainer = PreTrainer(pretrain_args=pretrain_args,
train_dataloader=train_dataloader,
eval_dataloader=eval_dataloader,
)
pretrainer.train()
```
在完成上述环境配置以及配置文件准备后,即可通过如下命令启动微调,请确保其中的训练脚本和配置文件为本地实际路径。
```shell
accelerate launch --config_file accelerate_config/accelerate_megatron_config.yaml train_with_megatron_json_dataset.py --pretrain_config_file llama2_config/llama2-megatron-json-dataset.yaml
```
## 进阶使用
### 定义预训练参数
在我们定义PreTrainer之前,首先需要定义一个PreTrainingArguments类,它将包含PreTrainer用于训练和评估的所有超参数。用户可以通过配置文件或者直接传参初始化预训练参数。
#### 使用配置文件
预训练参数可以通过加载yaml文件自动生成,更多yaml样例可参考[样例链接](https://modelers.cn/models/AI-Research/accelerate_examples/tree/main/examples/llama2_config)。
```python
from openmind import PreTrainingArguments
# 路径需要替换为本地路径
pretrain_args = PreTrainingArguments.from_yaml(
"openmind-accelerate/examples/llama2_config/llama2-megatron.yaml"
)
```
#### 直接传参
预训练参数也可以通过传参的方式实例化。使用Megatron模型训练Megatron数据集的预训练器初始化流程如下。
参数链接请点击:[PreTrainingArguments说明](#pretrainingarguments说明)。
```python
from openmind import PreTrainingArguments
# 路径需要替换为本地路径
pretrain_args = PreTrainingArguments(
megatron_dataset_flag=True,
data_path="HaM/alpaca_en",
num_training_steps=1000,
micro_batch_size=4,
dp=1,
gradient_accumulation_steps=8,
seq_length=2048,
)
```
### 使用Megatron框架预训练模型
用户完成预训练参数配置后即可启动Megatron模型预训练。
- Accelerate对接Megatron的配置文件可参考[accelerate_config/accelerate_megatron_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_megatron_config.yaml)
- 使用Megatron框架训练Json数据运行示例可参考[train_with_megatron_json_dataset.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_megatron_json_dataset.py)。
- Json格式数据预训练配置文件示例可参考[llama2_config/llama2-megatron-json-dataset.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-megatron-json-dataset.yaml)。
用户只需要将准备好的`train_dataloader`(`eval_dataloader`非必选)传给PreTrainer,即可使用用户自定义的dataloader预训练模型。
```shell
accelerate launch --config_file accelerate_config/accelerate_megatron_config.yaml train_with_megatron_json_dataset.py --pretrain_config_file llama2_config/llama2-megatron-json-dataset.yaml
```
#### 自定义Megatron框架处理流程(可选)
##### 自定义处理函数
如下代码所示,PreTrainer接口在使用Megatron预训练时,支持用户根据实际场景按需自定义`datasets_provider`、`model_provider`、`get_batch`和`loss_function`中的任意函数,并将函数指针赋值到如下属性中。自定义函数的实现可参考官方样例[pretrain_gpt.py](https://github.com/NVIDIA/Megatron-LM/blob/main/pretrain_gpt.py)。
- `custom_megatron_datasets_provider_function`:用于提供Megatron的训练和验证数据集。
- `custom_get_batch_function`:用于生成批次数据。
- `custom_model_provider_function`:用于构建模型。
- `custom_loss_function`:返回损失函数。
```python
import openmind_accelerate
from openmind import PreTrainingArguments
from pretrain_gpt import (
train_valid_test_datasets_provider,
get_batch as megatron_gpt_get_batch,
model_provider as megatron_gpt_model_provider,
loss_func as megatron_gpt_loss_func,
)
# 路径需要替换为本地路径
pretrain_args = PreTrainingArguments.from_yaml(
"openmind-accelerate/examples/llama2_config/llama2-megatron-json-dataset.yaml"
)
train_valid_test_datasets_provider.is_distributed = True
pretrain_args.update_distributed_train_args(
extra_args={
"custom_megatron_datasets_provider_function": train_valid_test_datasets_provider,
"custom_get_batch_function": megatron_gpt_get_batch,
"custom_model_provider_function": megatron_gpt_model_provider,
"custom_loss_function": megatron_gpt_loss_func,
}
)
```
##### 自定义解析模型配置文件
用户可依据Accelerate解析模型配置的格式自定义模型配置文件解析函数。以下为PreTrainer内置的llama模型配置文件解析函数用户可以根据实际情况参考。
```python
import openmind_accelerate
from accelerate.utils import add_model_config_to_megatron_parser
@add_model_config_to_megatron_parser("llama")
def parse_llama_config(megatron_lm_plugin, model, batch_data):
model_type_name = "gpt"
num_layers = model.config.num_hidden_layers
pretraining_flag = True
hidden_size = model.config.hidden_size
num_attention_heads = model.config.num_attention_heads
orig_vocab_size = model.config.vocab_size
max_position_embeddings = getattr(model.config, "max_position_embeddings")
seq_length = getattr(model.config, "max_sequence_length", None)
if megatron_lm_plugin.seq_length is None:
if seq_length is not None:
megatron_lm_plugin.seq_length = seq_length
elif megatron_lm_plugin.decoder_seq_length is not None:
megatron_lm_plugin.seq_length = megatron_lm_plugin.decoder_seq_length
elif batch_data is not None:
megatron_lm_plugin.seq_length = batch_data["input_ids"].shape[1]
else:
megatron_lm_plugin.seq_length = max_position_embeddings
megatron_lm_plugin.megatron_lm_default_args["return_logits"] = megatron_lm_plugin.return_logits
megatron_lm_plugin.megatron_lm_default_args["tokenizer_type"] = "Llama2Tokenizer"
megatron_lm_plugin.megatron_lm_default_args["model_type_name"] = model_type_name
megatron_lm_plugin.megatron_lm_default_args["num_layers"] = num_layers
megatron_lm_plugin.megatron_lm_default_args["pretraining_flag"] = pretraining_flag
megatron_lm_plugin.megatron_lm_default_args["hidden_size"] = hidden_size
megatron_lm_plugin.megatron_lm_default_args["num_attention_heads"] = num_attention_heads
megatron_lm_plugin.megatron_lm_default_args["orig_vocab_size"] = orig_vocab_size
megatron_lm_plugin.megatron_lm_default_args["max_position_embeddings"] = max_position_embeddings
megatron_lm_plugin.megatron_lm_default_args["seq_length"] = megatron_lm_plugin.seq_length
megatron_lm_plugin.megatron_lm_default_args["model_return_dict"] = model.config.return_dict
```
### 使用其他框架预训练模型
PreTrainer是基于Accelerate实现的多框架分布式能力,所以PreTrainer除了支持Megatron框架,还支持DeepSpeed和FSDP分布式框架。如下以DeepSpeed分布式框架为例:
用户完成Json格式预训练参数配置后即可启动DeepSpeed模型预训练。
- Accelerate对接DeepSpeed的配置文件示例可参考[accelerate_config/accelerate_deepspeed_config.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/accelerate_config/accelerate_deepspeed_config.yaml)。
- 使用DeepSpeed框架训练Json数据运行示例可参考[train_with_deepspeed.py](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/train_with_deepspeed.py)。
- Json格式数据预训练配置文件示例可参考[llama2_config/llama2-deepspeed.yaml](https://modelers.cn/models/AI-Research/accelerate_examples/blob/main/examples/llama2_config/llama2-deepspeed.yaml)。
```yaml
num_training_steps: 1000
micro_batch_size: 1
dp: 8
gradient_accumulation_steps: 8
seq_length: 4096
megatron_dataset_flag: False
data_path: '/path/to/alpaca_en/alpaca_data_en_52k.json'
save_dir: './saves'
save_interval: 10000
eval_interval: 10000
openmind_model_path: '/path/to/llama2-7b-hf'
dtype: 'bf16'
dataloader_config:
return_tensors: 'pt'
padding: 'max_length'
pad_to_multiple_of: 4096
max_length: 4096
### seq_length、max_length以及padding的值均需要小于或等于模型权重配置文件config.json中"max_position_embeddings"字段的值
```
```shell
accelerate launch --config_file accelerate_config/accelerate_deepspeed_config.yaml train_with_deepspeed.py --pretrain_config_file llama2_config/llama2-deepspeed.yaml
```
## PreTrainingArguments说明
| **参数名** | **描述** | **类型** | **默认值** | 是否可选 |
|-----------------------------|-----------------------|--------|---------|---------|
| num_training_steps | 训练模型的总步数。 | int | - | 必选 |
| micro_batch_size | 每个模型实例的批处理大小。 | int | - | 必选 |
| dp | 数据并行度。 | int | - | 必选 |
| gradient_accumulation_steps | 在更新模型参数之前要累积的梯度步数。 | int | 1 | 可选 |
| seq_length | 要处理的最大序列长度。 | int | None | 可选 |
| megatron_dataset_flag | 是否使用Megatron类型数据集的标志。 | bool | None | 可选 |
| data_path | 训练数据集的路径。 | str | None | 可选 |
| save_dir | 要将检查点保存到的输出目录。 | str | None | 可选 |
| save_interval | 检查点保存的迭代间隔。 | int | None | 可选 |
| eval_interval | 验证集评估的迭代间隔。 | int | None | 可选 |
| openmind_model_path | 待训练的openMind模型的路径。 | str | None | 可选 |
| dtype | 运行模型的dtype模式。 | str | bf16 | 可选 |
| plugin_args | [Accelerate插件参数。](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMPlugin) | dict | None | 可选 |
| dataloader_config | [加载器配置参数。](https://huggingface.co/docs/accelerate/v0.28.0/en/package_reference/megatron_lm#accelerate.utils.MegatronLMDummyDataLoader) | dict | None | 可选 |
| report_to | Accelerate日志上报到何处。 | str | None | 可选 |
| project_name | 项目的名称。 | str | None | 可选 |
## PreTrainer说明
PreTrainer接口会根据Accelerate是否使用Megatron-LM分布式加速库(以环境变量`ACCELERATE_USE_MEGATRON_LM=="true"`为依据),来选择创建Megatron预训练器或其他预训练器。
### Megatron预训练器
| 序号 | 约束描述 |
| ---- |-----------------------------------------------------------------------|
| 1 | 需要预先安装Megatron依赖。 |
| 2 | 需要预先安装openmind_accelerate插件依赖。 |
| 3 | Megatron会自管理累积梯度所以Accelerate的`gradient_accumulation_steps`参数需要指定为 1。 |
| 4 | 初始化时需要提供`train_dataloader`或在PreTrainingArguments里提供`data_path`。 |
| 5 | 初始化时需要提供`model`或在PreTrainingArguments里提供`openmind_model_path`。 |
### 其他预训练器
| 序号 | 约束描述 |
| ---- |----------------------------------------------------------------|
| 1 | 初始化时需要提供`train_dataloader`。 |
| 2 | 初始化时需要提供`optimizer`。 |
| 3 | 初始化时需要提供`lr_scheduler`。 |
| 4 | 初始化时需要提供`model`或在PreTrainingArguments里提供`openmind_model_path`。 |
*感谢社区贡献的 llama2 模型以及 alpaca_en 数据集*

View File

@ -79,6 +79,75 @@ You are a helpful assistant.<|im_end|>
</tr>
</thead>
<tbody>
<!-- Qwen3 -->
<tr>
<td rowspan="11">Qwen3</td>
<td>Qwen3-32B-Chat</td>
<td>Models_Ecosystem/Qwen3-32B</td>
<td>Qwen/Qwen3-32B</td>
<td rowspan="11">qwen</td>
<td></td>
</tr>
<tr>
<td>Qwen3-14B-Chat</td>
<td>Models_Ecosystem/Qwen3-14B</td>
<td>Qwen/Qwen3-14B</td>
<td></td>
</tr>
<tr>
<td>Qwen3-14B</td>
<td>Models_Ecosystem/Qwen3-14B-Base</td>
<td>Qwen/Qwen3-14B-Base</td>
<td></td>
</tr>
<tr>
<td>Qwen3-8B-Chat</td>
<td>Models_Ecosystem/Qwen3-8B</td>
<td>Qwen/Qwen3-8B</td>
<td></td>
</tr>
<tr>
<td>Qwen3-8B</td>
<td>Models_Ecosystem/Qwen3-8B-Base</td>
<td>Qwen/Qwen3-8B-Base</td>
<td></td>
</tr>
<tr>
<td>Qwen3-4B-Chat</td>
<td>Models_Ecosystem/Qwen3-4B</td>
<td>Qwen/Qwen3-4B</td>
<td></td>
</tr>
<tr>
<td>Qwen3-4B</td>
<td>Models_Ecosystem/Qwen3-4B-Base</td>
<td>Qwen/Qwen3-4B-Base</td>
<td></td>
</tr>
<tr>
<td>Qwen3-1.7B-Chat</td>
<td>Models_Ecosystem/Qwen3-1.7B</td>
<td>Qwen/Qwen3-1.7B</td>
<td></td>
</tr>
<tr>
<td>Qwen3-1.7B</td>
<td>Models_Ecosystem/Qwen3-1.7B-Base</td>
<td>Qwen/Qwen3-1.7B-Base</td>
<td></td>
</tr>
<tr>
<td>Qwen3-0.6B-Chat</td>
<td>Models_Ecosystem/Qwen3-0.6B</td>
<td>Qwen/Qwen3-0.6B</td>
<td></td>
</tr>
<tr>
<td>Qwen3-0.6B</td>
<td>Models_Ecosystem/Qwen3-0.6B-Base</td>
<td>Qwen/Qwen3-0.6B-Base</td>
<td></td>
</tr>
<!-- Qwen2.5 -->
<tr>
<td rowspan="3">Qwen2.5</td>
@ -100,6 +169,15 @@ You are a helpful assistant.<|im_end|>
<td>Qwen/Qwen2.5-32B</td>
<td></td>
</tr>
<!-- Qwen2.5-VL -->
<tr>
<td>Qwen2.5-VL</td>
<td>Qwen2.5-VL-7B-Instruct</td>
<td>PyTorch-NPU/Qwen2.5-VL-7B-Instruct</td>
<td>Qwen/Qwen2.5-VL-7B-Instruct</td>
<td>qwen2_vl</td>
<td></td>
</tr>
<!-- Qwen2 -->
<tr>
<td rowspan="3">Qwen2</td>
@ -256,15 +334,6 @@ You are a helpful assistant.<|im_end|>
<td>deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B</td>
<td></td>
</tr>
<!-- Qwen2.5-VL -->
<tr>
<td>Qwen2.5-VL</td>
<td>Qwen2.5-VL-7B-Instruct</td>
<td>PyTorch-NPU/Qwen2.5-VL-7B-Instruct</td>
<td>Qwen/Qwen2.5-VL-7B-Instruct</td>
<td>qwen2_vl</td>
<td></td>
</tr>
</tbody>
</table>

View File

@ -4,8 +4,6 @@ openMind Library是一个深度学习开发套件通过简单易用的API支
## openMind Library特性
+ 为了应对大模型分布式训练的挑战openMind Library提供了预训练接口支持MindSpeed、Accelerate等加速库帮助开发者顺畅快速地训练大模型具体可参考[模型预训练](basic_tutorial/pretrainer.md)章节。
+ openMind Library基于[transformers库](https://github.com/huggingface/transformers)集成了PyTorch框架下主流第三方工具的功能提供了一键式的封装的微调命令行接口解决方案涵盖了从数据处理、权重加载到低参数训练、量化适配训练和跟踪的全流程功能更多细节可查看[模型训练](basic_tutorial/train/overview.md)。
+ openMind Library对Transformers和MindFormers的AutoClass、Pipeline、Trainer等接口进行封装并增强了其功能提供了对应的SDK。还提供了从魔乐社区自动下载和加载模型的能力同时扩展新增了昇腾NPU亲和的特性有效提升在昇腾NPU上进行模型训练推理的性能具体可参考[模型训练](basic_tutorial/train/overview.md)和[模型推理](basic_tutorial/pipeline.md)章节。

View File

@ -51,8 +51,6 @@ if TYPE_CHECKING:
from .archived.trainers import (
Trainer,
TrainingArguments,
PreTrainer,
PreTrainingArguments,
)
from .archived.pipelines import pipeline
from .omdatasets import OmDataset

View File

@ -18,16 +18,12 @@ from openmind.utils import _LazyModule
if TYPE_CHECKING:
from .trainer import Trainer
from .training_args import TrainingArguments
from .pretrainer import PreTrainer
from .pretraining_args import PreTrainingArguments
else:
import sys
_import_structure = {
"trainer": ["Trainer"],
"training_args": ["TrainingArguments"],
"pretrainer": ["PreTrainer"],
"pretraining_args": ["PreTrainingArguments"],
}
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)

View File

@ -1,452 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
import dataclasses
import importlib
import importlib.util
import os
import time
import warnings
from accelerate import Accelerator, init_empty_weights
try:
import torch
except ImportError as e:
raise ImportError("Please install torch package before using this PreTrainer.") from e
import torch.utils.data
from transformers import AutoConfig, AutoModelForCausalLM
from .pretrainer_utils import print_in_last_rank, print_in_main_process
from .pretraining_args import PreTrainingArguments
warnings.warn(
"The class 'PreTrainer' is deprecated and will be removed in version 1.1.0. ",
FutureWarning,
)
class _PreTrainerCommon:
def __init__(
self,
pretrain_args: PreTrainingArguments,
accelerator: Accelerator = None,
model: torch.nn.Module = None,
optimizer=None,
lr_scheduler=None,
train_dataloader: torch.utils.data.DataLoader = None,
eval_dataloader: torch.utils.data.DataLoader = None,
*args,
**kwargs,
):
self.model = model
self.pretrain_args = pretrain_args
self.train_dataloader = train_dataloader
self.optimizer = optimizer
self.lr_scheduler = lr_scheduler
self.accelerator = accelerator
self.eval_dataloader = eval_dataloader
self.completed_steps = 0
self._post_init()
def train(self):
self._pre_training()
batch_loss_sum = 0
start_time = time.time()
while self.completed_steps < self.pretrain_args.num_training_steps:
for batch in self.train_dataloader:
outputs = self._train_step(batch)
loss_ = outputs.loss.detach().float()
batch_loss_sum += loss_.item()
if self.accelerator.sync_gradients:
self.completed_steps += 1
else:
continue # for accelerator's gradient_accumulation
lr = self._get_lr()
batch_loss_avg = self._get_batch_loss_avg(batch_loss_sum=batch_loss_sum)
elapsed_time = (time.time() - start_time) * 1000 # ms
self._train_step_log(step=self.completed_steps, loss=batch_loss_avg, lr=lr, elapsed_time=elapsed_time)
batch_loss_sum = 0
if (
self.pretrain_args.save_interval
and self.completed_steps % self.pretrain_args.save_interval == 0
and self.pretrain_args.save_dir
):
self._save_state(save_dir=self.pretrain_args.save_dir)
if (
self.pretrain_args.eval_interval
and self.completed_steps % self.pretrain_args.eval_interval == 0
and self.eval_dataloader is not None
):
self._eval(eval_dataloader=self.eval_dataloader, completed_steps=self.completed_steps)
start_time = time.time()
if self.completed_steps >= self.pretrain_args.num_training_steps:
break
self.accelerator.end_training()
self.accelerator.wait_for_everyone()
self._post_training()
def _post_init(self):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _init_trackers(self):
experiment_config = {}
experiment_config.update(dataclasses.asdict(self.pretrain_args))
self.accelerator.init_trackers(self.pretrain_args.project_name, experiment_config)
def _get_gradient_accumulation_steps(self):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _get_batch_loss_avg(self, batch_loss_sum):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _get_lr(self):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _train_step(self, batch):
self.model.train()
with self.accelerator.accumulate(self.model):
outputs = self.model(**batch)
loss = outputs.loss
self.accelerator.backward(loss)
self.optimizer.step()
self.lr_scheduler.step()
self.optimizer.zero_grad()
return outputs
def _train_step_log(self, loss, lr, elapsed_time, step):
log_str = (
f"step: {step} | elapsed time per iteration (ms): {elapsed_time:.1f} | learning rate: {lr:.3E} | "
f"lm loss: {loss:.6E}"
)
print_in_last_rank(log_str)
# tracker
self.accelerator.log(
{
"train_loss": loss,
"learning_rate": lr,
},
step=step,
)
def _print_training_info(self):
print_in_main_process("***** Running training *****")
print_in_main_process(
f" Num examples = {self.pretrain_args.num_training_steps * self.pretrain_args.batch_size}"
)
print_in_main_process(f" Instantaneous batch size per device = {self.pretrain_args.micro_batch_size}")
print_in_main_process(
f" Total train batch size (w. parallel, distributed & accumulation) = {self.pretrain_args.batch_size}"
)
print_in_main_process(f" Gradient Accumulation steps = {self._get_gradient_accumulation_steps()}")
print_in_main_process(f" Total steps = {self.pretrain_args.num_training_steps}")
def _pre_training(self):
self._print_training_info()
print_in_main_process(f"[before the start of training step] datetime: {time.strftime('%Y-%m-%d %H:%M:%S')}")
self.completed_steps = 0
def _post_training(self):
print_in_main_process(f"[after training is done] datetime: {time.strftime('%Y-%m-%d %H:%M:%S')}")
self._save(save_dir=self.pretrain_args.save_dir)
def _get_eval_loss(self, loss):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _eval(self, eval_dataloader, completed_steps=None):
if completed_steps is not None:
self.completed_steps = completed_steps
losses = []
for _, batch in enumerate(eval_dataloader):
outputs = self._eval_step(batch)
loss = outputs.loss
losses.append(self._get_eval_loss(loss))
self._eval_log(losses=losses)
def _eval_step(self, batch):
self.model.eval()
with torch.no_grad():
outputs = self.model(**batch)
return outputs
def _handle_eval_losses(self, losses):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _eval_log(self, losses):
losses = self._handle_eval_losses(losses)
eval_loss = torch.mean(losses)
print_in_last_rank(f"validation at step: {self.completed_steps} | eval_loss: {eval_loss}")
self.accelerator.log(
{
"eval_loss": eval_loss,
},
step=self.completed_steps,
)
def _save_state(self, save_dir):
self.accelerator.save_state(save_dir)
def _save(self, save_dir):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _read_model(self):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _prepare(self):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
def _make_accelerator(self):
raise NotImplementedError("_PreTrainerCommon : Not implemented!")
class _PreTrainerMegatron(_PreTrainerCommon):
def _make_megatron_dataloader(self):
from accelerate.utils import MegatronLMDummyDataLoader
data_path = self.pretrain_args.data_path
megatron_dataloader_config = {
"data_path": data_path if isinstance(data_path, list) else [data_path],
"seq_length": self.pretrain_args.seq_length,
"micro_batch_size": self.pretrain_args.micro_batch_size,
"eval_interval": self.pretrain_args.eval_interval,
}
if self.pretrain_args.dataloader_config:
for key, value in self.pretrain_args.dataloader_config.items():
if key in megatron_dataloader_config.keys():
print_in_main_process(
f"PreTrainerMegatron dataloader overriding arguments for "
f"{key}:{megatron_dataloader_config[key]} with {key}:{value}"
)
megatron_dataloader_config[key] = value
megatron_dataloader = MegatronLMDummyDataLoader(**megatron_dataloader_config)
self.train_dataloader = megatron_dataloader
self.accelerator.state.megatron_lm_plugin.megatron_dataset_flag = True
def _get_megatron_lm_plugin(self):
from accelerate.utils import MegatronLMPlugin
plugin_args = {
"train_iters": self.pretrain_args.num_training_steps,
"seq_length": self.pretrain_args.seq_length,
"num_micro_batches": self.pretrain_args.gradient_accumulation_steps,
"megatron_dataset_flag": self.pretrain_args.megatron_dataset_flag,
"eval_interval": self.pretrain_args.eval_interval,
}
if self.pretrain_args.plugin_args:
for key, value in self.pretrain_args.plugin_args.items():
if key in plugin_args.keys():
msg = (
f"WARNING: PreTrainerMegatron plugin overriding arguments for "
f"{key}:{plugin_args[key]} with {key}:{value}"
)
print_in_main_process(msg)
plugin_args[key] = value
return MegatronLMPlugin(**plugin_args)
def _make_accelerator(self):
accelerate_kwargs = {
"log_with": self.pretrain_args.report_to,
"project_dir": self.pretrain_args.save_dir,
"mixed_precision": self.pretrain_args.get_mixed_precision(),
}
megatron_lm_plugin = self._get_megatron_lm_plugin()
accelerate_kwargs["megatron_lm_plugin"] = megatron_lm_plugin
self.accelerator = Accelerator(**accelerate_kwargs)
def _post_init(self):
if importlib.util.find_spec("megatron") is None or importlib.util.find_spec("megatron.data") is None:
raise EnvironmentError("You must use '--no-use-pep517' to pip install nvidia's megatron from source.")
if importlib.util.find_spec("openmind_accelerate") is None:
raise EnvironmentError("You must pip install openmind_accelerate.")
import openmind_accelerate # noqa:F401
if self.accelerator is None:
self._make_accelerator()
if self.accelerator.gradient_accumulation_steps != 1:
raise ValueError(
"When using Megatron, gradient accumulation is done in Megatron, "
"so gradient_accumulation_steps in Accelerator needs to be set to 1."
)
if self.train_dataloader is None:
if not self.pretrain_args.data_path:
raise ValueError("`PreTrainer` requires either a `train_dataloader` or `args.data_path` argument")
self._make_megatron_dataloader()
self.accelerator.state.megatron_lm_plugin.megatron_lm_default_args["train_iters"] = (
self.pretrain_args.num_training_steps
)
if self.model is None:
if not self.pretrain_args.openmind_model_path:
raise ValueError("`PreTrainer` requires either a `model` or `args.openmind_model_path` argument")
self._read_model()
self._prepare()
self._init_trackers()
def _pre_training(self):
from megatron import get_args
super()._pre_training()
args = get_args()
self.model.iteration = args.iteration
self.completed_steps = args.iteration
def _eval(self, eval_dataloader, completed_steps=None):
from megatron import get_args
if completed_steps is not None:
self.completed_steps = completed_steps
args = get_args()
losses = []
iteration = 0
for _, batch in enumerate(eval_dataloader):
outputs = self._eval_step(batch)
loss = outputs.loss
losses.append(self._get_eval_loss(loss))
iteration += 1
if iteration >= args.eval_iters:
break
self._eval_log(losses=losses)
def _get_gradient_accumulation_steps(self):
return self.accelerator.state.megatron_lm_plugin.num_micro_batches
def _get_batch_loss_avg(self, batch_loss_sum):
return batch_loss_sum
def _get_lr(self):
return self.lr_scheduler.get_lr()
def _get_eval_loss(self, loss):
return loss
def _handle_eval_losses(self, losses):
return torch.tensor(losses)
def _save(self, save_dir):
self.accelerator.save_state(save_dir)
def _read_model(self):
model_config = AutoConfig.from_pretrained(self.pretrain_args.openmind_model_path)
with init_empty_weights():
self.model = AutoModelForCausalLM.from_config(model_config)
self.model.config.use_cache = False
def _prepare(self):
from accelerate.utils import MegatronLMOptimizerWrapper, MegatronLMSchedulerWrapper
        # With Megatron, the same dummy dataloader is deliberately passed for both the train and
        # eval slots; Accelerate's Megatron-LM integration builds the real loaders inside Megatron.
        self.model, self.train_dataloader, self.eval_dataloader = self.accelerator.prepare(
            self.model, self.train_dataloader, self.train_dataloader
        )
self.optimizer = MegatronLMOptimizerWrapper(self.model.optimizer)
self.lr_scheduler = MegatronLMSchedulerWrapper(self.model.scheduler, self.model.optimizer)
class _PreTrainerOther(_PreTrainerCommon):
def _make_accelerator(self):
accelerate_kwargs = {
"log_with": self.pretrain_args.report_to,
"project_dir": self.pretrain_args.save_dir,
"mixed_precision": self.pretrain_args.get_mixed_precision(),
}
self.accelerator = Accelerator(**accelerate_kwargs)
def _post_init(self):
if self.accelerator is None:
self._make_accelerator()
if self.train_dataloader is None:
raise ValueError("When not using Megatron, `PreTrainer` requires `train_dataloader`")
if self.optimizer is None:
raise ValueError("When not using Megatron, `PreTrainer` requires `optimizer`")
if self.lr_scheduler is None:
raise ValueError("When not using Megatron, `PreTrainer` requires `lr_scheduler`")
if self.model is None:
if not self.pretrain_args.openmind_model_path:
raise ValueError("`PreTrainer` requires either a `model` or `args.openmind_model_path` argument")
self._read_model()
self._prepare()
self._init_trackers()
def _get_gradient_accumulation_steps(self):
return self.accelerator.gradient_accumulation_steps
def _get_batch_loss_avg(self, batch_loss_sum):
return batch_loss_sum / self._get_gradient_accumulation_steps()
def _get_lr(self):
return self.lr_scheduler.get_last_lr()[0]
def _get_eval_loss(self, loss):
return self.accelerator.gather_for_metrics(loss.repeat(self.pretrain_args.batch_size))
def _handle_eval_losses(self, losses):
return torch.cat(losses)
def _save(self, save_dir):
unwrapped_model = self.accelerator.unwrap_model(self.model)
unwrapped_model.save_pretrained(
save_dir, is_main_process=self.accelerator.is_main_process, save_function=self.accelerator.save
)
def _read_model(self):
self.model = AutoModelForCausalLM.from_pretrained(
self.pretrain_args.openmind_model_path,
torch_dtype=self.pretrain_args.get_torch_dtype(),
)
self.model.gradient_checkpointing_enable()
self.model.config.use_cache = False
def _prepare(self):
if self.eval_dataloader:
(
self.model,
self.train_dataloader,
self.eval_dataloader,
self.optimizer,
self.lr_scheduler,
) = self.accelerator.prepare(
self.model, self.train_dataloader, self.eval_dataloader, self.optimizer, self.lr_scheduler
)
else:
self.model, self.train_dataloader, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
self.model, self.train_dataloader, self.optimizer, self.lr_scheduler
)
class PreTrainer(_PreTrainerCommon):
def __new__(cls, *args, **kwargs):
if os.environ.get("ACCELERATE_USE_MEGATRON_LM", "false") == "true":
return _PreTrainerMegatron(*args, **kwargs)
return _PreTrainerOther(*args, **kwargs)
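
The dispatch above keys solely off the `ACCELERATE_USE_MEGATRON_LM` environment variable (set by `accelerate launch` when a Megatron-LM config is used), so callers always construct `PreTrainer` rather than the private classes. Below is a minimal sketch of the non-Megatron path, not a verbatim usage from the repository: the model path and `raw_text_dataset` are placeholders, and transformers' Auto classes are assumed for loading the placeholder model.

```python
import os

import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

from openmind import PreTrainer, PreTrainingArguments

os.environ["ACCELERATE_USE_MEGATRON_LM"] = "false"  # select the _PreTrainerOther branch

args = PreTrainingArguments(
    num_training_steps=100,
    micro_batch_size=4,
    dp=1,
    gradient_accumulation_steps=8,
    save_dir="./pretrain_output",  # placeholder output directory
    dtype="bf16",
)

model = AutoModelForCausalLM.from_pretrained("./my-causal-lm")  # placeholder model path
tokenizer = AutoTokenizer.from_pretrained("./my-causal-lm")


def collate(texts):
    # Turn raw text into the keyword arguments the model's forward pass expects.
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=1024)
    enc["labels"] = enc["input_ids"].clone()
    return enc


# raw_text_dataset is a user-defined dataset of strings.
train_dataloader = DataLoader(raw_text_dataset, batch_size=args.micro_batch_size, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=args.num_training_steps)

trainer = PreTrainer(
    pretrain_args=args,
    model=model,
    train_dataloader=train_dataloader,
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
)
trainer.train()
```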

View File

@ -1,39 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
import logging
import os
import torch
from openmind.utils.logging import get_logger
openmind_logger = get_logger(__name__)
openmind_logger.setLevel(logging.INFO)
def print_in_main_process(msg):
local_rank = int(os.environ.get("LOCAL_RANK", -1))
if local_rank in [0, -1]:
openmind_logger.info(msg)
def is_last_rank():
return torch.distributed.get_rank() == (torch.distributed.get_world_size() - 1)
def print_in_last_rank(msg):
if torch.distributed.is_initialized():
if is_last_rank():
openmind_logger.info(msg)
else:
openmind_logger.info(msg)
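
These helpers keep multi-process logs to a single line per message: `print_in_main_process` checks the `LOCAL_RANK` environment variable (set by launchers such as `torchrun`), while `print_in_last_rank` goes through `torch.distributed` when a process group is initialized. A minimal sketch of their behavior, assuming the module path used by the repository's tests and a two-process launch (`log_demo.py` is a placeholder file name):

```python
# Run with: torchrun --nproc_per_node=2 log_demo.py
import torch.distributed as dist

from openmind.archived.trainers.pretrainer_utils import print_in_last_rank, print_in_main_process

dist.init_process_group(backend="gloo")  # "gloo" keeps the sketch CPU-only

print_in_main_process("printed once, by LOCAL_RANK 0")    # other ranks stay silent
print_in_last_rank("printed once, by the highest rank")   # here, rank 1 of 2

dist.destroy_process_group()
```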

View File

@ -1,115 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
import dataclasses
from dataclasses import dataclass, field
import re
import warnings
import torch
import yaml
from .pretrainer_utils import print_in_main_process
warnings.warn(
"The class 'PreTrainingArguments' is deprecated and will be removed in version 1.1.0. ",
FutureWarning,
)
_dtype_map = {"bf16": torch.bfloat16, "fp16": torch.float16, "fp32": torch.float32}
@dataclass
class PreTrainingArguments:
    num_training_steps: int = field(metadata={"help": "Total number of steps to train the model."})
micro_batch_size: int = field(metadata={"help": "Batch size per model instance."})
dp: int = field(metadata={"help": "Degree of Parallelism."})
gradient_accumulation_steps: int = field(
default=1, metadata={"help": "The number of gradient steps to accumulate before updating the model parameters."}
)
seq_length: int = field(default=None, metadata={"help": "Maximum sequence length to process."})
    megatron_dataset_flag: bool = field(
        default=None, metadata={"help": "Whether the dataset is in Megatron format."}
    )
data_path: str = field(default=None, metadata={"help": "Path to the training dataset."})
save_dir: str = field(default=None, metadata={"help": "Output directory to save checkpoints to."})
save_interval: int = field(default=None, metadata={"help": "Number of iterations between checkpoint saves."})
eval_interval: int = field(
default=None, metadata={"help": "Interval between running evaluation on validation set."}
)
    openmind_model_path: str = field(default=None, metadata={"help": "The path of the openMind model to be trained."})
dtype: str = field(default="bf16", metadata={"help": "The dtype mode that the model is running on."})
plugin_args: dict = field(default=None, metadata={"help": "Parameters related to accelerate plugins."})
dataloader_config: dict = field(default=None, metadata={"help": "The parameters of dataloader."})
    report_to: str = field(default=None, metadata={"help": "The tracker integration that Accelerate reports logs to."})
    project_name: str = field(default="accelerate-megatron", metadata={"help": "The name of the project."})
@staticmethod
def from_yaml(config_path: str):
with open(config_path, "r") as file:
config_data = yaml.safe_load(file)
return PreTrainingArguments(**config_data)
def __post_init__(self):
self.batch_size = self.micro_batch_size * self.gradient_accumulation_steps * self.dp
if self.data_path is not None and self.megatron_dataset_flag is None:
raise ValueError(
"Since you filled in data_path in PreTrainArguments, you have to specify the "
"megatron_dataset_flag parameter at the same time."
)
self.dtype = self.dtype.lower()
if self.dtype not in _dtype_map:
raise ValueError(f"Unknown dtype:{self.dtype}. Supported dtypes:{','.join(_dtype_map.keys())}")
for f in dataclasses.fields(self):
value = getattr(self, f.name)
if value:
if f.type is str:
if re.match(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)$", value):
setattr(self, f.name, float(value))
print_in_main_process(
f"WARNING: PreTrainingArguments transferring the type of {f.name} from str to float!"
)
if f.type is dict:
self._scientific_str_to_float(value)
def get_mixed_precision(self):
if self.dtype == "fp32":
return "no"
return self.dtype
def get_torch_dtype(self):
return _dtype_map.get(self.dtype)
def get_distributed_train_args(self):
return self.plugin_args.copy()
def update_distributed_train_args(self, extra_args: dict):
self.plugin_args.update(extra_args)
def get_dataloader_config(self):
return self.dataloader_config.copy()
def _scientific_str_to_float(self, config_dict: dict):
for key, value in config_dict.items():
if isinstance(value, str):
if re.match(r"^[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)$", value):
config_dict[key] = float(value)
print_in_main_process(
f"WARNING: PreTrainingArguments transferring the type of {key} from str to float!"
)
if isinstance(value, dict):
self._scientific_str_to_float(value)
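
`from_yaml` feeds the parsed YAML mapping straight into the dataclass constructor, so the top-level keys must match the field names; `__post_init__` then derives `batch_size` and converts scientific-notation strings inside nested dicts such as `plugin_args`. A minimal sketch with illustrative values (the file name is arbitrary):

```python
from openmind import PreTrainingArguments

yaml_text = """\
num_training_steps: 1000
micro_batch_size: 4
dp: 1
gradient_accumulation_steps: 8
seq_length: 2048
dtype: bf16
save_dir: ./pretrain_output
plugin_args:
  lr: 1.0e-5          # parsed by YAML as a float
  min_lr: "1e-6"      # quoted string; __post_init__ converts it to 1e-06
"""

with open("pretrain_config.yaml", "w") as f:
    f.write(yaml_text)

args = PreTrainingArguments.from_yaml("pretrain_config.yaml")
print(args.batch_size)             # 32 = micro_batch_size * gradient_accumulation_steps * dp
print(args.plugin_args["min_lr"])  # 1e-06, converted from the quoted string
```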

View File

@ -1,238 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
import dataclasses
from unittest import TestCase, skip
from unittest.mock import MagicMock
import pytest
from tests.utils_for_test import require_torch
class TestPreTrainerCommon(TestCase):
@pytest.fixture(scope="function", autouse=True)
def global_setup(self):
from openmind import PreTrainer, PreTrainingArguments
self.pretrain_args = PreTrainingArguments(
num_training_steps=10,
micro_batch_size=4,
dp=1,
gradient_accumulation_steps=8,
seq_length=4096,
megatron_dataset_flag=True,
data_path="llama2-mt_text_document",
save_dir="llama-2-7b-hf_save",
save_interval=10000,
eval_interval=10000,
openmind_model_path="hf",
dtype="bf16",
plugin_args={
"tp_degree": 4,
"pp_degree": 1,
"num_micro_batches": 8,
"gradient_clipping": 1.0,
"use_distributed_optimizer": False,
"sequence_parallelism": False,
"other_megatron_args": {
"tokenizer_model": "tokenizer.model",
"tokenizer_type": "Llama2Tokenizer",
"finetune": False,
"recompute_granularity": "full",
"recompute_method": "block",
"recompute_num_layers": 32,
"optimizer": "adam",
"lr": 1e-5,
"min_lr": 1e-6,
"adam_beta2": 0.95,
"add_bias_linear": False,
"async_tensor_model_parallel_allreduce": True,
"attention_dropout": 0.0,
"attention_softmax_in_fp32": True,
"bias_gelu_fusion": False,
"ffn_hidden_size": 11008,
"hidden_dropout": 0.0,
"init_method_std": 0.01,
"initial_loss_scale": 65536.0,
"lr_decay_style": "cosine",
"lr_warmup_fraction": 0.01,
"masked_softmax_fusion": False,
"normalization": "RMSNorm",
"sequence_parallel": True,
"split": "100,0,0",
"swiglu": True,
"untie_embeddings_and_output_weights": True,
"use_flash_attn": True,
"weight_decay": 0.1,
"no_load_optim": True,
"no_load_rng": True,
"eval_iters": 10000,
"position_embedding_type": "rope",
},
},
dataloader_config={
"data_path": ["llama2-mt_text_document"],
"seq_length": 4096,
"micro_batch_size": 4,
"split": "100,0,0",
"eval_iters": 10000,
"tokenizer_model": "tokenizer.model",
"tokenizer_type": "Llama2Tokenizer",
},
)
self.pretrainer = PreTrainer
self.obj = MagicMock()
self.obj.pretrain_args = self.pretrain_args
self.obj.accelerate = MagicMock()
def test_init_trackers(self):
self.obj.accelerator.init_trackers = MagicMock()
self.pretrainer._init_trackers(self.obj)
self.obj.accelerator.init_trackers.assert_called_once_with(
self.obj.pretrain_args.project_name, dataclasses.asdict(self.obj.pretrain_args)
)
def test_get_gradient_accumulation_steps(self):
with self.assertRaises(NotImplementedError):
self.pretrainer._get_gradient_accumulation_steps(self.obj)
def test_get_batch_loss_avg(self):
batch_loss_sum = 100.0
with self.assertRaises(NotImplementedError):
self.pretrainer._get_batch_loss_avg(self.obj, batch_loss_sum)
def test_get_lr(self):
with self.assertRaises(NotImplementedError):
self.pretrainer._get_lr(self.obj)
def test_train(self):
self.obj.completed_steps = 0
self.obj.train_dataloader = [MagicMock()]
self.obj.eval_dataloader = MagicMock()
self.obj._pre_training = MagicMock()
self.obj._train_step = MagicMock()
self.obj._get_lr = MagicMock(return_value=0.001)
self.obj._get_batch_loss_avg = MagicMock(return_value=0.5)
self.obj._train_step_log = MagicMock()
self.obj._save_state = MagicMock()
self.obj._eval = MagicMock()
self.obj._post_training = MagicMock()
self.obj.accelerate.sync_gradients = True
self.pretrainer.train(self.obj)
self.assertTrue(self.obj._pre_training.called)
self.assertTrue(self.obj.accelerator.end_training.called)
self.assertTrue(self.obj.accelerator.wait_for_everyone.called)
self.assertTrue(self.obj._post_training.called)
def test_train_step(self):
self.obj.model = MagicMock()
self.obj.optimizer = MagicMock()
self.obj.lr_scheduler = MagicMock()
batch = {"input": "data"}
outputs = self.pretrainer._train_step(self.obj, batch)
self.obj.model.train.assert_called_once()
self.obj.accelerator.accumulate.assert_called_once_with(self.obj.model)
self.obj.model.assert_called_once_with(**batch)
self.obj.accelerator.backward.assert_called_once_with(outputs.loss)
self.obj.optimizer.step.assert_called_once()
self.obj.lr_scheduler.step.assert_called_once()
self.obj.optimizer.zero_grad.assert_called_once()
self.assertEqual(outputs, self.obj.model.return_value)
def test_train_step_log(self):
loss = 0.123
lr = 0.001
elapsed_time = 10.5
step = 100
self.pretrainer._train_step_log(self.obj, loss, lr, elapsed_time, step)
self.obj.accelerator.log.assert_called_with({"train_loss": loss, "learning_rate": lr}, step=step)
def test_pre_training(self):
self.obj._print_training_info = MagicMock()
self.pretrainer._pre_training(self.obj)
self.obj._print_training_info.assert_called_once()
self.assertEqual(self.obj.completed_steps, 0)
def test_post_training(self):
self.obj._save = MagicMock()
self.pretrainer._post_training(self.obj)
self.obj._save.assert_called_once_with(save_dir=self.obj.pretrain_args.save_dir)
def test_get_eval_loss(self):
loss = 1.0
with self.assertRaises(NotImplementedError):
self.pretrainer._get_eval_loss(self.obj, loss)
@skip
def test_eval(self):
eval_dataloader = [(1, "batch1"), (2, "batch2")]
completed_steps = 100
self.obj._eval_log = MagicMock()
self.pretrainer._eval(self.obj, eval_dataloader, completed_steps)
self.obj._eval_log.assert_called_once()
@require_torch
def test_eval_step(self):
import torch
batch = {"input_ids": torch.tensor([[1, 2, 3]]), "attention_mask": torch.tensor([[1, 1, 1]])}
self.obj.model = MagicMock()
outputs = self.pretrainer._eval_step(self.obj, batch)
self.assertTrue(self.obj.model.eval.called)
self.assertIsNotNone(outputs)
def test_handle_eval_losses(self):
losses = [0.1, 0.2]
with self.assertRaises(NotImplementedError):
self.pretrainer._handle_eval_losses(self.obj, losses)
@require_torch
def test_eval_log(self):
import torch
losses = torch.tensor([0.5, 0.3])
self.obj._handle_eval_losses = MagicMock(return_value=losses)
self.obj.accelerator.log = MagicMock()
self.pretrainer._eval_log(self.obj, losses)
self.assertTrue(self.obj._handle_eval_losses.called)
self.assertEqual(self.obj.accelerator.log.call_count, 1)
def test_save_state(self):
save_dir = "/path/to/save"
self.obj.accelerator.save_state = MagicMock()
self.pretrainer._save_state(self.obj, save_dir)
self.obj.accelerator.save_state.assert_called_once_with(save_dir)
def test_save(self):
save_dir = "/path/to/save"
with self.assertRaises(NotImplementedError):
self.pretrainer._save(self.obj, save_dir)
def test_read_model(self):
with self.assertRaises(NotImplementedError):
self.pretrainer._read_model(self.obj)
def test_prepare(self):
with self.assertRaises(NotImplementedError):
self.pretrainer._prepare(self.obj)
def test_make_accelerator(self):
with self.assertRaises(NotImplementedError):
self.pretrainer._make_accelerator(self.obj)

View File

@ -1,40 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
import os
import logging
from unittest.mock import patch
from openmind.utils.logging import get_logger
from tests.utils_for_test import require_torch
openmind_logger = get_logger(__name__)
openmind_logger.setLevel(logging.INFO)
def test_print_in_main_process_with_local_rank_0(caplog):
caplog.set_level(logging.INFO)
with patch.dict(os.environ, {"LOCAL_RANK": "0"}):
openmind_logger.info("Test message.")
log_msg = [record.message for record in caplog.records]
assert "Test message." in log_msg
@require_torch
def test_print_in_main_process_with_local_rank_1(capsys):
from openmind.archived.trainers.pretrainer_utils import print_in_main_process
with patch.dict(os.environ, {"LOCAL_RANK": "1"}):
print_in_main_process("Test message.")
captured = capsys.readouterr()
assert captured.out == ""

View File

@ -1,68 +0,0 @@
# Copyright (c) 2024 Huawei Technologies Co., Ltd.
#
# openMind is licensed under Mulan PSL v2.
# You can use this software according to the terms and conditions of the Mulan PSL v2.
# You may obtain a copy of Mulan PSL v2 at:
#
# http://license.coscl.org.cn/MulanPSL2
#
# THIS SOFTWARE IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OF ANY KIND,
# EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT,
# MERCHANTABILITY OR FIT FOR A PARTICULAR PURPOSE.
# See the Mulan PSL v2 for more details.
from unittest import TestCase
import pytest
from tests.utils_for_test import require_torch
@require_torch
class TestPreTrainingArguments(TestCase):
@pytest.fixture(scope="function", autouse=True)
def global_setup(self):
from openmind import PreTrainingArguments
self.pretrain_args = PreTrainingArguments(
num_training_steps=1000,
micro_batch_size=4,
dp=1,
gradient_accumulation_steps=8,
seq_length=2048,
megatron_dataset_flag=True,
data_path="DATA_PATH",
save_dir="SAVE_PATH",
save_interval=10000,
eval_interval=0,
openmind_model_path="BASE_MODEL",
plugin_args={"lr": 1.23e-4},
dataloader_config={"batch": 20},
)
def test_from_yaml(self):
config_path = "CONFIG_PATH"
try:
self.pretrain_args.from_yaml(config_path)
except Exception as exception:
self.assertIsInstance(exception, FileNotFoundError)
def test_get_torch_dtype(self):
import torch
self.assertEqual(self.pretrain_args.get_torch_dtype(), torch.bfloat16)
def test_get_distributed_train_args(self):
self.assertEqual(self.pretrain_args.get_distributed_train_args()["lr"], 1.23e-4)
def test_update_distributed_train_args(self):
self.pretrain_args.update_distributed_train_args({"tp_degree": 4})
self.assertEqual(self.pretrain_args.plugin_args["lr"], 1.23e-4)
self.assertEqual(self.pretrain_args.plugin_args["tp_degree"], 4)
def test_get_dataloader_config(self):
self.assertEqual(self.pretrain_args.get_dataloader_config()["batch"], 20)
def test_scientific_str_to_float(self):
self.pretrain_args._scientific_str_to_float(self.pretrain_args.plugin_args)
self.assertEqual(self.pretrain_args.plugin_args["lr"], 0.000123)