# Model Inference

Model inference is the process of using a trained machine learning or deep learning model to process new input data and produce a prediction or decision. Inference typically involves the following steps:

1. Input processing: format or normalize the new input data (such as images, text, or audio) so that it matches the input format the model expects.
2. Forward pass: the input data is propagated forward through the network or model structure, which involves operations such as weighted sums over parameters and activation functions.
3. Output generation: the model produces its output from the result of the forward pass; the output can be a classification label, a continuous value, a probability distribution, and so on.
4. Post-processing: in some cases, the raw model output needs further processing or conversion to make it more intuitive or to fit the application's needs.

**With the openMind Library `pipeline`, an AI model can be invoked end to end in a single call; users only need to write a small amount of code to run inference, which greatly improves development efficiency.**

The openMind Library `pipeline` method supports both the PyTorch and MindSpore frameworks. It also supports tasks in multiple domains, such as text generation, text classification, and image recognition.

This chapter describes how to use `pipeline` to load a model and run inference, covering the following topics:

- [openMind Library Environment Setup](#openmind-library-environment-setup)
- [Basic Usage of pipeline](#basic-usage-of-pipeline)
- [pipeline Usage Examples](#pipeline-usage-examples)
- [pipeline Parameters](#pipeline-parameters)
- [Inference Tasks Currently Supported by pipeline and Their Defaults](#inference-tasks-currently-supported-by-pipeline-and-their-defaults)

## openMind Library Environment Setup

For detailed steps, see the [openMind Library installation guide](../install.md).

## Basic Usage of pipeline

`pipeline` currently supports two frameworks, PyTorch and MindSpore, selected with the `framework` parameter when the `pipeline` is created: `pt` for PyTorch and `ms` for MindSpore. In addition, the PyTorch framework supports two backends, `transformers` and `diffusers`, while MindSpore supports three backends, `mindformers`, `mindone`, and `mindnlp`; the backend is passed via the `backend` parameter.

In openMind Library, every inference task under each framework has a corresponding pipeline class. For example, under PyTorch the text-to-audio task is implemented by `TextToAudioPipeline`. To simplify usage, openMind Library provides a generic `pipeline` method that loads the appropriate task-specific pipeline for you.

### Supported Frameworks

`pipeline` currently supports the following two frameworks:

- PyTorch: use `pt` as the value of the `framework` parameter.
- MindSpore: use `ms` as the value of the `framework` parameter.

### Supported Backends

Different frameworks support different backends:

- The PyTorch framework supports the following two backends:
  - `transformers`
  - `diffusers`
- The MindSpore framework supports the following three backends:
  - `mindformers`
  - `mindnlp`
  - `mindone`

All of these are specified via the `backend` parameter.

### pipeline Usage Examples

By configuring `task`, `model`, `framework`, and `backend`, you can load a model for the corresponding framework and task.

1. Text generation with the `transformers` backend under PyTorch:

```python
from openmind import pipeline

pipe = pipeline(
    task="text-generation",
    model="Baichuan/Baichuan2_7b_chat_pt",
    framework="pt",
    backend="transformers",
    trust_remote_code=True,
    device="npu:0",
)
output = pipe("Give three tips for staying healthy.")
print(output)
'''
Output:
1. Eat a balanced diet: Ensure that your diet includes a mix of fruits, vegetables, whole grains, lean proteins, and healthy fats. This will provide your body with the essential nutrients it needs to function properly.
2. Stay hydrated: Drink plenty of water throughout the day to help flush out toxins and maintain proper body functions. Avoid drinking too much sugar-sweetened or caffeinated beverages as these can lead to dehydration.
3. Be active: Aim to get at least 150 minutes of moderate-intensity aerobic activity or 75 minutes of vigorous-intensity aerobic activity per week, along with muscle-strengthening activities on two or more days per week. This will help you maintain a healthy weight, improve cardiovascular health, and reduce the risk of chronic diseases.
'''
```

2. Text-to-image generation with the `diffusers` backend under PyTorch:

```python
from openmind import pipeline
from PIL import Image

pipe = pipeline(
    task="text-to-image",
    model="PyTorch-NPU/stable-diffusion-xl-base-1_0",
    framework="pt",
    backend="diffusers",
    device="npu:0",
)
image = pipe("masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")
image.save("diffusers.png")
```

![diffusers](./figures/pipeline_diffusers_dragon.png)

3. Text generation with the `mindformers` backend under MindSpore:

```python
from openmind import pipeline
import mindspore as ms

ms.set_context(mode=0, device_id=0, jit_config={"jit_level": "O0", "infer_boost": "on"})

pipe = pipeline(task="text-generation",
                model='MindSpore-Lab/Qwen2_5-7B-Instruct',
                framework='ms',
                model_kwargs={"use_past": True},
                trust_remote_code=True)
outputs = pipe("Give me some advice on how to stay healthy.")
print(outputs)
```
4. Text generation with the `mindnlp` backend under MindSpore:

```python
from openmind import pipeline

generator = pipeline(
    task="text-generation",
    model="AI-Research/Qwen2-7B",
    framework="ms",
    backend="mindnlp",
)
outputs = generator("Give me some advice on how to stay healthy.")
print(outputs)
```

5. Text-to-image generation with the `mindone` backend under MindSpore:

```python
from openmind import pipeline
import mindspore

pipe = pipeline(
    "text-to-image",
    model="AI-Research/stable-diffusion-3-medium-diffusers",
    backend="mindone",
    framework="ms",
    mindspore_dtype=mindspore.float16,
)
image = pipe("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k")[0][0]
image.save("mindone.png")
```

### SiliconDiff Inference Acceleration

#### About SiliconDiff

SiliconDiff is a diffusion-model acceleration library developed by SiliconFlow. Built on leading diffusion-model acceleration techniques, it aims to deliver high-performance text-to-image solutions by combining top domestic hardware resources such as Ascend chips.

#### How SiliconDiff Accelerates

SiliconDiff is built on a `torch compile` + `torch npu` approach. Through a custom compiler backend it supports operator fusion, elimination of redundant computation, and JIT optimization. It also supports dynamic shapes, with no extra compilation overhead when the shape changes.

![Silicondiff](./figures/silicondiff.png)

#### Using SiliconDiff

For diffusers-side tasks, the `use_silicondiff` parameter can be used to accelerate inference and improve performance.

```python
from openmind import pipeline
import torch

generator = pipeline(task="text-to-image",
                     model="PyTorch-NPU/stable-diffusion-xl-base-1_0",
                     device="npu:0",
                     torch_dtype=torch.float16,
                     use_silicondiff=True,
                     )
image = generator(prompt="masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")
```

The corresponding versions of silicondiff_npu and PyTorch are listed below. Currently, silicondiff_npu only supports PyTorch 2.1.0 and Python 3.10:

| PyTorch version | silicondiff_npu version |
|-----------------|-------------------------|
| 2.1.0           | 2.1.0.post3             |

#### Supported Models and Performance Gains

The currently supported models, and the performance gains measured on an Atlas 900 A2 PODc server with SiliconDiff enabled, are as follows.

| Model | Without SiliconDiff | With SiliconDiff | Performance gain |
|-------|---------------------|------------------|------------------|
| [Stable Diffusion v1.5](https://modelers.cn/models/PyTorch-NPU/stable_diffusion_v1_5) | 4.20s | 3.80s | 10.62% |
| [SD-XL 1.0-base](https://modelers.cn/models/PyTorch-NPU/stable-diffusion-xl-base-1.0) | 9.11s | 8.35s | 9.13% |
| [Stable Diffusion v2.1](https://modelers.cn/models/PyTorch-NPU/stable-diffusion-2-1) | 3.90s | 3.46s | 12.64% |

#### No Precision Loss

The images generated with and without SiliconDiff are compared below.

**[Stable Diffusion v1.5](https://modelers.cn/models/PyTorch-NPU/stable_diffusion_v1_5)**

| Diffusers + Torch-NPU | Diffusers + SiliconDiff-NPU |
|-----------------------|-----------------------------|
| ![](./figures/sd1_5_1.png) | ![](./figures/sd1_5_2.png) |

**[SD-XL 1.0-base](https://modelers.cn/models/PyTorch-NPU/stable-diffusion-xl-base-1.0)**

| Diffusers + Torch-NPU | Diffusers + SiliconDiff-NPU |
|-----------------------|-----------------------------|
| ![](./figures/sdxl_1.png) | ![](./figures/sdxl_2.png) |

### Default Loading

Note that when some parameters are not specified, `pipeline` falls back to defaults based on the parameters that are provided:

- If only `task` is specified, `pipeline` loads the default `framework`, `backend`, and model.
- If only `task` and `framework` are specified, `pipeline` loads the default `backend` and model.
- If only `task`, `framework`, and `backend` are specified, `pipeline` loads the default model.

The default `framework`, `backend`, and model for each inference task are listed in [Inference Tasks Currently Supported by pipeline and Their Defaults](#inference-tasks-currently-supported-by-pipeline-and-their-defaults).

When using `pipeline`, you can browse the [openMind model hub](https://modelers.cn/) to find a model that fits your needs. If no suitable model is available, you can [train a model](./train/overview.md) yourself. We encourage you to upload trained or fine-tuned models to the openMind model hub to share them with more developers; see [model sharing](push_to_hub.md) for how to upload.

## pipeline Parameters

### Key Parameters

#### framework

`pipeline` supports the PyTorch (`pt`) and MindSpore (`ms`) frameworks, specified via the `framework` parameter. The following is a `pipeline` instance running on the MindSpore framework:

```python
from openmind import pipeline

text_pipeline_ms = pipeline(task="text-generation", model="MindSpore-Lab/baichuan2_7b_chat", framework='ms')
output = text_pipeline_ms("hello!")
```
#### backend

The PyTorch framework supports two backends, `transformers` and `diffusers`; the MindSpore framework supports three backends, `mindformers`, `mindone`, and `mindnlp`. The backend is specified via the `backend` parameter.

- The following is a `pipeline` instance running on the MindSpore framework with the backend set to `mindnlp`:

```python
from openmind import pipeline

text_pipeline_ms = pipeline(task="text-generation", model="AI-Research/Qwen2-7B", framework='ms', backend="mindnlp")
output = text_pipeline_ms("Give me some advice on how to stay healthy.")
```

#### device

The `device` parameter specifies the processor on which the inference task runs; `CPU` and `NPU` processors are currently supported. If `device` is not specified, `pipeline` selects a processor automatically. Either processor type works with both the PyTorch and MindSpore frameworks. Examples:

- Run on CPU:

```python
generator = pipeline(task="text-generation", device="cpu")
```

- Run on NPU:

```python
# PyTorch
generator = pipeline(task="text-generation", device="npu:0")
```

#### model and tokenizer

Besides a model path or repository ID, the `model` parameter also accepts an instantiated model object. When `model` is an instantiated model object, `tokenizer` must also be given a matching instantiated tokenizer:

```python
from openmind import pipeline
from openmind import AutoModelForSequenceClassification, AutoTokenizer

# Instantiate the model and tokenizer, then run inference
model = AutoModelForSequenceClassification.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
tokenizer = AutoTokenizer.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
text_classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer, framework="pt")

outputs = text_classifier("This is great !")
# [{'label': 'POSITIVE', 'score': 0.9998694658279419}]
```

#### use_silicondiff

For `diffusers`-side tasks, the `use_silicondiff` parameter can be used to accelerate inference and improve performance.

```python
from openmind import pipeline
import torch

generator = pipeline(task="text-to-image",
                     model="PyTorch-NPU/stable-diffusion-xl-base-1_0",
                     device="npu:0",
                     torch_dtype=torch.float16,
                     use_silicondiff=True,
                     )
image = generator("masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")
```

### Task-Specific Parameters

`pipeline` also accepts task-specific parameters that can be configured individually to fit your workload. For example, for a text-generation task you can control the length of the generated text and the beam width through the `max_new_tokens` and `num_beams` parameters, which influences the generated result:

```python
from openmind import pipeline

# Task-specific parameters
params = {
    "max_new_tokens": 50,  # limit the generated text to 50 tokens
    "num_beams": 5,        # use beam search with a beam width of 5
}

text_generator = pipeline("text-generation", device="npu:0", trust_remote_code=True, **params)
generated_text = text_generator("Once upon a time,")
print(generated_text)
'''
Output:
Once upon a time, there was a small village nestled between two mountains. The villagers lived simple lives, working the land and taking care of their families. One day, a stranger arrived in the village. He was a wise old man with a long white beard and a ro
'''
```
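The same pattern extends to other generation controls. The snippet below is a minimal sketch, assuming that the usual `transformers` sampling arguments (`do_sample`, `temperature`, `top_p`) are forwarded to the underlying text-generation backend in the same way as `max_new_tokens` and `num_beams` above:

```python
from openmind import pipeline

# Sketch only: sampling-related generation parameters, assumed to be forwarded
# to the underlying transformers text-generation backend as in the example above.
params = {
    "max_new_tokens": 50,  # cap the length of the generated continuation
    "do_sample": True,     # sample instead of beam search
    "temperature": 0.8,    # soften the sampling distribution
    "top_p": 0.9,          # nucleus sampling threshold
}

text_generator = pipeline("text-generation", device="npu:0", trust_remote_code=True, **params)
print(text_generator("Once upon a time,"))
```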
### Full Parameter List

For the full list of `pipeline` parameters, see the [Pipeline API reference](../api_reference/apis/pipeline_api.md).

## Inference Tasks Currently Supported by pipeline and Their Defaults

### Default Framework, Default backend per Framework, and Supported backends for Each Task

| Task | Default framework | Default PyTorch backend | Default MindSpore backend | Supported backends |
|------|-------------------|-------------------------|---------------------------|--------------------|
| text-classification | PyTorch | transformers | | transformers |
| text-to-image | PyTorch | diffusers | mindone | diffusers, mindone |
| visual-question-answering | PyTorch | transformers | | transformers |
| zero-shot-object-detection | PyTorch | transformers | | transformers |
| zero-shot-classification | PyTorch | transformers | | transformers |
| depth-estimation | PyTorch | transformers | | transformers |
| image-to-image | PyTorch | transformers | | transformers |
| mask-generation | PyTorch | transformers | | transformers |
| text-generation | PyTorch | transformers | mindformers | transformers, mindformers, mindnlp |
| zero-shot-image-classification | PyTorch | transformers | | transformers |
| feature-extraction | PyTorch | transformers | | transformers |
| image-classification | PyTorch | transformers | | transformers |
| image-to-text | PyTorch | transformers | | transformers |
| text2text-generation | PyTorch | transformers | | transformers |
| token-classification | PyTorch | transformers | | transformers |
| fill-mask | PyTorch | transformers | | transformers |
| question-answering | PyTorch | transformers | | transformers |
| summarization | PyTorch | transformers | | transformers |
| table-question-answering | PyTorch | transformers | | transformers |
| translation | PyTorch | transformers | | transformers |

### Inference Tasks Currently Supported by PyTorch and Their Default Models

#### transformers backend

| Task | Default model |
|------|---------------|
| text-classification | "PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english" |
| text-generation | "Baichuan/Baichuan2_7b_chat_pt" |
| question-answering | "PyTorch-NPU/roberta_base_squad2" |
| table-question-answering | "PyTorch-NPU/tapas_base_finetuned_wtq" |
| fill-mask | "PyTorch-NPU/bert_base_uncased" |
| summarization | "PyTorch-NPU/bart_large_cnn" |
| zero-shot-image-classification | "PyTorch-NPU/siglip_so400m_patch14_384" |
| feature-extraction | "PyTorch-NPU/xlnet_base_cased" |
| depth-estimation | "PyTorch-NPU/dpt_large" |
| image-classification | "PyTorch-NPU/beit_base_patch16_224" |
| image-to-image | "PyTorch-NPU/swin2SR_classical_sr_x2_64" |
| image-to-text | "PyTorch-NPU/blip-image-captioning-large" |
| mask-generation | "PyTorch-NPU/sam_vit_base" |
| text2text-generation | "PyTorch-NPU/flan_t5_base" |
| zero-shot-classification | "PyTorch-NPU/deberta_v3_large_zeroshot_v2.0" |
| zero-shot-object-detection | "PyTorch-NPU/owlvit_base_patch32" |
| token-classification | "PyTorch-NPU/camembert_ner" |
| translation | "PyTorch-NPU/t5_base" |
| visual-question-answering | "PyTorch-NPU/blip_vqa_base" |

#### diffusers backend

| Task | Default model |
|------|---------------|
| text-to-image | "PyTorch-NPU/stable-diffusion-xl-base-1_0" |

### Inference Tasks Currently Supported by MindSpore

#### mindformers backend

| Task | Default model |
|------|---------------|
| text-generation | "MindSpore-Lab/Qwen2_5-7B-Instruct" |

**Note:** When using mindformers for MindSpore model inference, the device must have at least 64 GB of memory.

#### mindnlp backend

| Task | Default model |
|------|---------------|
| text-generation | "AI-Research/Qwen2-7B" |
#### mindone backend

| Task | Default model |
|------|---------------|
| text-to-image | "AI-Research/stable-diffusion-3-medium-diffusers" |
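Putting the defaults together: as described in [Default Loading](#default-loading), when only `task` is given, `pipeline` resolves the framework, backend, and model from the tables above. The following is a minimal sketch, assuming text-classification resolves to PyTorch + transformers + "PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english" as listed, with the processor selected automatically:

```python
from openmind import pipeline

# Sketch only: no framework, backend, model, or device is specified, so the
# defaults from the tables above are expected to be used.
classifier = pipeline(task="text-classification")
print(classifier("This is great !"))
# The result should resemble the text-classification example shown earlier,
# e.g. [{'label': 'POSITIVE', 'score': ...}]
```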