# Model Inference

Model inference is the process of using a trained machine learning or deep learning model to process new input data and produce a prediction or decision. Inference typically involves the following steps:

1. Input processing: format or normalize the new input data (such as images, text, or audio) so that it matches the input format the model expects.
2. Forward pass: the input data is propagated forward through the network or model structure, which involves operations such as weighted sums over parameters and activation functions.
3. Output generation: the model produces its output from the result of the forward pass; the output can be a classification label, a continuous value, a probability distribution, and so on.
4. Post-processing: in some cases, the raw model output needs further processing or conversion to make it more intuitive or to fit the application's needs.

**With the openMind Library `pipeline`, an AI model can be invoked end to end in a single call; users only need to write a small amount of code to run inference, which greatly improves development efficiency.**

The openMind Library `pipeline` method supports both the PyTorch and MindSpore frameworks. It also supports tasks in multiple domains, such as text generation, text classification, and image recognition.

This chapter describes how to use `pipeline` to load a model and run inference, covering the following topics:

- [openMind Library Environment Setup](#openmind-library-environment-setup)
- [Basic Usage of pipeline](#basic-usage-of-pipeline)
- [pipeline Usage Examples](#pipeline-usage-examples)
- [pipeline Parameters](#pipeline-parameters)
- [Inference Tasks Currently Supported by pipeline and Their Defaults](#inference-tasks-currently-supported-by-pipeline-and-their-defaults)

## openMind Library Environment Setup

For detailed steps, see the [openMind Library installation guide](../install.md).

## Basic Usage of pipeline

`pipeline` currently supports two frameworks, PyTorch and MindSpore, selected with the `framework` parameter when the `pipeline` is created: `pt` for PyTorch and `ms` for MindSpore. In addition, the PyTorch framework supports two backends, `transformers` and `diffusers`, while MindSpore supports three backends, `mindformers`, `mindone`, and `mindnlp`; the backend is passed via the `backend` parameter.

In openMind Library, every inference task under each framework has a corresponding pipeline class. For example, under PyTorch the text-to-audio task is implemented by `TextToAudioPipeline`. To simplify usage, openMind Library provides a generic `pipeline` method that loads the appropriate task-specific pipeline for you.

### Supported Frameworks

`pipeline` currently supports the following two frameworks:

- PyTorch: use `pt` as the value of the `framework` parameter.
- MindSpore: use `ms` as the value of the `framework` parameter.

### Supported Backends

Different frameworks support different backends:

- The PyTorch framework supports the following two backends:
  - `transformers`
  - `diffusers`
- The MindSpore framework supports the following three backends:
  - `mindformers`
  - `mindnlp`
  - `mindone`

All of these are specified via the `backend` parameter.

### pipeline Usage Examples

By configuring `task`, `model`, `framework`, and `backend`, you can load a model for the corresponding framework and task.

1. Text generation with the `transformers` backend under PyTorch:

```python
from openmind import pipeline

pipe = pipeline(
    task="text-generation",
    model="Baichuan/Baichuan2_7b_chat_pt",
    framework="pt",
    backend="transformers",
    trust_remote_code=True,
    device="npu:0",
)
output = pipe("Give three tips for staying healthy.")
print(output)
'''
Output:
1. Eat a balanced diet: Ensure that your diet includes a mix of fruits, vegetables, whole grains, lean proteins, and healthy fats. This will provide your body with the essential nutrients it needs to function properly.
2. Stay hydrated: Drink plenty of water throughout the day to help flush out toxins and maintain proper body functions. Avoid drinking too much sugar-sweetened or caffeinated beverages as these can lead to dehydration.
3. Be active: Aim to get at least 150 minutes of moderate-intensity aerobic activity or 75 minutes of vigorous-intensity aerobic activity per week, along with muscle-strengthening activities on two or more days per week. This will help you maintain a healthy weight, improve cardiovascular health, and reduce the risk of chronic diseases.
'''
```

2. Text-to-image generation with the `diffusers` backend under PyTorch:

```python
from openmind import pipeline
from PIL import Image

pipe = pipeline(
    task="text-to-image",
    model="PyTorch-NPU/stable-diffusion-xl-base-1_0",
    framework="pt",
    backend="diffusers",
    device="npu:0",
)
image = pipe("masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")
image.save("diffusers.png")
```

![diffusers](./figures/pipeline_diffusers_dragon.png)

3. Text generation with the `mindformers` backend under MindSpore:

```python
from openmind import pipeline
import mindspore as ms

ms.set_context(mode=0, device_id=0, jit_config={"jit_level": "O0", "infer_boost": "on"})

pipe = pipeline(task="text-generation",
                model='MindSpore-Lab/Qwen2_5-7B-Instruct',
                framework='ms',
                model_kwargs={"use_past": True},
                trust_remote_code=True)
outputs = pipe("Give me some advice on how to stay healthy.")
print(outputs)
```
4. Text generation with the `mindnlp` backend under MindSpore:

```python
from openmind import pipeline

generator = pipeline(
    task="text-generation",
    model="AI-Research/Qwen2-7B",
    framework="ms",
    backend="mindnlp",
)
outputs = generator("Give me some advice on how to stay healthy.")
print(outputs)
```

5. Text-to-image generation with the `mindone` backend under MindSpore:

```python
from openmind import pipeline
import mindspore

pipe = pipeline(
    "text-to-image",
    model="AI-Research/stable-diffusion-3-medium-diffusers",
    backend="mindone",
    framework="ms",
    mindspore_dtype=mindspore.float16,
)
image = pipe("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k")[0][0]
image.save("mindone.png")
```

### SiliconDiff Inference Acceleration

#### About SiliconDiff

SiliconDiff is a diffusion-model acceleration library developed by SiliconFlow. Built on leading diffusion-model acceleration techniques, it aims to deliver high-performance text-to-image solutions by combining top domestic hardware resources such as Ascend chips.

#### How SiliconDiff Accelerates

SiliconDiff is built on a `torch compile` + `torch npu` approach. Through a custom compiler backend it supports operator fusion, elimination of redundant computation, and JIT optimization. It also supports dynamic shapes, with no extra compilation overhead when the shape changes.

![Silicondiff](./figures/silicondiff.png)

#### Using SiliconDiff

For diffusers-side tasks, the `use_silicondiff` parameter can be used to accelerate inference and improve performance.

```python
from openmind import pipeline
import torch

generator = pipeline(task="text-to-image",
                     model="PyTorch-NPU/stable-diffusion-xl-base-1_0",
                     device="npu:0",
                     torch_dtype=torch.float16,
                     use_silicondiff=True,
                     )
image = generator(prompt="masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")
```

The corresponding versions of silicondiff_npu and PyTorch are listed below. Currently, silicondiff_npu only supports PyTorch 2.1.0 and Python 3.10:

| PyTorch version | silicondiff_npu version |
|-----------------|-------------------------|
| 2.1.0           | 2.1.0.post3             |

#### Supported Models and Performance Gains

The currently supported models, and the performance gains measured on an Atlas 900 A2 PODc server with SiliconDiff enabled, are as follows.

| Model | Without SiliconDiff | With SiliconDiff | Performance gain |
|-------|---------------------|------------------|------------------|
| [Stable Diffusion v1.5](https://modelers.cn/models/PyTorch-NPU/stable_diffusion_v1_5) | 4.20s | 3.80s | 10.62% |
| [SD-XL 1.0-base](https://modelers.cn/models/PyTorch-NPU/stable-diffusion-xl-base-1.0) | 9.11s | 8.35s | 9.13% |
| [Stable Diffusion v2.1](https://modelers.cn/models/PyTorch-NPU/stable-diffusion-2-1) | 3.90s | 3.46s | 12.64% |

#### No Precision Loss

The images generated with and without SiliconDiff are compared below.

**[Stable Diffusion v1.5](https://modelers.cn/models/PyTorch-NPU/stable_diffusion_v1_5)**

| Diffusers + Torch-NPU | Diffusers + SiliconDiff-NPU |
|-----------------------|-----------------------------|
| ![](./figures/sd1_5_1.png) | ![](./figures/sd1_5_2.png) |

**[SD-XL 1.0-base](https://modelers.cn/models/PyTorch-NPU/stable-diffusion-xl-base-1.0)**

| Diffusers + Torch-NPU | Diffusers + SiliconDiff-NPU |
|-----------------------|-----------------------------|
| ![](./figures/sdxl_1.png) | ![](./figures/sdxl_2.png) |

### Default Loading

Note that when some parameters are not specified, `pipeline` falls back to defaults based on the parameters that are provided:

- If only `task` is specified, `pipeline` loads the default `framework`, `backend`, and model.
- If only `task` and `framework` are specified, `pipeline` loads the default `backend` and model.
- If only `task`, `framework`, and `backend` are specified, `pipeline` loads the default model.

The default `framework`, `backend`, and model for each inference task are listed in [Inference Tasks Currently Supported by pipeline and Their Defaults](#inference-tasks-currently-supported-by-pipeline-and-their-defaults).

When using `pipeline`, you can browse the [openMind model hub](https://modelers.cn/) to find a model that fits your needs. If no suitable model is available, you can [train a model](./train/overview.md) yourself. We encourage you to upload trained or fine-tuned models to the openMind model hub to share them with more developers; see [model sharing](push_to_hub.md) for how to upload.

## pipeline Parameters

### Key Parameters

#### framework

`pipeline` supports the PyTorch (`pt`) and MindSpore (`ms`) frameworks, specified via the `framework` parameter. The following is a `pipeline` instance running on the MindSpore framework:

```python
from openmind import pipeline

text_pipeline_ms = pipeline(task="text-generation", model="MindSpore-Lab/baichuan2_7b_chat", framework='ms')
output = text_pipeline_ms("hello!")
```
#### backend

The PyTorch framework supports two backends, `transformers` and `diffusers`; the MindSpore framework supports three backends, `mindformers`, `mindone`, and `mindnlp`. The backend is specified via the `backend` parameter.

- The following is a `pipeline` instance running on the MindSpore framework with the backend set to `mindnlp`:

```python
from openmind import pipeline

text_pipeline_ms = pipeline(task="text-generation", model="AI-Research/Qwen2-7B", framework='ms', backend="mindnlp")
output = text_pipeline_ms("Give me some advice on how to stay healthy.")
```

#### device

The `device` parameter specifies the processor on which the inference task runs; `CPU` and `NPU` processors are currently supported. If `device` is not specified, `pipeline` selects a processor automatically. Either processor type works with both the PyTorch and MindSpore frameworks. Examples:

- Run on CPU:

```python
generator = pipeline(task="text-generation", device="cpu")
```

- Run on NPU:

```python
# PyTorch
generator = pipeline(task="text-generation", device="npu:0")
```

#### model and tokenizer

Besides a model path or repository ID, the `model` parameter also accepts an instantiated model object. When `model` is an instantiated model object, `tokenizer` must also be given a matching instantiated tokenizer:

```python
from openmind import pipeline
from openmind import AutoModelForSequenceClassification, AutoTokenizer

# Instantiate the model and tokenizer, then run inference
model = AutoModelForSequenceClassification.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
tokenizer = AutoTokenizer.from_pretrained("PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english")
text_classifier = pipeline(task="text-classification", model=model, tokenizer=tokenizer, framework="pt")

outputs = text_classifier("This is great !")
# [{'label': 'POSITIVE', 'score': 0.9998694658279419}]
```

#### use_silicondiff

For `diffusers`-side tasks, the `use_silicondiff` parameter can be used to accelerate inference and improve performance.

```python
from openmind import pipeline
import torch

generator = pipeline(task="text-to-image",
                     model="PyTorch-NPU/stable-diffusion-xl-base-1_0",
                     device="npu:0",
                     torch_dtype=torch.float16,
                     use_silicondiff=True,
                     )
image = generator("masterpiece, best quality, Cute dragon creature, pokemon style, night, moonlight, dim lighting")
```

### Task-Specific Parameters

`pipeline` also accepts task-specific parameters that can be configured individually to fit your workload. For example, for a text-generation task you can control the length of the generated text and the beam width through the `max_new_tokens` and `num_beams` parameters, which influences the generated result:

```python
from openmind import pipeline

# Task-specific parameters
params = {
    "max_new_tokens": 50,  # limit the generated text to 50 tokens
    "num_beams": 5,        # use beam search with a beam width of 5
}

text_generator = pipeline("text-generation", device="npu:0", trust_remote_code=True, **params)
generated_text = text_generator("Once upon a time,")
print(generated_text)
'''
Output:
Once upon a time, there was a small village nestled between two mountains. The villagers lived simple lives, working the land and taking care of their families. One day, a stranger arrived in the village. He was a wise old man with a long white beard and a ro
'''
```
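The same pattern extends to other generation controls. The snippet below is a minimal sketch, assuming that the usual `transformers` sampling arguments (`do_sample`, `temperature`, `top_p`) are forwarded to the underlying text-generation backend in the same way as `max_new_tokens` and `num_beams` above:

```python
from openmind import pipeline

# Sketch only: sampling-related generation parameters, assumed to be forwarded
# to the underlying transformers text-generation backend as in the example above.
params = {
    "max_new_tokens": 50,  # cap the length of the generated continuation
    "do_sample": True,     # sample instead of beam search
    "temperature": 0.8,    # soften the sampling distribution
    "top_p": 0.9,          # nucleus sampling threshold
}

text_generator = pipeline("text-generation", device="npu:0", trust_remote_code=True, **params)
print(text_generator("Once upon a time,"))
```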
### Full Parameter List

For the full list of `pipeline` parameters, see the [Pipeline API reference](../api_reference/apis/pipeline_api.md).

## Inference Tasks Currently Supported by pipeline and Their Defaults

### Default Framework, Default backend per Framework, and Supported backends for Each Task

| Task | Default framework | Default PyTorch backend | Default MindSpore backend | Supported backends |
|------|-------------------|-------------------------|---------------------------|--------------------|
| text-classification | PyTorch | transformers | | transformers |
| text-to-image | PyTorch | diffusers | mindone | diffusers, mindone |
| visual-question-answering | PyTorch | transformers | | transformers |
| zero-shot-object-detection | PyTorch | transformers | | transformers |
| zero-shot-classification | PyTorch | transformers | | transformers |
| depth-estimation | PyTorch | transformers | | transformers |
| image-to-image | PyTorch | transformers | | transformers |
| mask-generation | PyTorch | transformers | | transformers |
| text-generation | PyTorch | transformers | mindformers | transformers, mindformers, mindnlp |
| zero-shot-image-classification | PyTorch | transformers | | transformers |
| feature-extraction | PyTorch | transformers | | transformers |
| image-classification | PyTorch | transformers | | transformers |
| image-to-text | PyTorch | transformers | | transformers |
| text2text-generation | PyTorch | transformers | | transformers |
| token-classification | PyTorch | transformers | | transformers |
| fill-mask | PyTorch | transformers | | transformers |
| question-answering | PyTorch | transformers | | transformers |
| summarization | PyTorch | transformers | | transformers |
| table-question-answering | PyTorch | transformers | | transformers |
| translation | PyTorch | transformers | | transformers |

### Inference Tasks Currently Supported by PyTorch and Their Default Models

#### transformers backend

| Task | Default model |
|------|---------------|
| text-classification | "PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english" |
| text-generation | "Baichuan/Baichuan2_7b_chat_pt" |
| question-answering | "PyTorch-NPU/roberta_base_squad2" |
| table-question-answering | "PyTorch-NPU/tapas_base_finetuned_wtq" |
| fill-mask | "PyTorch-NPU/bert_base_uncased" |
| summarization | "PyTorch-NPU/bart_large_cnn" |
| zero-shot-image-classification | "PyTorch-NPU/siglip_so400m_patch14_384" |
| feature-extraction | "PyTorch-NPU/xlnet_base_cased" |
| depth-estimation | "PyTorch-NPU/dpt_large" |
| image-classification | "PyTorch-NPU/beit_base_patch16_224" |
| image-to-image | "PyTorch-NPU/swin2SR_classical_sr_x2_64" |
| image-to-text | "PyTorch-NPU/blip-image-captioning-large" |
| mask-generation | "PyTorch-NPU/sam_vit_base" |
| text2text-generation | "PyTorch-NPU/flan_t5_base" |
| zero-shot-classification | "PyTorch-NPU/deberta_v3_large_zeroshot_v2.0" |
| zero-shot-object-detection | "PyTorch-NPU/owlvit_base_patch32" |
| token-classification | "PyTorch-NPU/camembert_ner" |
| translation | "PyTorch-NPU/t5_base" |
| visual-question-answering | "PyTorch-NPU/blip_vqa_base" |

#### diffusers backend

| Task | Default model |
|------|---------------|
| text-to-image | "PyTorch-NPU/stable-diffusion-xl-base-1_0" |

### Inference Tasks Currently Supported by MindSpore

#### mindformers backend

| Task | Default model |
|------|---------------|
| text-generation | "MindSpore-Lab/Qwen2_5-7B-Instruct" |

**Note:** When using mindformers for MindSpore model inference, the device must have at least 64 GB of memory.

#### mindnlp backend

| Task | Default model |
|------|---------------|
| text-generation | "AI-Research/Qwen2-7B" |
#### mindone backend

| Task | Default model |
|------|---------------|
| text-to-image | "AI-Research/stable-diffusion-3-medium-diffusers" |
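Putting the defaults together: as described in [Default Loading](#default-loading), when only `task` is given, `pipeline` resolves the framework, backend, and model from the tables above. The following is a minimal sketch, assuming text-classification resolves to PyTorch + transformers + "PyTorch-NPU/distilbert_base_uncased_finetuned_sst_2_english" as listed, with the processor selected automatically:

```python
from openmind import pipeline

# Sketch only: no framework, backend, model, or device is specified, so the
# defaults from the tables above are expected to be used.
classifier = pipeline(task="text-classification")
print(classifier("This is great !"))
# The result should resemble the text-classification example shown earlier,
# e.g. [{'label': 'POSITIVE', 'score': ...}]
```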