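# Model Evaluation

## Overview

openMind Library provides a set of evaluation metrics that can be loaded quickly and run locally. The following metrics are currently supported: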

- Accuracy (`Accuracy`)
- Confusion matrix (`ConfusionMatrix`)
- Exact match (`ExactMatch`)
- F1 score (`F1`)
- GLUE benchmark (`Glue`)
- Mean absolute error (`MAE`)
- Mean squared error (`MSE`)

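## Features

- **Fast loading**: the metric classes you need can be loaded quickly.
- **Local computation**: all computation runs locally, which keeps evaluation fast and responsive.
- **Simple interface**: a small set of concise methods simplifies integration.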

## Computation Modes

For concrete usage of each metric, see the per-metric examples in the final section of this document.

### One-shot computation

This mode calls the `compute()` method directly: pass in the inputs and the result is returned.

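### Incremental computation

This mode builds up the result iteratively, for example in a `for` loop. Call `add()` as many times as needed to feed in inputs, then call `evaluate()` to compute the metric over everything that was passed in.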
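To make the difference between the two modes concrete, here is a minimal pure-Python sketch. `SketchAccuracy` is a hypothetical stand-in written for illustration, not the openMind implementation; in particular, resetting the accumulated state inside `evaluate()` and keeping `compute()` independent of that state are assumptions of this sketch.

```python
class SketchAccuracy:
    """Hypothetical stand-in that mimics the add()/evaluate()/compute() pattern."""

    def __init__(self):
        self._correct = 0
        self._total = 0

    def add(self, preds, labels):
        # Accumulate counts for one batch; nothing is scored yet.
        for pred, label in zip(preds, labels):
            self._correct += int(pred == label)
            self._total += 1

    def evaluate(self):
        # Score everything accumulated so far, then reset (an assumption of this sketch).
        if self._total == 0:
            raise ValueError("no inputs were added")
        result = {"accuracy": self._correct / self._total}
        self._correct = self._total = 0
        return result

    def compute(self, preds, labels):
        # One-shot: score exactly this input, independent of any add() state.
        fresh = SketchAccuracy()
        fresh.add(preds, labels)
        return fresh.evaluate()


metric = SketchAccuracy()

# One-shot computation on a single input.
one_shot = metric.compute(preds=[0, 1, 2], labels=[0, 1, 1])

# Incremental computation over three batches.
for pred, label in zip([[0, 1], [0, 1], [0, 0]], [[1, 0], [0, 1], [0, 1]]):
    metric.add(preds=pred, labels=label)
incremental = metric.evaluate()

print(one_shot, incremental)
```

The incremental path is useful when predictions arrive batch by batch, for example inside an evaluation loop, because only running counts need to be kept between batches.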

## Using Metrics with Trainer

First, build the model, dataset, and `tokenizer`, then preprocess the dataset:

```python
from openmind import OmDataset, AutoModelForSequenceClassification, AutoTokenizer

dataset = OmDataset.load_dataset("AI_Connect/glue", "cola")
tokenizer = AutoTokenizer.from_pretrained("PyTorch-NPU/bert_base_cased")
model = AutoModelForSequenceClassification.from_pretrained("PyTorch-NPU/bert_base_cased")

def tokenize_function(examples):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["validation"].shuffle(seed=42).select(range(1000))
```

Then construct the `Trainer` and pass in a `compute_metrics` function that uses a `metrics` class:

```python
from openmind import TrainingArguments, Trainer, metrics
import numpy as np

# In transformers 4.51.3, the evaluation_strategy parameter has been renamed to eval_strategy; see https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/training_args.py#L239
training_args = TrainingArguments(output_dir="test_trainer", eval_strategy="epoch")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    accuracy = metrics.Accuracy()
    return accuracy.compute(preds=preds, labels=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
```
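The `compute_metrics` function above receives raw logits rather than class ids, which is why it applies `np.argmax` before scoring. The following pure-Python sketch shows that conversion step on made-up values; `argmax_rows` is a hypothetical helper written for illustration, and the logits and labels are invented data, not output from the model above.

```python
def argmax_rows(logits):
    """Return the index of the largest value in each row of a 2-D list."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Invented logits for three examples and two classes.
logits = [[2.1, -0.3], [0.2, 1.7], [-1.0, -0.5]]
labels = [0, 1, 0]

# Convert per-class scores into predicted class ids, then score them.
preds = argmax_rows(logits)
accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)

print(preds, accuracy)
```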

## Using the Evaluation Metrics

### Accuracy (`Accuracy`)

```python
from openmind import metrics

# One-shot computation
accuracy = metrics.Accuracy()
print(accuracy.compute(preds=[0, 1, 2], labels=[0, 1, 1]))

'''
{'accuracy': 0.6666666666666666}
'''

# Incremental computation
for pred, label in zip([[0, 1], [0, 1], [0, 0]], [[1, 0], [0, 1], [0, 1]]):
    accuracy.add(preds=pred, labels=label)
print(accuracy.evaluate())

'''
{'accuracy': 0.5}
'''
```

### Confusion matrix (`ConfusionMatrix`)

```python
from openmind import metrics

# One-shot computation
cm = metrics.ConfusionMatrix()
print(cm.compute(preds=[0, 1, 2, 0, 1, 2], labels=[0, 1, 1, 2, 1, 0]))

'''
{'confusion_matrix': array([[1, 0, 1],
                           [0, 2, 0],
                           [1, 1, 0]])}
'''

# Incremental computation
cm.add(preds=[0, 1, 2], labels=[0, 1, 1])
cm.add(preds=[0, 1, 2], labels=[2, 1, 0])
print(cm.evaluate())

'''
{'confusion_matrix': array([[1, 0, 1],
                           [0, 2, 0],
                           [1, 1, 0]])}
'''
```
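Confusion matrices can be laid out with rows indexed either by prediction or by label, and the source does not state which convention `ConfusionMatrix` uses. The sketch below counts the same inputs both ways in plain Python; `confusion_counts` is a hypothetical helper, not part of openMind. With these inputs, the matrix printed above matches the layout whose rows are indexed by prediction, which is an observation about this example only.

```python
def confusion_counts(row_ids, col_ids, num_classes):
    """Count how often each (row class, column class) pair occurs."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for row, col in zip(row_ids, col_ids):
        matrix[row][col] += 1
    return matrix


preds = [0, 1, 2, 0, 1, 2]
labels = [0, 1, 1, 2, 1, 0]

# Layout A: rows are predictions, columns are labels.
rows_are_preds = confusion_counts(preds, labels, num_classes=3)
# Layout B: rows are labels, columns are predictions.
rows_are_labels = confusion_counts(labels, preds, num_classes=3)

print(rows_are_preds)
print(rows_are_labels)
```

The two layouts are transposes of each other, so it is worth checking the convention on a small asymmetric example like this one before reading off per-class errors.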

### Exact match (`ExactMatch`)

For more details on the `ExactMatch` interface, see [Metrics API - ExactMatch](../api_reference/apis/metrics_api.md#exactmatch).

```python
from openmind import metrics

# One-shot computation
exact_match = metrics.ExactMatch()
preds = ["the cat", "theater", "Green", "npu"]
labels = ["the dog", "theater", "green", "npu"]
print(exact_match.compute(preds=preds, labels=labels))

'''
{'exact_match': 0.5}
'''

# Incremental computation
exact_match.add(preds=["NPU", "arm"], labels=["npu", "arm"])
exact_match.add(preds=["windows", "linux"], labels=["windows", "linux"])
print(exact_match.evaluate())

'''
{'exact_match': 0.75}
'''
```
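The scores above are consistent with plain, case-sensitive string equality: `"Green"` and `"green"` do not count as a match. The following sketch reproduces both results under that assumption; `exact_match_rate` is a hypothetical helper, and any normalization options the real `ExactMatch` class may offer are not modeled here.

```python
def exact_match_rate(preds, labels):
    """Fraction of predictions that equal their label exactly (case-sensitive)."""
    matches = sum(pred == label for pred, label in zip(preds, labels))
    return matches / len(labels)


# Same inputs as the one-shot example above.
one_shot_rate = exact_match_rate(
    ["the cat", "theater", "Green", "npu"],
    ["the dog", "theater", "green", "npu"],
)

# Same inputs as the two incremental batches above, concatenated.
incremental_rate = exact_match_rate(
    ["NPU", "arm", "windows", "linux"],
    ["npu", "arm", "windows", "linux"],
)

print(one_shot_rate, incremental_rate)
```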

### F1 score (`F1`)

```python
from openmind import metrics

# One-shot computation
f1 = metrics.F1()
print(f1.compute(preds=[0, 1, 0, 1, 0], labels=[0, 0, 1, 1, 0]))

'''
{'f1': 0.5}
'''

# Incremental computation
f1.add(preds=[0, 1], labels=[0, 0])
f1.add(preds=[0, 1, 0], labels=[1, 1, 0])
print(f1.evaluate())

'''
{'f1': 0.5}
'''
```
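To see where the value 0.5 comes from, the sketch below derives binary F1 by hand from true positives, false positives, and false negatives, using $F_1 = 2PR/(P+R)$. `binary_f1` is a hypothetical helper that treats class `1` as the positive class, which is an assumption; averaging options of the real `F1` class are not modeled.

```python
def binary_f1(preds, labels, positive=1):
    """Binary F1 computed from TP, FP, and FN counts."""
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and l == positive for p, l in pairs)
    fp = sum(p == positive and l != positive for p, l in pairs)
    fn = sum(p != positive and l == positive for p, l in pairs)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Same inputs as the one-shot example above: TP=1, FP=1, FN=1.
f1_value = binary_f1(preds=[0, 1, 0, 1, 0], labels=[0, 0, 1, 1, 0])
print(f1_value)
```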

### GLUE benchmark (`Glue`)

For more details on the `Glue` interface, see [Metrics API - Glue](../api_reference/apis/metrics_api.md#glue).

```python
from openmind import metrics

# One-shot computation
glue = metrics.Glue("sst2")
print(glue.compute(preds=[0, 1], labels=[0, 1]))

'''
{'accuracy': 1.0}
'''

# Incremental computation
glue.add(preds=[0, 1], labels=[0, 0])
glue.add(preds=[0, 1], labels=[1, 1])
print(glue.evaluate())

'''
{'accuracy': 0.5}
'''
```

### Mean absolute error (`MAE`)

```python
from openmind import metrics

# One-shot computation
mae = metrics.MAE()
print(mae.compute(preds=[0, 1], labels=[0, 1]))

'''
{'mae': 0.0}
'''

# Incremental computation
mae.add(preds=[0, 1], labels=[0, 0])
mae.add(preds=[0, 1], labels=[1, 1])
print(mae.evaluate())

'''
{'mae': 0.5}
'''
```

### Mean squared error (`MSE`)

```python
from openmind import metrics

# One-shot computation
mse = metrics.MSE()
print(mse.compute(preds=[0, 1], labels=[0, 1]))

'''
{'mse': 0.0}
'''

# Incremental computation
mse.add(preds=[0, 1], labels=[0, 0])
mse.add(preds=[0, 1], labels=[1, 1])
print(mse.evaluate())

'''
{'mse': 0.5}
'''
```
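In the examples above, MAE and MSE both evaluate to 0.5 because every error is either 0 or 1, and squaring leaves those values unchanged. The sketch below computes both metrics from their textbook definitions and adds one invented input where they differ. The two helper functions are hypothetical and written for illustration; they are not the openMind implementations.

```python
def mean_absolute_error(preds, labels):
    """Average of |pred - label| over all pairs."""
    return sum(abs(p - l) for p, l in zip(preds, labels)) / len(labels)


def mean_squared_error(preds, labels):
    """Average of (pred - label)^2 over all pairs."""
    return sum((p - l) ** 2 for p, l in zip(preds, labels)) / len(labels)


# The two incremental batches above, concatenated: errors are 0, 1, 1, 0.
preds, labels = [0, 1, 0, 1], [0, 0, 1, 1]
mae_value = mean_absolute_error(preds, labels)
mse_value = mean_squared_error(preds, labels)

# Invented input with one error of size 2: squaring penalizes it more heavily.
mae_large = mean_absolute_error([3, 0], [1, 0])
mse_large = mean_squared_error([3, 0], [1, 0])

print(mae_value, mse_value, mae_large, mse_large)
```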