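Model Evaluation

Overview

openMind Library provides a set of evaluation metrics that can be loaded and computed locally. The following metrics are currently supported:

Accuracy (Accuracy)
Confusion matrix (ConfusionMatrix)
Exact match (ExactMatch)
F1 score (F1)
GLUE benchmark (Glue)
Mean absolute error (MAE)
Mean squared error (MSE)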

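Features

Fast loading: the required metric classes can be loaded quickly.
Local computation: all computation runs locally, which keeps evaluation fast and responsive.
Simple interface: a small set of methods keeps integration straightforward.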

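Computation Methods

Each metric supports the two calling styles below. For concrete usage of each metric, see the per-metric examples later in this document.

One-shot computation

Call compute() directly with the full set of inputs; the result is returned from that single call.

Incremental computation

Inputs can be accumulated iteratively, for example in a for loop. Call add() once per batch, then call evaluate() to compute the metric over everything that was added.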
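
The two calling styles can be pictured with a small stand-in class. This is a minimal sketch in plain Python, not the openMind implementation: the class name SketchAccuracy is hypothetical, and clearing the buffer inside evaluate() is an assumption made for the sketch. It only illustrates that adding batches and then evaluating gives the same score as one compute() call over the concatenated inputs.

```python
# Hypothetical stand-in for illustration only; not the openMind implementation.
class SketchAccuracy:
    """Toy accuracy metric exposing compute(), add(), and evaluate()."""

    def __init__(self):
        self._preds = []
        self._labels = []

    @staticmethod
    def _score(preds, labels):
        if len(preds) != len(labels):
            raise ValueError("preds and labels must have the same length")
        if not preds:
            raise ValueError("no inputs to score")
        correct = sum(p == l for p, l in zip(preds, labels))
        return {"accuracy": correct / len(preds)}

    def compute(self, preds, labels):
        # One-shot: score exactly the inputs given, ignoring buffered state.
        return self._score(list(preds), list(labels))

    def add(self, preds, labels):
        # Incremental: buffer one batch for a later evaluate() call.
        self._preds.extend(preds)
        self._labels.extend(labels)

    def evaluate(self):
        # Score everything buffered so far, then clear the buffer
        # (clearing is an assumption of this sketch).
        result = self._score(self._preds, self._labels)
        self._preds, self._labels = [], []
        return result


metric = SketchAccuracy()

# One-shot: all six examples in a single call.
one_shot = metric.compute(preds=[0, 1, 0, 1, 0, 0], labels=[1, 0, 0, 1, 0, 1])

# Incremental: the same six examples, two at a time.
for pred, label in zip([[0, 1], [0, 1], [0, 0]], [[1, 0], [0, 1], [0, 1]]):
    metric.add(preds=pred, labels=label)
incremental = metric.evaluate()

print(one_shot, incremental)
```

Incremental computation is the natural fit when predictions arrive batch by batch and holding them all in one list is inconvenient.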

Using Metrics with Trainer

First, build the model, dataset, and tokenizer, then preprocess the dataset:

from openmind import OmDataset, AutoModelForSequenceClassification, AutoTokenizer

dataset = OmDataset.load_dataset("AI_Connect/glue", "cola")
tokenizer = AutoTokenizer.from_pretrained("PyTorch-NPU/bert_base_cased")
model = AutoModelForSequenceClassification.from_pretrained("PyTorch-NPU/bert_base_cased")

def tokenize_function(examples):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["validation"].shuffle(seed=42).select(range(1000))

Then construct the Trainer and pass in a compute_metrics function that uses a metrics class:

from openmind import TrainingArguments, Trainer, metrics
import numpy as np

# In transformers 4.51.3, the evaluation_strategy parameter has been renamed to eval_strategy; see https://github.com/huggingface/transformers/blob/v4.51.3/src/transformers/training_args.py#L239
training_args = TrainingArguments(output_dir="test_trainer", eval_strategy="epoch")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    accuracy = metrics.Accuracy()
    return accuracy.compute(preds=preds, labels=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)
trainer.train()
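
To make the compute_metrics step concrete, the sketch below reproduces its logic on a toy batch in plain Python, without openMind or NumPy. The helper names (argmax_last_axis, sketch_compute_metrics) and the toy logits and labels are made up for illustration; the real function above relies on np.argmax and metrics.Accuracy.

```python
def argmax_last_axis(rows):
    """Index of the largest value in each row, like np.argmax(x, axis=-1) for 2-D input."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


def sketch_compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair, as in the Trainer example above.
    logits, labels = eval_pred
    # Each row of logits holds one score per class; the predicted class is
    # the position of the highest score.
    preds = argmax_last_axis(logits)
    correct = sum(int(p == l) for p, l in zip(preds, labels))
    return {"accuracy": correct / len(labels)}


# Toy batch: four examples, two classes (values are made up).
toy_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
toy_labels = [0, 1, 0, 0]

toy_preds = argmax_last_axis(toy_logits)
result = sketch_compute_metrics((toy_logits, toy_labels))
print(toy_preds, result)
```

The function returns a dict, which is the shape Trainer expects from compute_metrics so that each entry can be logged by name.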

Using Each Metric

Accuracy (Accuracy)

from openmind import metrics

# One-shot computation
accuracy = metrics.Accuracy()
print(accuracy.compute(preds=[0, 1, 2], labels=[0, 1, 1]))

'''
{'accuracy': 0.6666666666666666}
'''

# Incremental computation
for pred, label in zip([[0, 1], [0, 1], [0, 0]], [[1, 0], [0, 1], [0, 1]]):
    accuracy.add(preds=pred, labels=label)
print(accuracy.evaluate())

'''
{'accuracy': 0.5}
'''

Confusion matrix (ConfusionMatrix)

from openmind import metrics

# One-shot computation
cm = metrics.ConfusionMatrix()
print(cm.compute(preds=[0, 1, 2, 0, 1, 2], labels=[0, 1, 1, 2, 1, 0]))

'''
{'confusion_matrix': array([[1, 0, 1],
                           [0, 2, 0],
                           [1, 1, 0]])}
'''

# Incremental computation
cm.add(preds=[0, 1, 2], labels=[0, 1, 1])
cm.add(preds=[0, 1, 2], labels=[2, 1, 0])
print(cm.evaluate())

'''
{'confusion_matrix': array([[1, 0, 1],
                           [0, 2, 0],
                           [1, 1, 0]])}
'''
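
The printed matrix is consistent with rows indexed by predicted class and columns indexed by true label. That reading is inferred from the example output, not from the library source, so treat it as an assumption; scikit-learn's confusion_matrix, for comparison, puts true labels on the rows. The plain-Python sketch below (the function name is hypothetical) reproduces the matrix above under that assumption.

```python
def sketch_confusion_matrix(preds, labels):
    """Count (prediction, label) pairs; row = predicted class, column = true label (assumed)."""
    n_classes = max(max(preds), max(labels)) + 1
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for p, l in zip(preds, labels):
        matrix[p][l] += 1
    return matrix


cm_rows = sketch_confusion_matrix(
    preds=[0, 1, 2, 0, 1, 2],
    labels=[0, 1, 1, 2, 1, 0],
)
for row in cm_rows:
    print(row)

# The opposite orientation is simply the transpose.
transposed = [list(col) for col in zip(*cm_rows)]
```

Check the orientation against a small hand-computed case before reading per-class error counts off the matrix.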

Exact match (ExactMatch)

For more details on the ExactMatch interface, see Evaluation Metrics API: ExactMatch.

from openmind import metrics

# One-shot computation
exact_match = metrics.ExactMatch()
preds = ["the cat", "theater", "Green", "npu"]
labels = ["the dog", "theater", "green", "npu"]
print(exact_match.compute(preds=preds, labels=labels))

'''
{'exact_match': 0.5}
'''

# Incremental computation
exact_match.add(preds=["NPU", "arm"], labels=["npu", "arm"])
exact_match.add(preds=["windows", "linux"], labels=["windows", "linux"])
print(exact_match.evaluate())

'''
{'exact_match': 0.75}
'''
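
Both results above follow from a case-sensitive string comparison: "Green" does not match "green", and "NPU" does not match "npu". The sketch below reproduces the numbers in plain Python; sketch_exact_match is a hypothetical helper, not the library's code, and the lowercasing at the end is ordinary caller-side preprocessing rather than an ExactMatch option.

```python
def sketch_exact_match(preds, labels):
    """Fraction of predictions that are string-identical to their label."""
    matches = sum(p == l for p, l in zip(preds, labels))
    return {"exact_match": matches / len(preds)}


preds = ["the cat", "theater", "Green", "npu"]
labels = ["the dog", "theater", "green", "npu"]

# Case-sensitive: only "theater" and "npu" match.
strict = sketch_exact_match(preds, labels)

# Lowercasing both sides first makes "Green" match "green" as well.
relaxed = sketch_exact_match(
    [p.lower() for p in preds],
    [l.lower() for l in labels],
)

# The incremental example: "NPU" vs "npu" is the single miss out of four.
batched = sketch_exact_match(
    ["NPU", "arm", "windows", "linux"],
    ["npu", "arm", "windows", "linux"],
)
print(strict, relaxed, batched)
```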

F1 score (F1)

from openmind import metrics

# One-shot computation
f1 = metrics.F1()
print(f1.compute(preds=[0, 1, 0, 1, 0], labels=[0, 0, 1, 1, 0]))

'''
{'f1': 0.5}
'''

# Incremental computation
f1.add(preds=[0, 1], labels=[0, 0])
f1.add(preds=[0, 1, 0], labels=[1, 1, 0])
print(f1.evaluate())

'''
{'f1': 0.5}
'''
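
The value 0.5 can be checked by hand. Assuming binary F1 with class 1 as the positive class (an assumption; the averaging mode used by metrics.F1 is not stated here), the example has one true positive, one false positive, and one false negative. A plain-Python sketch with a hypothetical helper name:

```python
def sketch_binary_f1(preds, labels, positive=1):
    """Binary F1 for the given positive class, from TP/FP/FN counts."""
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and l == positive for p, l in pairs)
    fp = sum(p == positive and l != positive for p, l in pairs)
    fn = sum(p != positive and l == positive for p, l in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is 0.
        return {"f1": 0.0, "tp": tp, "fp": fp, "fn": fn}
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"f1": f1, "tp": tp, "fp": fp, "fn": fn}


f1_result = sketch_binary_f1(preds=[0, 1, 0, 1, 0], labels=[0, 0, 1, 1, 0])
print(f1_result)
```

Precision and recall are both 1/2 here, so their harmonic mean is also 1/2. The incremental example adds the same five pairs in two batches, which is why it returns the same score.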

GLUE benchmark (Glue)

For more details on the Glue interface, see Evaluation Metrics API: Glue.

from openmind import metrics

# One-shot computation
glue = metrics.Glue("sst2")
print(glue.compute(preds=[0, 1], labels=[0, 1]))

'''
{'accuracy': 1.0}
'''

# Incremental computation
glue.add(preds=[0, 1], labels=[0, 0])
glue.add(preds=[0, 1], labels=[1, 1])
print(glue.evaluate())

'''
{'accuracy': 0.5}
'''

Mean absolute error (MAE)

from openmind import metrics

# One-shot computation
mae = metrics.MAE()
print(mae.compute(preds=[0, 1], labels=[0, 1]))

'''
{'mae': 0.0}
'''

# Incremental computation
mae.add(preds=[0, 1], labels=[0, 0])
mae.add(preds=[0, 1], labels=[1, 1])
print(mae.evaluate())

'''
{'mae': 0.5}
'''

Mean squared error (MSE)

from openmind import metrics

# One-shot computation
mse = metrics.MSE()
print(mse.compute(preds=[0, 1], labels=[0, 1]))

'''
{'mse': 0.0}
'''

# Incremental computation
mse.add(preds=[0, 1], labels=[0, 0])
mse.add(preds=[0, 1], labels=[1, 1])
print(mse.evaluate())

'''
{'mse': 0.5}
'''
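
The MAE and MSE examples return the same numbers only because every error in them is 0 or 1, and squaring leaves those values unchanged. The plain-Python sketch below (hypothetical helper names, standard textbook formulas rather than the library's code) reproduces the 0.5 results and adds one made-up input where the two metrics differ.

```python
def sketch_mae(preds, labels):
    """Mean of absolute differences between predictions and labels."""
    return sum(abs(p - l) for p, l in zip(preds, labels)) / len(preds)


def sketch_mse(preds, labels):
    """Mean of squared differences between predictions and labels."""
    return sum((p - l) ** 2 for p, l in zip(preds, labels)) / len(preds)


# Same data as the incremental examples: errors are 0, 1, 1, 0.
preds, labels = [0, 1, 0, 1], [0, 0, 1, 1]
mae_small = sketch_mae(preds, labels)
mse_small = sketch_mse(preds, labels)

# A single error of size 2 (made-up input): MSE penalizes it more than MAE.
mae_large = sketch_mae([0, 3], [0, 1])
mse_large = sketch_mse([0, 3], [0, 1])

print(mae_small, mse_small, mae_large, mse_large)
```

Because MSE squares each error, it is the more sensitive of the two to occasional large misses.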
