4 releases

0.1.3	May 27, 2024
0.1.2	Apr 29, 2024
0.1.1	Apr 26, 2024
0.1.0	Apr 23, 2024

#375 in Machine learning

33 downloads per month

MIT license

1MB
638 lines of code

bleuscore

bleuscore is a fast BLEU score calculator written in Rust.

Installation

The Python package is published on PyPI, so you can install it directly in several ways:

pip
```
pip install bleuscore
```
poetry
```
poetry add bleuscore
```
uv
```
uv pip install bleuscore
```

Quick start

The API mirrors huggingface evaluate: swap the import and call `bleuscore.compute` directly instead of loading a metric first.

```diff
- import evaluate
+ import bleuscore

predictions = ["hello there general kenobi", "foo bar foobar"]
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"]
]

- bleu = evaluate.load("bleu")
- results = bleu.compute(predictions=predictions, references=references)
+ results = bleuscore.compute(predictions=predictions, references=references)

print(results)
# {'bleu': 1.0, 'precisions': [1.0, 1.0, 1.0, 1.0], 'brevity_penalty': 1.0, 
# 'length_ratio': 1.1666666666666667, 'translation_length': 7, 'reference_length': 6}
```
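To show what each field of the result means, here is a minimal pure-Python sketch (not bleuscore's Rust implementation) that reproduces the dict above. It assumes unsmoothed corpus BLEU with `max_order=4`, the shortest reference of each sentence as its reference length, and plain whitespace tokenization, which is enough for this demo data (huggingface evaluate itself uses a 13a tokenizer by default). `ngram_counts` and `corpus_bleu` are hypothetical names.

```python
import math
from collections import Counter


def ngram_counts(tokens, max_order):
    """Count every n-gram of length 1..max_order in a token list."""
    counts = Counter()
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def corpus_bleu(predictions, references, max_order=4):
    """Unsmoothed corpus BLEU using the shortest reference per sentence.

    Assumption: whitespace tokenization (sufficient for the quick-start data).
    """
    matches = [0] * max_order
    possible = [0] * max_order
    translation_length = 0
    reference_length = 0
    for pred, refs in zip(predictions, references):
        pred_tokens = pred.split()
        ref_tokens = [r.split() for r in refs]
        translation_length += len(pred_tokens)
        # Shortest reference, matching reference_length == 6 in the output above.
        reference_length += min(len(r) for r in ref_tokens)
        # Clip each n-gram by its maximum count in any single reference.
        max_ref = Counter()
        for r in ref_tokens:
            max_ref |= ngram_counts(r, max_order)
        overlap = ngram_counts(pred_tokens, max_order) & max_ref
        for ngram, count in overlap.items():
            matches[len(ngram) - 1] += count
        for order in range(1, max_order + 1):
            possible[order - 1] += max(len(pred_tokens) - order + 1, 0)

    precisions = [m / p if p > 0 else 0.0 for m, p in zip(matches, possible)]
    if min(precisions) > 0:
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    else:
        geo_mean = 0.0
    ratio = translation_length / reference_length
    # No brevity penalty when the translation is longer than the reference.
    bp = 1.0 if ratio > 1.0 else math.exp(1 - 1 / ratio)
    return {
        "bleu": geo_mean * bp,
        "precisions": precisions,
        "brevity_penalty": bp,
        "length_ratio": ratio,
        "translation_length": translation_length,
        "reference_length": reference_length,
    }


predictions = ["hello there general kenobi", "foo bar foobar"]
references = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"],
]
print(corpus_bleu(predictions, references))
```

Running it prints the same dict as the quick-start example: 7 prediction tokens against 6 shortest-reference tokens gives a length ratio of 7/6, so no brevity penalty applies.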

Benchmark

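TL;DR: at a corpus size of 100K and above, bleuscore is more than 10x faster than the local huggingface BLEU implementation (local_hf_bleu).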


This simple benchmark uses the demo data from the quick start. The benchmark source code is in benchmark/simple.

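- rs_bleuscore: the bleuscore Python library
- local_hf_bleu: a local copy of the huggingface evaluate BLEU algorithm
- sacre_bleu: sacrebleu
  - Note: on the simple demo data, sacrebleu gives a different result; all the other implementations agree with each other.
- hf_evaluate: the huggingface evaluate BLEU algorithm, loaded through the evaluate package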
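One possible source of the sacrebleu disagreement, offered as an assumption rather than something the benchmark confirms, is the reference-length convention. The quick-start output reports `reference_length` 6, the shortest reference per sentence, while sacrebleu by default uses the reference length closest to each hypothesis. This sketch compares the two conventions on whitespace token counts of the demo data; `effective_ref_length` and `length_stats` are hypothetical helpers.

```python
import math


def effective_ref_length(pred_len, ref_lens, mode):
    """Reference length for one sentence under two common conventions."""
    if mode == "shortest":
        # Convention reflected in the quick-start output (reference_length == 6).
        return min(ref_lens)
    if mode == "closest":
        # Length closest to the prediction; ties go to the shorter reference.
        return min(ref_lens, key=lambda r: (abs(r - pred_len), r))
    raise ValueError(f"unknown mode: {mode}")


def length_stats(pred_lens, ref_lens, mode):
    """Corpus reference length, length ratio and brevity penalty."""
    ref_total = sum(
        effective_ref_length(p, r, mode) for p, r in zip(pred_lens, ref_lens)
    )
    ratio = sum(pred_lens) / ref_total
    bp = 1.0 if ratio > 1.0 else math.exp(1 - 1 / ratio)
    return ref_total, ratio, bp


# Whitespace token counts of the quick-start data.
pred_lens = [4, 3]        # "hello there general kenobi", "foo bar foobar"
ref_lens = [[4, 3], [3]]  # lengths of each prediction's references

for mode in ("shortest", "closest"):
    print(mode, length_stats(pred_lens, ref_lens, mode))
```

On this data both conventions give a brevity penalty of 1.0, so the score itself is unaffected; only the reported reference length (6 vs 7) and length ratio (7/6 vs 1.0) change.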

N scales the size of the predictions and references by simply duplicating the demo data. As N grows, bleuscore's performance advantage grows. See benchmark for more benchmark details.
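The scripts in benchmark/simple are not reproduced here, so the following is only a guess at how the input scaling could look: a hypothetical `scaled_inputs(n)` repeats the demo lists `n` times, and `parse_n` reads N from the command line as in `python simple/rs_bleuscore.py 100`. Because every corpus-level count grows by the same factor, the length ratio (and likewise the n-gram precisions) stays the same while the amount of work grows.

```python
import sys

# Quick-start demo data.
PREDICTIONS = ["hello there general kenobi", "foo bar foobar"]
REFERENCES = [
    ["hello there general kenobi", "hello there !"],
    ["foo bar foobar"],
]


def parse_n(argv, default=100):
    """Read a positive N from argv[1] (hypothetical CLI), else use the default."""
    try:
        n = int(argv[1])
    except (IndexError, ValueError):
        return default
    return n if n > 0 else default


def scaled_inputs(n):
    """Repeat the demo corpus n times (hypothetical helper)."""
    # List repetition shares the inner reference lists; fine for read-only scoring.
    return PREDICTIONS * n, REFERENCES * n


def length_totals(predictions, references):
    """Total prediction tokens and total shortest-reference tokens."""
    translation = sum(len(p.split()) for p in predictions)
    reference = sum(min(len(r.split()) for r in refs) for refs in references)
    return translation, reference


n = parse_n(sys.argv)
predictions, references = scaled_inputs(n)
translation, reference = length_totals(predictions, references)
# Both totals are multiples of n (7n and 6n), so their ratio stays 7/6.
print(f"N={n}: {len(predictions)} predictions, length ratio {translation / reference:.4f}")
```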

N=100

```shell
hyperfine --warmup 5 --runs 10   \
  "python simple/rs_bleuscore.py 100" \
  "python simple/local_hf_bleu.py 100" \
  "python simple/sacre_bleu.py 100"   \
  "python simple/hf_evaluate.py 100"
```

```
Benchmark 1: python simple/rs_bleuscore.py 100
  Time (mean ± σ):      19.0 ms ±   2.6 ms    [User: 17.8 ms, System: 5.3 ms]
  Range (min … max):    14.8 ms …  23.2 ms    10 runs

Benchmark 2: python simple/local_hf_bleu.py 100
  Time (mean ± σ):      21.5 ms ±   2.2 ms    [User: 19.0 ms, System: 2.5 ms]
  Range (min … max):    16.8 ms …  24.1 ms    10 runs

Benchmark 3: python simple/sacre_bleu.py 100
  Time (mean ± σ):      45.9 ms ±   2.2 ms    [User: 38.7 ms, System: 7.1 ms]
  Range (min … max):    43.5 ms …  50.9 ms    10 runs

Benchmark 4: python simple/hf_evaluate.py 100
  Time (mean ± σ):      4.504 s ±  0.429 s    [User: 0.762 s, System: 0.823 s]
  Range (min … max):    4.163 s …  5.446 s    10 runs

Summary
  python simple/rs_bleuscore.py 100 ran
    1.13 ± 0.20 times faster than python simple/local_hf_bleu.py 100
    2.42 ± 0.35 times faster than python simple/sacre_bleu.py 100
  237.68 ± 39.88 times faster than python simple/hf_evaluate.py 100
```
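The ± values in the Summary combine the spread of both measurements. A minimal sketch, assuming hyperfine propagates the relative standard deviations in quadrature (first-order error propagation); `relative_speed` is a hypothetical name. Since it is fed the rounded means and standard deviations printed above, its results match the Summary only to within that rounding.

```python
import math


def relative_speed(mean, stddev, fastest_mean, fastest_stddev):
    """Mean-time ratio to the fastest command, with a propagated standard deviation.

    Assumption: relative standard deviations are added in quadrature.
    """
    ratio = mean / fastest_mean
    ratio_stddev = ratio * math.sqrt(
        (stddev / mean) ** 2 + (fastest_stddev / fastest_mean) ** 2
    )
    return ratio, ratio_stddev


# Rounded (mean, stddev) in ms from the N=100 run; hf_evaluate converted from seconds.
FASTEST = (19.0, 2.6)  # rs_bleuscore
OTHERS = {
    "local_hf_bleu": (21.5, 2.2),
    "sacre_bleu": (45.9, 2.2),
    "hf_evaluate": (4504.0, 429.0),
}

for name, (mean, stddev) in OTHERS.items():
    ratio, ratio_stddev = relative_speed(mean, stddev, *FASTEST)
    print(f"{name}: {ratio:.2f} ± {ratio_stddev:.2f} times slower than rs_bleuscore")
```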

N = 1K to 1M

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `python simple/rs_bleuscore.py 1000` | 20.3 ± 1.3 | 18.2 | 21.4 | 1.00 |
| `python simple/local_hf_bleu.py 1000` | 45.8 ± 1.2 | 44.2 | 47.5 | 2.26 ± 0.16 |
| `python simple/rs_bleuscore.py 10000` | 37.8 ± 1.5 | 35.9 | 39.5 | 1.87 ± 0.14 |
| `python simple/local_hf_bleu.py 10000` | 295.0 ± 5.9 | 288.6 | 304.2 | 14.55 ± 0.98 |
| `python simple/rs_bleuscore.py 100000` | 219.6 ± 3.3 | 215.3 | 224.0 | 10.83 ± 0.72 |
| `python simple/local_hf_bleu.py 100000` | 2781.4 ± 42.2 | 2723.1 | 2833.0 | 137.13 ± 9.10 |
| `python simple/rs_bleuscore.py 1000000` | 2048.8 ± 31.4 | 2013.2 | 2090.3 | 101.01 ± 6.71 |
| `python simple/local_hf_bleu.py 1000000` | 28285.3 ± 100.9 | 28182.1 | 28396.1 | 1394.51 ± 90.21 |

"Relative" is each command's mean time divided by that of the fastest run, `python simple/rs_bleuscore.py 1000`.
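To check the TL;DR against the table, this small sketch divides the local_hf_bleu mean by the rs_bleuscore mean for each N. The values are the table's means; `speedups` is a hypothetical helper.

```python
# Mean wall-clock times in ms from the table above, keyed by N.
RS_BLEUSCORE = {1_000: 20.3, 10_000: 37.8, 100_000: 219.6, 1_000_000: 2048.8}
LOCAL_HF_BLEU = {1_000: 45.8, 10_000: 295.0, 100_000: 2781.4, 1_000_000: 28285.3}


def speedups(fast, slow):
    """Ratio slow / fast for every N present in both tables."""
    return {n: slow[n] / fast[n] for n in sorted(fast) if n in slow}


for n, s in speedups(RS_BLEUSCORE, LOCAL_HF_BLEU).items():
    print(f"N={n:>9,}: {s:5.1f}x")
```

The speedup rises from about 2.3x at N=1K to 7.8x at 10K, 12.7x at 100K and 13.8x at 1M, so it first exceeds 10x somewhere between N=10K and N=100K.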

Dependencies

~8–15MB
~188K SLoC