#quantile #algorithm #approximate #stream #size #statistics #zhang-wang

zw-fast-quantile

张王快速分位数算法在Rust中实现

7个版本

0.2.3 2021年12月30日
0.2.2 2021年12月27日
0.1.2 2021年11月16日

算法 中排名第920

Download history 2/week @ 2024-03-16 27/week @ 2024-03-30 4/week @ 2024-04-06 4/week @ 2024-04-13 6/week @ 2024-04-20 4/week @ 2024-05-11 21/week @ 2024-05-25 17/week @ 2024-06-01 34/week @ 2024-06-08 16/week @ 2024-06-15 6/week @ 2024-06-22

每月下载量61
用于 quantogram

Apache-2.0

27KB
569

zw-fast-quantile

CI Crates.io

Zhang Wang快速近似分位数算法在Rust中的实现:[参考链接](http://web.cs.ucla.edu/~weiwang/paper/SSDBM07_2.pdf)。这是原始算法,它启发了Tensorflow Boosted Trees中的技术:Tensorflow Boosted Trees

安装

将以下内容添加到您的 Cargo.toml

[dependencies]
zw-fast-quantile = "0.2"

然后就可以使用了。如果您使用的是Rust 2015,您还需要在crate根目录中将 extern crate zw-fast-quantile 添加到您的crate中。

示例

该库中有两种实现:FixedSizeEpsilonSummary 是在已知流大小的条件下使用的,而 UnboundEpsilonSummary 是用于未知大小的流。您可以调整自己的错误率 epsilon 以在空间和准确性之间进行权衡。

let epsilon = 0.1;
let n = 10;
let mut s = FixedSizeEpsilonSummary::new(n, epsilon);
for i in 1..=n {
    s.update(i);
}

let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);
let epsilon = 0.1;
let n = 10;
let mut s = UnboundEpsilonSummary::new(epsilon);
for i in 1..=n {
    s.update(i);
}

let ans = s.query(0.0);
let expected = 1;
assert!(expected == ans);

基准测试

我们对GK01在quantiles中的实现进行了基准测试。为了测试UPDATE,我们以0.01的错误率按顺序插入5000个值。GK01比ZW慢约2.6倍。

zw unbound quantile update
                        time:   [60.780 us 60.855 us 60.936 us]
                        change: [-1.4032% -0.9510% -0.5005%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
gk quantile update      time:   [156.84 us 157.02 us 157.24 us]
                        change: [-0.1907% -0.0503% +0.0969%] (p = 0.50 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

至于查询时间,我们查询了从0.0到1.0的11个分位数。GK01比ZW慢约1.5倍。

zw unbound quantile query
                        time:   [229.62 ns 230.16 ns 230.77 ns]
                        change: [+1.3422% +1.8105% +2.2504%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
gk quantile query       time:   [350.21 ns 350.48 ns 350.76 ns]
                        change: [-0.4638% -0.3109% -0.1670%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild
  5 (5.00%) high severe

无运行时依赖