#poisson #hypothesis #statistics #probability

poisson-rate-test

Test Poisson process rates for equivalence according to Gu 2008, "Testing the Ratio of Two Poisson Rates"

9 stable releases

1.2.2 May 17, 2021
1.2.0 May 14, 2021
1.0.4 April 22, 2021
0.1.0 April 21, 2021

#318 in Science

38 downloads per month
Used in kda-tools

MIT license


Contains (Debian package, 6KB) poisson-rate-test_1.2.2_amd64.deb

poisson-rate-test

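Purpose

A Rust library that provides methods for comparing the rates of Poisson data and for running hypothesis tests on that data.

Specifically, as of version 1.0, two types of test are provided: rate-to-rate comparison (2 events) and ratio-to-ratio comparison (4 events).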

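Rate to rate

This tests the hypothesis that the numbers of events A and B in a given dataset occur with rates of the form r_a / r_b >= R for a constant R, against the null hypothesis that the two events occur at the same rate.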
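To make the rate-ratio hypothesis concrete, here is a minimal std-only sketch of a different, classical technique: the exact conditional (binomial) test for a ratio of two Poisson rates. It is not the statistic from Gu 2008 and is not claimed to be what this crate implements. The function names `binomial_upper_tail` and `conditional_ratio_pvalue` are hypothetical. The sketch relies on the fact that, given the total count, the first count is binomial with success probability R*n1 / (R*n1 + n2) when r_a / r_b = R.

```rust
/// Upper-tail probability P(X >= x) for X ~ Binomial(k, p), assuming 0 < p < 1.
fn binomial_upper_tail(x: u64, k: u64, p: f64) -> f64 {
    // Start from pmf(0) = (1 - p)^k and use the recurrence
    // pmf(i + 1) = pmf(i) * (k - i) / (i + 1) * p / (1 - p).
    let odds = p / (1.0 - p);
    let mut pmf = (1.0 - p).powi(k as i32);
    let mut tail = 0.0;
    for i in 0..=k {
        if i >= x {
            tail += pmf;
        }
        if i < k {
            pmf *= (k - i) as f64 / (i + 1) as f64 * odds;
        }
    }
    tail.min(1.0)
}

/// Hypothetical helper (not part of the crate): exact conditional p-value
/// for the alternative "rate1 / rate2 > ratio".
/// x1, x2 are event counts; n1, n2 are the numbers of observations.
fn conditional_ratio_pvalue(x1: u64, n1: f64, x2: u64, n2: f64, ratio: f64) -> f64 {
    // Given the total x1 + x2, x1 is Binomial(total, p0) when rate1 / rate2 == ratio.
    let p0 = ratio * n1 / (ratio * n1 + n2);
    binomial_upper_tail(x1, x1 + x2, p0)
}
```

A small p-value from `conditional_ratio_pvalue(x1, n1, x2, n2, R)` is evidence that the first rate exceeds R times the second. The recurrence is only suitable for modest totals; a production version would work in log space.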

Example: testing an event rate against a hypothesis

use poisson_rate_test::two_tailed_rates_equal;
//make some data that sure looks like it occurs with rate = 0.5;
let data = vec![0,1,1,0]; //note, 0,2,0,0 would be the same (2/4).
let n1 = data.len() as f64;
let sum1 = data.iter().sum::<usize>() as f64;
//are these rates equal to my hypothesized rate of 0.5?
let expected_n = n1;
let expected_sum = 0.5 * n1;
let p = two_tailed_rates_equal(sum1, n1, expected_sum, expected_n);
assert!(p>0.99); //<--confidently yes

Example: comparing event rates under a new condition

use claim::{assert_lt,assert_gt};
use poisson_rate_test::{one_tailed_ratio,two_tailed_rates_equal};
//say we made a change, and observed the new rates 
let occurances_observed = vec![0,0,1,0];
//and here's the "usual" data
let occurances_usual = vec![1,1,5,3,3,8];
//need the basic n/sum statistics
let n1 = occurances_observed.len() as f64;
let n2 = occurances_usual.len() as f64;
let sum1 = occurances_observed.iter().sum::<usize>() as f64;
let sum2 = occurances_usual.iter().sum::<usize>() as f64;
//is rate of observed > rate usual?
let p = one_tailed_ratio(sum1, n1, sum2, n2, 1.0);
assert_lt!(p,0.01); //<--confidently no

//Maybe just check both tails to be sure (this tests r observed / r baseline != 1)
let p = two_tailed_rates_equal(sum1, n1, sum2, n2);
assert_lt!(p,0.01); //<--confidently no

Example: more data helps

Here is one long example; see the documentation for more.

use claim::{assert_lt,assert_gt};
use poisson_rate_test::{one_tailed_ratio,two_tailed_rates_equal};

//create data where rate1 == 1/2 * rate2
let occurances_one = vec![1,0,1,0,1,0];
let occurances_two = vec![1,1,1,1,0,2];
let n1 = occurances_one.len() as f64;
let n2 = occurances_two.len() as f64;
let sum1 = occurances_one.iter().sum::<usize>() as f64;
let sum2 = occurances_two.iter().sum::<usize>() as f64;

//test hypothesis that r1/r2 > 1/2
let p = one_tailed_ratio(sum1, n1, sum2, n2, 0.5);
assert_eq!(p, 0.50); //<-- nope
//let's test the neighborhood around that
let p = one_tailed_ratio(sum1, n1, sum2, n2, 0.49999 );
assert_gt!(p, 0.49); //<-- still nope

//Two sided test. What is the likelihood of seeing the data we got
//given that r1/r2 == 1/2?
let p_half = one_tailed_ratio(sum1, n1, sum2, n2, 0.49999);
//other side
let p_double = one_tailed_ratio(sum2, n2, sum1, n1, 2.0001);
//just about 1.0!
assert_gt!(2.0*p_half.min(p_double),0.99);

//we *know* they are not equal, but can we prove it in general?
let mut p_double = two_tailed_rates_equal(sum2, n2, sum1, n1);
//note: p_double is in [.15,.25]
assert_lt!(p_double,0.25);//<--looking  unlikely... maybe more data is required
assert_gt!(p_double,0.15);//<--looking  unlikely... maybe more data is required

//get more of the same data
let trial2_one = vec![1,0,1,0,1,0,1,0,1,0,1,0,1,0];
let trial2_two = vec![1,1,1,1,0,2,0,2,1,1,0,2,1,1];
let t2n1 = trial2_one.len() as f64;
let t2n2 = trial2_two.len() as f64;
let t2sum1 = trial2_one.iter().sum::<usize>() as f64;
let t2sum2 = trial2_two.iter().sum::<usize>() as f64;
p_double = two_tailed_rates_equal(t2sum2, t2n2, t2sum1, t2n1);
assert_lt!(p_double,0.05);//<--That did the trick

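Comparing event ratios

Suppose there are two events, a and b, and two groups (baseline and treatment). We changed something in the treatment group and want to know whether that change affected the ratio a/b, so we count a and b in both the baseline and the treatment. Note that the p-values are estimated by simulation, so they may vary slightly between runs (by around 0.01). Pass a higher sample count to stabilize them, at the cost of CPU time.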
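The simulation idea can be sketched with a small parametric bootstrap. This is an illustration under stated assumptions, not the crate's implementation: the function name `bootstrap_ratio_greater_pval`, the `seed` parameter, the pooled-rate null model, and the cross-product statistic are all choices made here for the sketch. A tiny SplitMix64 generator and Knuth's Poisson sampler keep it free of external crates.

```rust
/// Minimal SplitMix64 generator so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    /// Uniform sample in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^= z >> 31;
        (z >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Knuth's multiplication method; adequate for small means only.
    fn poisson(&mut self, mean: f64) -> u64 {
        let limit = (-mean).exp();
        let mut k = 0u64;
        let mut prod = self.next_f64();
        while prod > limit {
            k += 1;
            prod *= self.next_f64();
        }
        k
    }
}

/// Hypothetical sketch: simulated p-value for the alternative
/// "treatment a/b ratio is greater than baseline a/b ratio".
fn bootstrap_ratio_greater_pval(
    base_a: u64, base_b: u64, base_n: u64,
    treat_a: u64, treat_b: u64, treat_n: u64,
    samples: u32, seed: u64,
) -> f64 {
    // Null model (an assumption of this sketch): both groups share the
    // pooled per-observation rates of a and of b.
    let total_n = (base_n + treat_n) as f64;
    let rate_a = (base_a + treat_a) as f64 / total_n;
    let rate_b = (base_b + treat_b) as f64 / total_n;
    // Cross-product statistic: positive when treat_a/treat_b > base_a/base_b,
    // and still defined when a simulated denominator count is zero.
    let stat = |ba: u64, bb: u64, ta: u64, tb: u64| (ta * bb) as f64 - (ba * tb) as f64;
    let observed = stat(base_a, base_b, treat_a, treat_b);
    let mut rng = SplitMix64(seed);
    let mut at_least = 0u32;
    for _ in 0..samples {
        let ba = rng.poisson(rate_a * base_n as f64);
        let bb = rng.poisson(rate_b * base_n as f64);
        let ta = rng.poisson(rate_a * treat_n as f64);
        let tb = rng.poisson(rate_b * treat_n as f64);
        if stat(ba, bb, ta, tb) >= observed {
            at_least += 1;
        }
    }
    // Fraction of simulated datasets at least as extreme as the observed one.
    at_least as f64 / samples as f64
}
```

Because the result is a fraction of `samples` simulated datasets, it moves in steps of 1/samples and changes with the seed, which is why a larger sample count gives a steadier estimate.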

Example: comparing a new weapon in Hunt Showdown

This is how it is done in kda-tools.

use poisson_rate_test::bootstrap;
use claim::{assert_lt,assert_gt};
//57 matches, 50 kills, 27 deaths without Caldwell Conversion pistol (baseline)
let normal_matches = 57;
let normal_kills = 50;
let normal_deaths = 27;
//10 matches, 4 kills, 9 deaths with Caldwell Conversion pistol (treatment)
let cc_matches=10;
let cc_kills=4;
let cc_deaths=9;

let p_cc_treatment_greater= bootstrap::param::ratio_events_greater_pval(
    normal_kills,normal_deaths, normal_matches,
    cc_kills,cc_deaths, cc_matches,
).unwrap() ;
assert_gt!(p_cc_treatment_greater,0.90); //Hell no that's not greater (cc_kills/cc_deaths) is much less than normal_kills/normal_deaths
let p_cc_treatment_less = bootstrap::param::ratio_events_greater_pval(
    cc_kills,cc_deaths, cc_matches,
    normal_kills,normal_deaths, normal_matches,
).unwrap() ;  
assert_lt!(p_cc_treatment_less,0.05); //very high significance / very low p-value

use poisson_rate_test::bootstrap;
use claim::{assert_lt,assert_gt};
let base_a = vec![0,0,1,0];
let base_b = vec![1,0,1,1];
let treat_a = vec![1,1,1,2];
let treat_b = vec![1,1,1,1];
//Did treatment increase ratio of a/b?
let p = bootstrap::param::ratio_events_equal_pval_n(
    base_a.iter().sum::<usize>(),
    base_b.iter().sum::<usize>(),
    base_a.len() as usize,
    treat_a.iter().sum::<usize>(),
    treat_b.iter().sum::<usize>(),
    treat_a.len() as usize,
    10000
);
assert_lt!(p.unwrap(),0.15); //<--tentatively yes
assert_gt!(p.unwrap(),0.05);

//just need more data, right?
let base_a = vec![0,0,1,0, 1,0,0,0];
let base_b = vec![1,0,1,1, 0,1,1,1];
let treat_a = vec![1,1,1,2, 1,2,1,1];
let treat_b = vec![1,1,1,1, 1,1,1,1];
//Did treatment increase ratio of a/b?
let p = bootstrap::param::ratio_events_equal_pval_n(
    base_a.iter().sum::<usize>(),
    base_b.iter().sum::<usize>(),
    base_a.len() as usize,
    treat_a.iter().sum::<usize>(),
    treat_b.iter().sum::<usize>(),
    treat_a.len() as usize,
    10000
);
assert_lt!(p.unwrap(),0.05); //<--confidently yes 
assert_gt!(p.unwrap(),0.01);

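Ratio to ratio

This tests the hypothesis that two events in two datasets occur at different ratios, r1_a/r1_b >= r2_a/r2_b, against the null hypothesis that the ratios are equal.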
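As a quick illustration of the quantities being compared, the sketch below computes the point estimate of the a/b ratio within each dataset, using the Hunt Showdown counts from the example above (50 kills and 27 deaths at baseline, 4 kills and 9 deaths in treatment). The helper name `event_ratio` is hypothetical and is not part of the crate. Both counts in a dataset share the same number of matches, so the match count cancels out of the ratio.

```rust
/// Ratio of two event counts, or None when the denominator count is zero.
fn event_ratio(a: u64, b: u64) -> Option<f64> {
    if b == 0 {
        None
    } else {
        Some(a as f64 / b as f64)
    }
}

/// True when both ratios are defined and the first exceeds the second.
fn first_ratio_is_larger(a1: u64, b1: u64, a2: u64, b2: u64) -> bool {
    match (event_ratio(a1, b1), event_ratio(a2, b2)) {
        (Some(r1), Some(r2)) => r1 > r2,
        _ => false, // at least one ratio is undefined
    }
}
```

The hypothesis test then asks whether a gap between these two point estimates is larger than chance alone would plausibly produce.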

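Why

In games, an interesting statistic is the ratio of events (such as kills/deaths with various loadouts), or the rate of kills per match with and without an item.

I use it in kda-tools to run hypothesis tests on loadouts in Hunt Showdown.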

Dependencies

~6MB
~116K SLoC