#letter #count #genome #simd #parallelism #frequency #acgt

genome_counter

Count the frequency of ACGT letters in a genome using SIMD and parallelism

1 unstable release

0.2.0 Jul 28, 2021

#1464 in Hardware support

MIT/Apache

8KB
127

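Counting genome bases

This library counts the letters A, C, G, and T as fast as possible.

On 72 logical cores, we reach a throughput of 80.19 GiB/s.

On a laptop with 8 logical cores, we get 13.766 GiB/s.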
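
To illustrate the parallel half of the idea, here is a minimal sketch that splits the input into chunks, counts each chunk on its own thread, and sums the partial results. It uses only the standard library and a plain scalar loop in place of SIMD. The names `BaseCounts`, `count_scalar`, and `count_chunked` are hypothetical and are not part of this crate's API; the crate's actual implementation may differ.

```rust
use std::thread;

/// Per-base totals. The field names mirror `a`, `c`, `g`, `t`, but this
/// struct is a hypothetical stand-in, not the crate's result type.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct BaseCounts {
    pub a: u64,
    pub c: u64,
    pub g: u64,
    pub t: u64,
}

/// Scalar pass: tally every byte value, then read off the four bases.
/// Only uppercase letters are counted in this sketch.
pub fn count_scalar(seq: &[u8]) -> BaseCounts {
    let mut table = [0u64; 256];
    for &byte in seq {
        table[byte as usize] += 1;
    }
    BaseCounts {
        a: table[b'A' as usize],
        c: table[b'C' as usize],
        g: table[b'G' as usize],
        t: table[b'T' as usize],
    }
}

/// Split the input into roughly equal chunks, count each chunk on its own
/// scoped thread, and add up the partial counts.
pub fn count_chunked(seq: &[u8], threads: usize) -> BaseCounts {
    let threads = threads.max(1);
    // Ceiling division; `.max(1)` avoids a zero chunk size on empty input.
    let chunk_len = ((seq.len() + threads - 1) / threads).max(1);
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = seq
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || count_scalar(chunk)))
            .collect();
        // Join each worker and accumulate its partial counts.
        handles.into_iter().fold(BaseCounts::default(), |acc, handle| {
            let part = handle.join().expect("worker thread panicked");
            BaseCounts {
                a: acc.a + part.a,
                c: acc.c + part.c,
                g: acc.g + part.g,
                t: acc.t + part.t,
            }
        })
    })
}
```

For example, `count_chunked(b"ACGTACGT", 4)` yields a count of 2 for each base. Because counting is a simple sum, the chunk boundaries do not affect the result, which is why the work divides cleanly across cores.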

Usage

use genome_counter;
let results = genome_counter::count_opt(b"ACGT").unwrap();
assert_eq!(results.a, 1);
assert_eq!(results.c, 1);
assert_eq!(results.g, 1);
assert_eq!(results.t, 1);

Benchmarks

We benchmarked with criterion.rs and obtained the following results on 100e6 bytes:

Benchmarking count_rand_100000000_opt: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 35.5s or reduce sample count to 20.
count_rand_100000000_opt                                                                             
                        time:   [6.6789 ms 6.7654 ms 6.8687 ms]
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) high mild
  12 (12.00%) high severe

Benchmarking count_rand_100000000_simple: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 614.4s or reduce sample count to 10.
count_rand_100000000_simple                                                                             
                        time:   [127.08 ms 133.08 ms 139.84 ms]
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe
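
The timings above can be converted into throughput with a little arithmetic. The sketch below assumes the middle value of each Criterion interval is the representative time and that 1 GiB is 2^30 bytes; `gib_per_sec` and `report` are hypothetical helper names used only for this illustration.

```rust
/// Convert a byte count and an elapsed time in milliseconds into GiB/s
/// (1 GiB = 2^30 bytes).
pub fn gib_per_sec(bytes: f64, millis: f64) -> f64 {
    let seconds = millis / 1000.0;
    bytes / seconds / (1u64 << 30) as f64
}

/// Returns (optimized GiB/s, simple GiB/s, speedup factor) for the
/// benchmark figures reported above.
pub fn report() -> (f64, f64, f64) {
    let bytes = 100e6; // input size used in the benchmark
    let opt_ms = 6.7654; // middle estimate for count_rand_100000000_opt
    let simple_ms = 133.08; // middle estimate for count_rand_100000000_simple
    (
        gib_per_sec(bytes, opt_ms),
        gib_per_sec(bytes, simple_ms),
        simple_ms / opt_ms,
    )
}
```

With these inputs the optimized run works out to roughly 13.77 GiB/s, which appears to be the source of the 8-core laptop figure, while the simple run comes to roughly 0.70 GiB/s, a speedup of a little under 20x.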

Dependencies

~2.5MB
~48K SLoC