22个版本

0.2.0-alpha1	2024年5月1日
0.1.16	2024年3月9日
0.1.15	2024年2月10日
0.1.14	2023年12月19日
0.1.9	2023年6月29日

#50 in 数据结构

5,869 monthly downloads
用于 15 个crate (4个直接使用)

MIT/Apache

535KB
7K SLoC

range-set-blaze

快速、排序的整数集合，具有完整的集合操作

整数可以是任何大小（从 u8 到 u128）并且可以是符号的（从 i8 到 i128）。集合操作包括 union、intersection、difference、symmetric difference 和 complement。

该crate的主要结构是 RangeSetBlaze，一个整数集合。有关详细信息，请参阅文档。

与标准BTreeSet和HashSet不同，RangeSetBlaze不存储集合中的每个整数。相反，它以缓存高效的BTreeMap存储排序且不相交的整数范围。与其他已知区间库不同，它通过提供完整的集合操作并针对密集整数集合进行优化而有所不同。

我们可以从未排序且冗余的整数（或范围）构建一个RangeSetBlaze。当输入是密集的时，构建将是输入数量的线性函数，并且集合操作将加速到平方。

该crate的主要特性是SortedDisjoint。它由整数排序且不相交的范围的迭代器实现。有关详细信息，请参阅SortedDisjoint文档。

使用任何SortedDisjoint迭代器，我们可以在一次遍历范围内执行集合运算，并使用最少的（常量）内存。它在编译时强制执行“排序且不相交”的约束。这个特性受到来自sorted_iter crate的SortedIterator特性的启发。SortedDisjoint与它的灵感不同，它专注于不相交的整数范围。

该crate支持no_std、WASM和嵌入式项目。使用以下命令：

cargo add range-set-blaze --features "alloc" --no-default-features

基准测试

请参阅基准测试，与其他相关crate的性能比较。

通常，对于涉及密集整数和范围的许多任务，RangeSetBlaze比替代方案要快得多。

基准测试在benches目录中。要运行它们，请使用cargo bench。

文章

在《Towards Data Science》中的《Rust中创建快速、安全和兼容的数据结构的九条规则：从RangeSetBlaze中学到的经验》提供了对crate及其设计的概述。
在《Towards Data Science》中的《在Web和嵌入式环境中运行Rust的九条规则：从将range-set-blaze移植到no_std和WASM中学到的实用经验》涵盖了移植到“no_std”的内容。
检查AI生成的代码是否完美且自动：将Kani的形式验证应用于ChatGPT建议的Rust代码的经验。展示了如何证明溢出安全性。
使用Dafny正式验证Rust算法的九条规则在《Towards Data Science》中展示了如何正式验证crate中的一个算法。
九条规则用于加速你的Rust代码的SIMD：从range-set-blaze crate中提升数据摄取7倍的一般经验
另请参阅： 变更日志

示例

示例1

在这里，我们将两个RangeSetBlaze的并集（运算符“|”）

Example 1

use range_set_blaze::RangeSetBlaze;

 // a is the set of integers from 100 to 499 (inclusive) and 501 to 1000 (inclusive)
let a = RangeSetBlaze::from_iter([100..=499, 501..=999]);
 // b is the set of integers -20 and the range 400 to 599 (inclusive)
let b = RangeSetBlaze::from_iter([-20..=-20, 400..=599]);
// c is the union of a and b, namely -20 and 100 to 999 (inclusive)
let c = a | b;
assert_eq!(c, RangeSetBlaze::from_iter([-20..=-20, 100..=999]));

示例2

在生物学中，假设我们想要找到一个基因的内含子区域，但我们只有转录区域和外显子区域。

Example 2

我们为转录区域创建一个RangeSetBlaze，为所有外显子区域创建一个RangeSetBlaze。然后，我们取转录区域和外显子区域之间的差集以找到内含子区域。

use range_set_blaze::RangeSetBlaze;

let line = "chr15   29370   37380   29370,32358,36715   30817,32561,37380";

// split the line on white space
let mut iter = line.split_whitespace();
let chr = iter.next().unwrap();

// Parse the start and end of the transcription region into a RangeSetBlaze
let trans_start: i32 = iter.next().unwrap().parse().unwrap();
let trans_end: i32 = iter.next().unwrap().parse().unwrap();
let trans = RangeSetBlaze::from_iter([trans_start..=trans_end]);
assert_eq!(trans, RangeSetBlaze::from_iter([29370..=37380]));

// Parse the start and end of the exons into a RangeSetBlaze
let exon_starts = iter.next().unwrap().split(',').map(|s| s.parse::<i32>());
let exon_ends = iter.next().unwrap().split(',').map(|s| s.parse::<i32>());
let exon_ranges = exon_starts
    .zip(exon_ends)
    .map(|(s, e)| s.unwrap()..=e.unwrap());
let exons = RangeSetBlaze::from_iter(exon_ranges);
assert_eq!(
    exons,
    RangeSetBlaze::from_iter([29370..=30817, 32358..=32561, 36715..=37380])
);

// Use 'set difference' to find the introns
let intron = trans - exons;
assert_eq!(intron, RangeSetBlaze::from_iter([30818..=32357, 32562..=36714]));
for range in intron.ranges() {
    let (start, end) = range.into_inner();
    println!("{chr}\t{start}\t{end}");
}

依赖项

~1.2–1.7MB
~39K SLoC

构建 build.rs
开发 criterion 0.5+html_reports not wasm32
开发 glob not wasm32
开发 quickcheck 1.0 not wasm32
开发 quickcheck_macros 1.0 not wasm32
开发 rand 0.8.4 not wasm32
开发 range-collections 0.4 not wasm32
开发 range-set 0.0.10 not wasm32
开发 rangemap not wasm32
开发 roaring not wasm32
开发 thousands not wasm32
开发 trybuild not wasm32
开发 wasm-bindgen-test wasm32