14 个版本

0.1.14	2021 年 9 月 19 日
0.1.13	2021 年 9 月 19 日

741 在数据结构中

自定义许可

14KB
196 行

一个不相交集合数据结构（即 Union-Find w/ Rank）

什么是 Union-Find？

假设你有一个由元素 e1，e2，...，en 组成的集合 S，并希望使用操作将它们分组到不同的集合中：

“将 ei 和 ej 放入同一个组”（并集），
“给出属于组 ei 的代表”（查找）。

那么 Union-Find 数据结构可以帮助非常有效地存储底层组，并实现这个 API。

注意：实现的变体使用了路径压缩来进一步提高性能。

(一些) 应用

检测图中的环：给定一个图 G，我们可以将边的端点放入同一个组（相同的连通分量），除非存在一对共享组代表的端点 (ei, ej)。如果发生这种情况，它们之间已经存在一条路径，添加此边将添加多条路径，这不符合无环图的情况。
图中连通分量的数量：给定一个图 G，将边的端点放入同一个组（相同的连通分量）。一旦所有节点耗尽，形成的组数就是 G 中的连通分量数量。

有关 Union-Find 的一些有趣的讲义。

用法

设置

在 Cargo.toml 中，将此软件包作为依赖项添加。

[dependencies]
reunion = { version = "0.1" }

API

示例 1

任务：创建一个包含 usize 元素的任意大小的 UnionFind 数据结构。然后，合并一些元素，并捕获合并后的数据结构状态。

解决方案: use reunion::{UnionFind, UnionFindTrait}; use std::collections::HashSet; fn main() { // Create a UnionFind data structure of arbitrary size that contains subsets of usizes. let mut uf1 = UnionFind::<usize>::new(); println!("Initial state: {}", &uf); println!("All elements form their own group (singletons)."); println!(format!("{:?}", uf.subsets()); uf.union(2, 1); println!("After combining the groups that contains 2 and 1: {}", &uf); uf.union(4, 3); println!("After combining the groups that contains 4 and 3: {}", &uf); uf.union(6, 5); println!("After combining the groups that contains 6 and 5: {}", &uf); let mut hs1 = HashSet::new(); hs1.insert(1); hs1.insert(2); let mut hs2 = HashSet::new(); hs2.insert(3); hs2.insert(4); let mut hs3 = HashSet::new(); hs3.insert(5); hs3.insert(6); let mut subsets = uf.subsets(); assert_eq!(subsets.len(), 3); assert!(&subsets.contains(&hs1)); assert!(&subsets.contains(&hs2)); assert!(&subsets.contains(&hs3)); uf.union(1, 5); println!("After combining the groups that contains 1 and 5: {}", &uf); subsets = uf.subsets(); assert_eq!(subsets.len(), 2); hs3.extend(&hs1); assert!(&subsets.contains(&hs3)); assert!(&subsets.contains(&hs2)); let mut uf_clone = uf.clone(); uf_clone.find(2); assert_eq!(&uf, &uf_clone); println!("{}", &uf); // It is possible to iterate over the subsets. for partition in uf1 { println!("{:?}", partition); } } 示例 2 任务：创建一个大小至少为 10 的 UnionFind 数据结构，其中包含 u16 元素。注意：大小问题仅有助于减少所需的内存分配数量。因此，它并不是特别重要，完全是可选的。解决方案: // Create a UnionFind data structure of a fixed size that contains subsets of u16. let mut uf2 = UnionFind::<u16>::with_capacity(10); println!("{}", uf2); 性能基准测试 DIY 在你的机器上执行基准测试克隆此仓库。运行 cargo bench 你应该会看到类似这样的输出 #Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.013497126s, Time per operation: 285ns #Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.013323348s, Time per operation: 285ns #Find: 2497150, #Union: 1048575, #Total: 3545725, Time: 1.012333206s, Time per operation: 285ns ... Big Merge (20, 10000) time: [1.0175 s 1.0190 s 1.0205 s] change: [-0.4773% -0.2721% -0.0647%] (p = 0.01 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 10 (10.00%) high mild 3 (3.00%) high severe ... 摘要在AMD Ryzen 9 3900X 12核心处理器（同时运行大量其他进程的情况下），使用大小为2 ** 20的UnionFind，总共3,545,725次操作大约需要1秒，这是预期的，因为这些操作的时间复杂度实际上是O(1)（实际上它是O(alpha(n))，其中alpha(n)是逆阿克曼函数，但它的增长非常缓慢，我们可以将其视为常数）。

依赖 ~0.4–1MB ~24K SLoC serde+derive dev criterion 0.3+html_reports dev rand 0.8