13 个稳定版本

新功能 1.8.0	2024 年 8 月 12 日
1.7.0	2024 年 7 月 25 日
1.6.0	2024 年 6 月 15 日
1.5.0	2024 年 4 月 7 日
1.0.1	2024 年 2 月 27 日

#534 在数据结构

每月 151 次下载
用于 orx-linked-list

MIT 许可证

225KB
3.5K SLoC

orx-selfref-col

SelfRefCol 是一个核心数据结构，用于方便地构建安全高效的自引用集合，如链表和树。

特性

安全

实现安全自引用集合很棘手。Rust 的安全性特征在这种情况下很保守，并鼓励使用宽智能指针，这会导致缓存局部性差。为了避免这种情况，可以使用索引，但这既不优雅也不安全。

SelfRefCol 在以下方面构建其安全性保证

集合中元素的记忆位置将永远不会改变，除非由于底层 PinnedVec 的固定元素保证而显式更改
将确保所有引用都将位于相同集合的元素中。

换句话说，

通过保持引用有效，
防止引入外部引用，
防止泄漏引用，

使用常规 & 引用构建自引用集合是安全的。

orx-linked-list 包基于 SelfRefCol 定义了单链表和双链表

没有任何指针解引用，
没有任何通过索引的访问，
只需要几个不安全调用。

请注意，std::collections::linked_list::LinkedList 实现包含超过 60 个 unsafe 代码块。

方便

SelfRefCol专门用于构建自引用集合，提供了构建此类集合所需的方法（只有这些方法）。这允许避免使用通用低级类型或方法，这些类型或方法与要定义的结构不相符，例如NonNull，Box::from_raw_in，mem::replace等。相反，实现可以简洁、表达清晰，并且接近方法的伪代码。

以下是一个例子可能会更清晰。考虑以下使用SelfRefCol定义的双向链表的push_back方法，该方法在orx-linked-list中定义。

pub fn push_back(&mut self, value: T) {
    self.col
        .move_mutate(value, |x, value| match x.ends().back() {
            Some(prior_back) => {
                let new_back = x.push_get_ref(value);
                new_back.set_prev(&x, prior_back);
                prior_back.set_next(&x, new_back);
                x.set_ends([x.ends().front(), Some(new_back)]);
            }
            None => {
                let node = x.push_get_ref(value);
                x.set_ends([Some(node), Some(node)]);
            }
        });
}

注意，除了x之外，所有单词都与链表相关，其中x是一个内部可变性键，它在一个非捕获的lambda中使用，以防止引用泄漏。

高效

SelfRefCol的元素在内部存储在一个PinnedVec实现中，这对于引用的正确性和安全性至关重要。此外，这样，集合的元素存储在彼此附近，而不是在内存中的任意位置。与元素通过任意堆分配存储的此类集合相比，这提供了更好的缓存局部性。

根据当前的基准测试，使用SelfRefCol的orx-linked-list实现至少比std::collections::linked_list::LinkedList快三倍。

变体

请注意，这个核心结构能够表示广泛的自引用集合，其中变体通过表达式的特质类型定义方便地定义。

所表示的集合具有以下特点

关系通过常规的&引用表示，避免了使用如Box、Rc、Arc等智能指针的必要性。

集合确保这些引用仅在集合的元素之间设置。换句话说，不允许外部引用，或者集合元素的引用不能泄漏。这构建了安全保证。集合的元素在内部存储在一个PinnedVec实现中，这对于引用的正确性至关重要。此外，这样，集合的元素存储在彼此附近，而不是在内存中的任意位置。与元素通过任意堆分配存储的此类集合相比，这提供了更好的缓存局部性。

集合由以下泛型参数定义 T：存储在集合中的元素类型。 V：定义集合结构的Variant的类型，以下为 V::Storage：定义如何存储T的元素 NodeDataLazyClose：元素作为Option<T>存储，允许懒节点关闭或元素删除 NodeDataEagerClose：元素直接作为T存储 V::Prev：定义如何存储对前一个元素的引用 NodeRefNone：没有元素的前一个引用。 NodeRefSingle：存在一个或没有元素的前一个引用，存储为 Option<&Node>。 NodeRefsArray：存在多个可能的前一个引用，最多可达常数 N，存储为 [Option<&Node>; N]。 NodeRefsVec：存在多个可能的前一个引用，存储为 Vec<&Node>。 V::Next：定义了如何存储对下一个元素的引用类似地，可以用 NodeRefNone、NodeRefSingle、NodeRefsArray 或 NodeRefsVec 中的任何一个表示。 V::Ends：定义了如何存储对集合末尾的引用类似地，可以用 NodeRefNone、NodeRefSingle、NodeRefsArray 或 NodeRefsVec 中的任何一个表示。 V::MemoryReclaim：定义了如何回收已关闭节点的内存 MemoryReclaimNever 永远不会回收已关闭的节点。 MemoryReclaimOnThreshold<D> 当关闭节点的比例超过 2^D 之一时，将回收关闭节点的内存。示例考虑以下四个结构体，它们实现了 Variant 以定义四种不同的自引用集合。请注意，这些定义既表达性强又简洁，从而导致了高效的实现。 use orx_selfref_col::*; #[derive(Clone, Copy)] struct SinglyListVariant; impl<'a, T: 'a> Variant<'a, T> for SinglyListVariant { type Storage = NodeDataLazyClose<T>; // lazy close type MemoryReclaim = MemoryReclaimOnThreshold<2>; // closed nodes will be reclaimed when utilization drops below 75% type Prev = NodeRefNone; // previous nodes are not stored type Next = NodeRefSingle<'a, Self, T>; // there is only one next node, if any type Ends = NodeRefSingle<'a, Self, T>; // there is only one end, namely the front of the list } #[derive(Clone, Copy)] struct DoublyListVariant; impl<'a, T: 'a> Variant<'a, T> for DoublyListVariant { type Storage = NodeDataLazyClose<T>; // lazy close type MemoryReclaim = MemoryReclaimOnThreshold<3>; // closed nodes will be reclaimed when utilization drops below 87.5% type Prev = NodeRefSingle<'a, Self, T>; // there is only one previous node, if any type Next = NodeRefSingle<'a, Self, T>; // there is only one next node, if any type Ends = NodeRefsArray<'a, 2, Self, T>; // there are two ends, namely the front and back of the list } #[derive(Clone, Copy)] struct BinaryTreeVariant; impl<'a, T: 'a> Variant<'a, T> for BinaryTreeVariant { type Storage = NodeDataLazyClose<T>; // lazy close type MemoryReclaim = MemoryReclaimOnThreshold<1>; // closed nodes will be reclaimed when utilization drops below 50% type Prev = NodeRefSingle<'a, Self, T>; // there is only one previous node, namely parent node, if any type Next = NodeRefsArray<'a, 2, Self, T>; // there are 0, 1 or 2 next or children nodes type Ends = NodeRefSingle<'a, Self, T>; // there is only one end, namely the root of the tree } #[derive(Clone, Copy)] struct DynamicTreeVariant; impl<'a, T: 'a> Variant<'a, T> for DynamicTreeVariant { type Storage = NodeDataLazyClose<T>; // lazy close type MemoryReclaim = MemoryReclaimNever; // closed nodes will be left as holes type Prev = NodeRefSingle<'a, Self, T>; // there is only one previous node, namely parent node, if any type Next = NodeRefsVec<'a, Self, T>; // there might be any number of next nodes, namely children nodes type Ends = NodeRefSingle<'a, Self, T>; // there is only one end, namely the root of the tree } NodeIndex NodeIndex 属于高效和安全的交集。 NodeIndex 是一个包含对集合中元素引用的结构体。它提供对元素的常数时间访问。此外，节点索引可以独立于集合存储在别处。从这种意义上说，NodeIndex 对 SelfRefCol 的引用类似于 usize 对标准 Vec 的引用。然而，它特别强调安全性和正确性。以下无效的用法在 NodeIndex 和 SelfRefCol 中不可能发生。不能在错误的 SelfRefCol 上使用 NodeIndex 这个问题可以用 usize 观察，但不能用 NodeIndex 观察。 use orx_selfref_col::*; let mut col1 = SelfRefCol::<Var, _>::new(); let a = col1.mutate_take('a', |x, a| x.push_get_ref(a).index(&x)); let col2 = SelfRefCol::<Var, _>::new(); assert!(!a.is_valid_for_collection(&col2)); // we cannot dereference 'a' with 'col2'! 在对应的元素被删除后不能使用 NodeIndex 这个问题可以用 usize 观察，但不能用 NodeIndex 观察。 let mut col = SelfRefCol::<Var, _>::new(); let [a, b, c, d, e, f, g] = col .mutate_take(['a', 'b', 'c', 'd', 'e', 'f', 'g'], |x, values| { values.map(|val| x.push_get_ref(val).index(&x)) }); let removed_b = col.mutate_take(b, |x, b| b.as_ref(&x).close_node_take_data(&x)); // does not trigger reclaim yet assert!(!b.is_valid_for_collection(&col)); // we cannot dereference 'b' as it is removed. 在元素重组后不能使用 NodeIndex 这是特定于具有懒节点关闭的 SelfRefCol 的，仅在 MemoryReclaimOnThreshold 策略使用时发生。对于 MemoryReclaimNever 策略，这种情况不会发生。一旦节点利用率下降到一定的值，集合将重新组织其元素并回收已关闭节点的内存。这使先前创建的所有索引无效，从而防止任何错误的访问。这类似于在索引指向第3个元素时移除向量的第一个元素。移除后，向量的元素被重新组织，而我们的索引现在指向了一个错误的元素。《NodeIndex》不允许这种访问。 let mut col = SelfRefCol::<Var, _>::new(); let [a, b, c] = col.mutate_take(['a', 'b', 'c'], |x, values| { values.map(|val| x.push_get_ref(val).index(&x)) }); let removed_b = col.mutate_take(b, |x, b| b.as_ref(&x).close_node_take_data(&x)); // triggers reclaim, elements reorganized assert!(!a.is_valid_for_collection(&col)); // all prior indices are invalidated! assert!(!b.is_valid_for_collection(&col)); assert!(!c.is_valid_for_collection(&col)); 使用《SelfRefCol》的Crates 以下Crates使用《SelfRefCol》方便地构建相应的数据结构 orx-linked-list：实现了单链表和双链表。许可证此库采用MIT许可证。有关详细信息，请参阅LICENSE。

依赖关系 ~400KB orx-fixed-vec orx-pinned-vec orx-pseudo-default orx-split-vec dev float-cmp 0.9 dev test-case

orx-selfref-col

13 个稳定版本

orx-selfref-col

特性

安全

方便

高效

变体

示例

`NodeIndex`

不能在错误的 `SelfRefCol` 上使用 `NodeIndex`

在对应的元素被删除后不能使用 `NodeIndex`

在元素重组后不能使用 `NodeIndex`

使用《SelfRefCol》的Crates

许可证

依赖关系