4个稳定版本

2.0.1	2023年7月21日
2.0.0	2023年7月20日
1.0.1	2023年7月17日
1.0.0	2023年7月11日

#311 in 内存管理

MIT许可证

85KB
1.5K SLoC

Talloc

注意：该项目已更名为Talc。新的crate在此处。此版本不再维护，我愿意转让crate的所有权（通过我GitHub个人资料上的电子邮件地址联系我）。

Talloc是一个高性能且灵活的no_std-兼容内存分配器，适用于操作系统内核等项目，或用于正常单线程应用的区域分配。

将Talloc用作简单的区域分配器非常容易，但也方便了no_std环境中的实际问题，例如自定义OOM处理，以及扩展和动态减少分配区域等强大功能。

用法

将其作为全局分配器使用，如下所示

use talloc::*;

#[global_allocator]
static ALLOCATOR: Tallock = Talloc::new().spin_lock();
static mut ARENA: [u8; 1000] = [0; 1000];

fn main() {
    // initialize it later...
    unsafe { ALLOCATOR.0.lock().init(ARENA.as_mut_slice().into()); }
}

通过Allocator API将其用作区域分配器，如下所示

use talloc::*;

fn main () {
    let mut arena = Box::leak(vec![0u8; 10000].into_boxed_slice());
    
    let tallock = Talloc::new().spin_lock();
    unsafe { tallock.0.lock().init(arena.into()); }

    let allocator = tallock.allocator_api_ref();
    
    allocator.allocate(..);
}

性能

最坏情况下的分配为O(n)。实际上，它通常比其他分配器快得多。请参见下面的基准测试。

释放总是O(1)，重新分配通常为O(1)，除非原地分配失败。

内存开销

每个分配通常有1个usize的开销，通常。块大小至少为3个3 * usize，因此微小分配将有大量开销。

这比Galloc（另一个边界标记分配器）有所改进，Galloc的最小块大小为4个4 * usize。

基准测试

galloc的基准测试

原始基准测试已被修改（例如，用fastrand替换rand），以便减轻开销。

随机操作基准测试结果

Random Actions Benchmark Results

Talloc优于其他选择。

堆耗尽基准测试结果

Heap Exhaustion Benchmark Results

如果时间惩罚设置正确，Talloc会略微落后。

注意

没有试图将这些时间考虑在内，然而，在我的计算机上结果相当一致。
对齐要求呈指数递减，范围从2^2字节到2^18字节，其中2^2和2^3最为常见。

分配器微基准测试（基于simple_chunk_allocator的基准测试）

注意：预失败分配包括直到第一次分配失败的所有分配，在此点之前堆压力已成为一个主要因素。一些分配器在处理堆压力方面比其他分配器做得更好，而且许多应用程序不关心这种情况下（分配失败导致恐慌），因此它们被单独分开考虑。

RESULTS OF BENCHMARK: Chunk Allocator
   25714 allocation attempts,   25535 successful allocations,   22479 pre-fail allocations,   18067 deallocations
            CATEGORY | OCTILE 0     1     2     3     4     5     6      7        8 | AVERAGE
---------------------|--------------------------------------------------------------|---------
     All Allocations |       63  1176  1659  2058  2457  2940  4179  37569 18199587 |  240918  ticks
Pre-Fail Allocations |       84  1134  1596  1953  2289  2688  3234   8400  1562757 |   15753  ticks
       Deallocations |       42   147   231   315   420   504   588    672     1932 |     420  ticks

RESULTS OF BENCHMARK: Linked List Allocator
  136818 allocation attempts,  106743 successful allocations,   26004 pre-fail allocations,   96369 deallocations
            CATEGORY | OCTILE 0     1     2      3      4      5      6      7       8 | AVERAGE
---------------------|-----------------------------------------------------------------|---------
     All Allocations |       42  4032  9261  15729  23772  33999  46977  59010  621642 |   29190  ticks
Pre-Fail Allocations |       42   840  2121   3780   5964   8694  12243  17514  614250 |   10781  ticks
       Deallocations |       42  3045  6615  10878  15813  21672  28602  38031  107877 |   18880  ticks

RESULTS OF BENCHMARK: Galloc
  282102 allocation attempts,  206544 successful allocations,   22751 pre-fail allocations,  196616 deallocations
            CATEGORY | OCTILE 0   1   2    3      4      5      6      7       8 | AVERAGE
---------------------|-----------------------------------------------------------|---------
     All Allocations |       42  63  63  378  12474  27027  41559  45549  100527 |   19129  ticks
Pre-Fail Allocations |       42  42  42   42     63     63     63    861   21714 |     691  ticks
       Deallocations |       42  63  84   84    105    231    294    693   15771 |     262  ticks

RESULTS OF BENCHMARK: Talloc
 2193976 allocation attempts, 1545626 successful allocations,   24585 pre-fail allocations, 1534743 deallocations
            CATEGORY | OCTILE 0   1   2    3    4    5    6    7      8 | AVERAGE
---------------------|--------------------------------------------------|---------
     All Allocations |       42  63  63   84   84  105  126  210  38871 |     133  ticks
Pre-Fail Allocations |       42  63  63   84   84   84  105  126   3927 |     100  ticks
       Deallocations |       42  84  84  105  105  147  252  336  17115 |     187  ticks

Talloc表现最佳，只有在不承受堆压力时，Galloc才接近其性能。Galloc通常比Talloc分配得略快，但其他方面花费的时间更长，而Talloc的性能则更加稳定。（Galloc使用专门的小桶来覆盖更小的分配范围，而Talloc的小桶化使更广泛的分配快速）。

注意

没有试图将这些时间考虑在内，然而，在我的计算机上结果相当一致。
由于随机分配大小，预失败分配的数量多于信号。
对齐要求呈指数递减，范围从2^2字节到2^18字节，其中2^2和2^3最为常见。

算法

这是一个针对通用用例的dlmalloc风格实现，具有边界标记和桶分配。

与Galloc相比，Talloc完全不按对齐进行桶分配，假设大多数分配最多需要机器字大小对齐，因此预期Galloc在制作大量小型、大型对齐分配时将更快。相反，使用更广泛的桶大小范围，这通常会更加高效。

此外，重新排列了块元数据布局，以允许更小的最小大小块，从而减少小型分配的内存开销。

测试

测试覆盖了大多数辅助类型，并对分配进行了一些合理性检查。

除此之外，对分配器进行了大量的模糊测试。请参阅/fuzz/fuzz_targets/fuzz_arena.rs

功能

spin（默认）：提供了实现GlobalAlloc的Tallock类型（一个自旋互斥锁包装器）。
allocator（默认）：通过Tallock提供了Allocator特质的实现。

一般用法

以下是方法列表

构造函数
- new
- with_oom_handler
信息
- get_arena - 返回当前竞技场内存区域
- get_allocatable_span - 返回当前可能发生分配的内存区域
- get_allocated_span - 返回包含所有已分配内存的最小跨度
管理
- mov - 安全地将初始化的Talloc移动到指定的目的地
- init - 初始化或重新初始化竞技场（如果有的话，忘记所有之前的分配）
- extend - 初始化或扩展竞技场区域
- truncate - 减少竞技场区域的范围
- spin_lock - 将Talloc包装在Tallock中，它支持GlobalAlloc和Allocator API
分配
- malloc
- free
- grow
- shrink

有关更多信息，请参阅它们的文档。

Span是一种方便的小类型，用于描述内存区域，因为尝试操作Range<*mut u8>或*mut [u8]或base_ptr-size对往往不方便或令人烦恼。请参阅Span::from*和span.to_*函数进行转换。

高级用法

不要使用Talloc::new，而是使用Talloc::with_oom_handler并传入一个函数指针。当发生内存不足时，将调用此函数。这可能有多个用途，但其中之一是按需动态扩展内存区域。

#![feature(allocator_api)]
use talloc::*;
use core::alloc::Layout;

fn oom_handler(talloc: &mut Talloc, layout: Layout) -> Result<(), AllocError> {
    // alloc doesn't have enough memory, and we just got called! we must free up some memory
    // we'll go through an example of how to handle this situation

    // we can inspect `layout` to estimate how much we should free up for this allocation
    // or we can extend by any amount (increasing powers of two has good time complexity)

    // this function will be repeatly called until we free up enough memory or 
    // we return Err(AllocError) at which point the allocation will too.
    // be careful to avoid conditions where the arena isn't sufficiently extended
    // indefinitely, causing an infinite loop

    // some limit for the sake of example
    const ARENA_TOP_LIMIT: usize = 0x80000000;

    let old_arena: Span = talloc.get_arena();

    // we're going to extend the arena upward, doubling its size
    // but we'll be sure not to extend past the limit
    let new_arena: Span = old_arena.extend(0, old_arena.size()).below(ARENA_TOP_LIMIT);

    if new_arena == old_arena {
        // we won't be extending the arena, so we should return AllocError
        return Err(AllocError);
    }

    unsafe {
        // we're assuming the new memory up to ARENA_TOP_LIMIT is allocatable
        talloc.extend(new_arena);
    };

    Ok(())
}

注意事项 - 多线程性能

我不知道为什么，但Talloc在极端的多线程竞争下比Galloc受到的影响更大。虽然这对大多数用例来说不是那么相关，但如果可能的话，我希望解决这个问题。如果有人有缓解此问题的建议，请提交一个issue！

我已经尝试将桶数组对齐到单独的缓存行，但结果没有变化。除此之外，我不知道还能尝试什么。

2-Thread Random Actions Benchmark Results

依赖项

约150KB