Binggan: profiling in Rust // Lib.rs

15 releases (6 breaking)

0.8.1	May 21, 2024
0.8.0	May 20, 2024
0.7.0	May 16, 2024
0.6.2	May 12, 2024
0.2.3	Apr 25, 2024

#40 in Profiling

141 downloads per month
Used in serde_json_borrow

MIT license

70KB
1.5K SLoC

Binggan


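Binggan (餅乾, bǐng gān, meaning "cookie" in Chinese) is a benchmarking library for Rust. It is designed to be simple to use and to give a good overview of the performance and memory consumption of your code.

It allows arbitrary named inputs to be passed to the benchmarks.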

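Features

📊 Peak memory usage
💎 Stack offset randomization
💖 Perf integration (Linux)
🔄 Delta comparison
⚡ Fast execution
🧩 Interleaved test runs (more accurate results)
🏷️ Named benchmark inputs
🧙 No macros, no magic (just a regular API)
🎨 Now with colored output!
🦀 Runs on stable Rust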
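
To illustrate what "interleaved test runs" can mean, here is a minimal sketch using only the standard library. It assumes interleaving means running every benchmark once per round, round-robin, so that drift in machine state (thermal throttling, background load) is spread across all candidates instead of penalizing whichever one runs last. The names `run_interleaved`, `sum_vec`, and `sort_vec` are hypothetical and are not part of binggan's API; the library's actual scheduling may differ.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Two hypothetical workloads to compare.
fn sum_vec() {
    let v: Vec<u64> = (0..1_000).collect();
    black_box(v.iter().sum::<u64>());
}

fn sort_vec() {
    let mut v: Vec<u64> = (0..1_000).rev().collect();
    v.sort();
    black_box(v);
}

/// Runs every benchmark once per round (A, B, A, B, ...) rather than
/// finishing all rounds of A before starting B.
/// Returns (benchmark index, elapsed time) pairs in execution order.
fn run_interleaved(benches: &[(&str, fn())], rounds: usize) -> Vec<(usize, Duration)> {
    let mut log = Vec::with_capacity(benches.len() * rounds);
    for _ in 0..rounds {
        for (idx, (_name, f)) in benches.iter().enumerate() {
            let start = Instant::now();
            f();
            log.push((idx, start.elapsed()));
        }
    }
    log
}

fn main() {
    let benches = [("sum", sum_vec as fn()), ("sort", sort_vec as fn())];
    let log = run_interleaved(&benches, 5);
    for (idx, (name, _)) in benches.iter().enumerate() {
        // Pick the fastest sample recorded for this benchmark.
        let fastest = log.iter().filter(|(i, _)| *i == idx).map(|(_, d)| *d).min();
        println!("{name}: fastest of 5 rounds = {fastest:?}");
    }
}
```

Returning the samples in execution order keeps the interleaving observable, which makes the scheduling easy to check separately from the timing numbers.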

Example

use binggan::{black_box, InputGroup, PeakMemAlloc, INSTRUMENTED_SYSTEM};

#[global_allocator]
pub static GLOBAL: &PeakMemAlloc<std::alloc::System> = &INSTRUMENTED_SYSTEM;


fn test_vec(data: &Vec<usize>) {
    // ...
}
fn test_hashmap(data: &Vec<usize>) {
    // ...
}

fn bench_group(mut runner: InputGroup<Vec<usize>>) {
    runner.set_alloc(GLOBAL); // Set the peak mem allocator. This will enable peak memory reporting.

    // Enables the perf integration. Only on Linux, noop on other OS.
    runner.config().enable_perf();
    // Trashes the CPU cache between runs
    runner.config().set_cache_trasher(true);
    // Enables throughput reporting
    runner.throughput(|input| input.len() * std::mem::size_of::<usize>());
    runner.register("vec", |data| {
        black_box(test_vec(data));
    });
    runner.register("hashmap", move |data| {
        black_box(test_hashmap(data));
    });
    runner.run();
}

fn main() {
    // Tuples of name and data for the inputs
    let data = vec![
        (
            "max id 100; 100 ids all the same",
            std::iter::repeat(100).take(100).collect(),
        ),
        ("max id 100; 100 ids all different", (0..100).collect()),
    ];
    bench_group(InputGroup::new_with_inputs(data));
}

Example output

cargo bench

turbo_buckets_vs_fxhashmap_zipfs1%
100k max id / 100k num elem
TurboBuckets                 Memory: 786.4 KB      Avg: 0.3411ms  (-8.90%)     Median: 0.3394ms  (-9.51%)     0.3223ms    0.3741ms    
Vec                          Memory: 400.0 KB      Avg: 0.0503ms  (-10.27%)    Median: 0.0492ms  (-12.27%)    0.0463ms    0.0676ms    
FxHashMap                    Memory: 442.4 KB      Avg: 1.0560ms  (+26.89%)    Median: 1.1512ms  (+58.61%)    0.6558ms    1.1979ms    
FxHashMap Reserved Max Id    Memory: 1.2 MB        Avg: 0.5220ms  (-7.86%)     Median: 0.4988ms  (-11.40%)    0.4762ms    0.7515ms    
500k max id / 500k num elem
TurboBuckets                 Memory: 4.5 MB      Avg: 1.7766ms  (+24.15%)    Median: 1.6490ms  (+15.67%)    1.3477ms    2.7288ms     
Vec                          Memory: 2.0 MB      Avg: 0.3759ms  (0.75%)      Median: 0.3598ms  (0.50%)      0.2975ms    0.5415ms     
FxHashMap                    Memory: 1.8 MB      Avg: 3.7157ms  (+6.57%)     Median: 3.5566ms  (+2.38%)     3.1622ms    5.2814ms     
FxHashMap Reserved Max Id    Memory: 9.4 MB      Avg: 5.8076ms  (+39.56%)    Median: 5.3666ms  (+31.39%)    3.0705ms    15.8945ms
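
The percentages in parentheses read as a change relative to an earlier run of the same benchmark. The sketch below shows one plausible way to compute and format such a delta, assuming the formula (current - previous) / previous * 100. The function names `percent_delta` and `format_delta` and the input values are hypothetical; the exact formula and rounding binggan uses are not stated here.

```rust
/// Relative change of `current` versus `previous`, in percent.
/// Returns None when there is no usable baseline.
fn percent_delta(previous: f64, current: f64) -> Option<f64> {
    if previous == 0.0 {
        None
    } else {
        Some((current - previous) / previous * 100.0)
    }
}

/// Formats a delta with an explicit sign and two decimals, e.g. "(+25.00%)".
fn format_delta(previous: f64, current: f64) -> String {
    match percent_delta(previous, current) {
        Some(d) => format!("({:+.2}%)", d),
        None => String::from("(n/a)"),
    }
}

fn main() {
    // Hypothetical timings in milliseconds: previous run, then current run.
    println!("Avg: {:.4}ms  {}", 1.0, format_delta(2.0, 1.0));
    println!("Avg: {:.4}ms  {}", 2.5, format_delta(2.0, 2.5));
}
```

A negative delta means the current run was faster than the baseline; a positive one means it was slower.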

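Peak memory

To activate peak memory reporting, you need to wrap your allocator in PeakMemAlloc and call set_alloc on the group.

While the number of allocations is also interesting for performance analysis, peak memory determines the memory requirements of the code.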
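
To make the idea of a peak-tracking allocator concrete, here is a minimal sketch built only on the standard library's `GlobalAlloc` trait. It is not binggan's `PeakMemAlloc`; `PeakTracker` and `peak_of` are hypothetical names, and the real implementation may track memory differently. The wrapper forwards every request to the inner allocator while keeping two counters: bytes currently live and the highest value that counter has reached.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that records live and peak allocated bytes.
pub struct PeakTracker<A> {
    inner: A,
    current: AtomicUsize,
    peak: AtomicUsize,
}

impl<A> PeakTracker<A> {
    pub const fn new(inner: A) -> Self {
        Self { inner, current: AtomicUsize::new(0), peak: AtomicUsize::new(0) }
    }
    pub fn current(&self) -> usize {
        self.current.load(Ordering::Relaxed)
    }
    pub fn peak(&self) -> usize {
        self.peak.load(Ordering::Relaxed)
    }
    /// Restart peak tracking from the bytes that are live right now.
    pub fn reset_peak(&self) {
        self.peak.store(self.current(), Ordering::Relaxed);
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for PeakTracker<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.inner.alloc(layout) };
        if !ptr.is_null() {
            // fetch_add returns the old value, so add the size to get the new total.
            let now = self.current.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            self.peak.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.inner.dealloc(ptr, layout) };
        self.current.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static ALLOC: PeakTracker<System> = PeakTracker::new(System);

/// Runs `f` and returns its result plus the peak bytes allocated above
/// the level that was live when `f` started.
fn peak_of<R>(f: impl FnOnce() -> R) -> (R, usize) {
    let baseline = ALLOC.current();
    ALLOC.reset_peak();
    let out = f();
    (out, ALLOC.peak().saturating_sub(baseline))
}

fn main() {
    let (len, peak) = peak_of(|| black_box(vec![0u64; 1024]).len());
    println!("built {len} elements, peak above baseline: {peak} bytes");
}
```

Because the vector is dropped before `peak_of` returns, the live-bytes counter falls back to its baseline while the peak counter still remembers the high-water mark. That difference is why peak memory, rather than memory in use at the end, reflects what the code actually requires.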

TODO

Customizable reporting (e.g. write your own reporter)

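Maybe later features

Charts
Automatic comparison of histograms (e.g. if a benchmark has several bands in which it operates, it would be nice to compare them)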

Dependencies

~0.9–8.5MB
~66K SLoC