4个版本

新 0.2.0	2024年8月18日
0.1.2	2023年2月13日
0.1.1	2023年2月13日
0.1.0	2023年2月13日

#117 在数据结构

每月121次下载

MIT/Apache

1MB
10K SLoC

BLART

BLART是一个自适应基数树的实现，用作映射和集合数据结构的底层。自适应基数树在处理分解为字节字符串的键时，提供了极大的空间效率和性能。

示例

以下是一个使用TreeMap类型（直接从标准库中借用）的示例。

use blart::TreeMap;

// type inference lets us omit an explicit type signature (which
// would be `TreeMap<&str, &str>` in this example).
let mut movie_reviews: TreeMap<_, _> = TreeMap::new();

// review some movies.
let _ = movie_reviews.try_insert("Office Space",       "Deals with real issues in the workplace.").unwrap();
let _ = movie_reviews.try_insert("Pulp Fiction",       "Masterpiece.").unwrap();
let _ = movie_reviews.try_insert("The Godfather",      "Very enjoyable.").unwrap();
let _ = movie_reviews.try_insert("The Blues Brothers", "Eye lyked it a lot.").unwrap();

// check for a specific one.
if !movie_reviews.contains_key("Les Misérables") {
    println!("We've got {} reviews, but Les Misérables ain't one.",
             movie_reviews.len());
}

// oops, this review has a lot of spelling mistakes, let's delete it.
movie_reviews.remove("The Blues Brothers");

// look up the values associated with some keys.
let to_find = ["Up!", "Office Space"];
for movie in &to_find {
    match movie_reviews.get(movie) {
       Some(review) => println!("{movie}: {review}"),
       None => println!("{movie} is unreviewed.")
    }
}

// Look up the value for a key (will panic if the key is not found).
println!("Movie review: {}", movie_reviews["Office Space"]);

// iterate over everything.
for (movie, review) in &movie_reviews {
    println!("{movie}: \"{review}\"");
}

文档

main分支的文档

测试

Miri

目前我们正在使用一些特定的crate（sptr，将来将回到core::ptr::*）来确保我们与严格来源兼容。以下MIRIFLAGS设置应该可以启用检查以确保我们兼容。

MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-symbolic-alignment-check" cargo +nightly miri test

我认为这很有用，因为我们在标记指针实现中进行了某些指针操作，修改指针的内容以存储数据位。

模糊测试

要运行模糊测试器，我使用以下命令

cargo +nightly fuzz run -j 8 -s address fuzz_raw_api -- -max_len=32768 -max_total_time=3600 && cargo +nightly fuzz cmin fuzz_raw_api

这将运行模糊测试器总共3600秒（1小时），使用8个工作任务（我的开发机器上总核心数的一半），并使用地址 sanitizer。使用cmin命令在生成新条目后压缩语料库。

覆盖率

要从模糊测试语料库生成覆盖率报告

# replace with own triple as required
TARGET_TRIPLE="x86_64-unknown-linux-gnu"
cargo +nightly fuzz coverage fuzz_raw_api && cargo cov -- show fuzz/target/"$TARGET_TRIPLE"/release/fuzz_raw_api \
    --format=html \
    -instr-profile=fuzz/coverage/fuzz_raw_api/coverage.profdata \
    > index.html

基准测试

要运行基准测试，请安装cargo-criterion，然后运行

cargo +nightly criterion --history-id "$(git rev-parse --short HEAD)-0" --features bench-perf-events

或

cargo bench --bench <bench_name> --features bench-perf-events

如果您收到“权限被拒绝”错误，请更新perf_event_paranoid

sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

有关详细信息，请参阅以下链接。

性能分析

我使用了一个相对现实基准：计数文本文件中的单词。要开始，下载一个文本文件，例如

curl -o data/Ulysses.txt https://www.gutenberg.org/cache/epub/4300/pg4300.txt

然后使用profiling配置构建单词计数示例

cargo build --profile profiling --exampleps

然后在下载的数据上运行单词计数工作负载，同时进行性能分析

samply record ./target/profiling/examples/count_words blart data/book-chapters-combined.txt

许可证

许可证采用以下之一

Apache许可证2.0版本（LICENSE-APACHE 或 https://apache.ac.cn/licenses/LICENSE-2.0）
MIT许可证（LICENSE-MIT 或 https://open-source.org.cn/licenses/MIT）

由您选择。

贡献

除非您明确声明，否则根据Apache-2.0许可证定义的，您有意提交以包含在作品中的任何贡献，将如上所述双重许可，不附加任何额外条款或条件。