lasso — Rust 并发库 // Lib.rs • Rust 包仓库

15 个版本

新版本 0.7.3	2024 年 8 月 19 日
0.7.2	2023 年 5 月 15 日
0.7.0	~~2023 年 4 月 2 日~~
0.6.0	2021 年 9 月 1 日
0.1.2	2020 年 3 月 29 日

#20 in 并发

64,886 每月下载量
在 78 个 crates 中使用（直接使用 17 个）

MIT/Apache

320KB
6K SLoC

一个多线程和单线程的字符串池，允许字符串以最小的内存占用进行缓存，并使用唯一键与之关联，以便在任何时候检索。一个 Rodeo 允许 O(1) 池化和解析，并且可以被转换为一个 RodeoReader 以允许无争用的解析，包括键到字符串和字符串到键的操作。它也可以转换为一个 RodeoResolver，仅提供键到字符串操作，以实现最低的内存占用。

我应该使用哪个池？

对于单线程工作负载，建议使用Rodeo，而多线程应用程序应使用ThreadedRodeo。这两种方式都是字符串内联的唯一方法，但大多数应用程序都会达到一个阶段，此时它们已完成字符串内联，而此时将选择RodeoReader和RodeoResolver。如果用户还需要获取字符串的键，则必须使用RodeoReader（尽管他们在此点仍可以将它们转移到RodeoResolver）。对于只需要键到字符串解析的用户，RodeoResolver在最小可能的内存使用下提供无争访问。请注意，要访问ThreadedRodeo，需要启用multi-threaded特性。

内联器	线程安全	内联字符串	str到key	key到str	无争访问	内存使用
`Rodeo`	❌	✅	✅	✅	N/A	中等
`ThreadedRodeo`	✅	✅	✅	✅	❌	大多数
`RodeoReader`	✅	❌	✅	✅	✅	中等
`RodeoResolver`	✅	❌	❌	✅	✅	最少

Cargo特性

默认情况下，lasso有一个依赖项，hashbrown，并且仅公开Rodeo。由于标准库的hashmap中raw_entry api目前不稳定，因此使用了Hashbrown。原始hashmap API用于hashmap中的自定义哈希，这可以显著减少内存使用。为了使用ThreadedRodeo，您必须启用multi-threaded特性。

multi-threaded - 启用ThreadedRodeo，多线程任务的内联器
ahasher - 使用ahash的RandomState作为默认哈希函数
no-std - 为Rodeo和ThreadedRodeo启用no_std + alloc支持
- 自动启用以下所需特性
  - ahasher - no_std哈希函数
serialize - 为所有Spur类型和所有内联器实现Serialize和Deserialize
inline-more - 使用#[inline]注解外部api

示例：使用Rodeo

use lasso::Rodeo;

let mut rodeo = Rodeo::default();
let key = rodeo.get_or_intern("Hello, world!");

// Easily retrieve the value of a key and find the key for values
assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!(Some(key), rodeo.get("Hello, world!"));

// Interning the same string again will yield the same key
let key2 = rodeo.get_or_intern("Hello, world!");

assert_eq!(key, key2);

示例：使用ThreadedRodeo

use lasso::ThreadedRodeo;
use std::{thread, sync::Arc};

let rodeo = Arc::new(ThreadedRodeo::default());
let key = rodeo.get_or_intern("Hello, world!");

// Easily retrieve the value of a key and find the key for values
assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!(Some(key), rodeo.get("Hello, world!"));

// Interning the same string again will yield the same key
let key2 = rodeo.get_or_intern("Hello, world!");

assert_eq!(key, key2);

// ThreadedRodeo can be shared across threads
let moved = Arc::clone(&rodeo);
let hello = thread::spawn(move || {
    assert_eq!("Hello, world!", moved.resolve(&key));
    moved.get_or_intern("Hello from the thread!")
})
.join()
.unwrap();

assert_eq!("Hello, world!", rodeo.resolve(&key));
assert_eq!("Hello from the thread!", rodeo.resolve(&hello));

示例：创建RodeoReader

use lasso::Rodeo;

// Rodeo and ThreadedRodeo are interchangeable here
let mut rodeo = Rodeo::default();

let key = rodeo.get_or_intern("Hello, world!");
assert_eq!("Hello, world!", rodeo.resolve(&key));

let reader = rodeo.into_reader();

// Reader keeps all the strings from the parent
assert_eq!("Hello, world!", reader.resolve(&key));
assert_eq!(Some(key), reader.get("Hello, world!"));

// The Reader can now be shared across threads, no matter what kind of Rodeo created it

示例：创建RodeoResolver

use lasso::Rodeo;

// Rodeo and ThreadedRodeo are interchangeable here
let mut rodeo = Rodeo::default();

let key = rodeo.get_or_intern("Hello, world!");
assert_eq!("Hello, world!", rodeo.resolve(&key));

let resolver = rodeo.into_resolver();

// Resolver keeps all the strings from the parent
assert_eq!("Hello, world!", resolver.resolve(&key));

// The Resolver can now be shared across threads, no matter what kind of Rodeo created it

示例：创建自定义范围的键

有时您希望键只存在于（或不存在）一定范围的值中，以便您可以拥有自定义的niche。这允许您将更多数据打包到原本未使用的空间中，这对于内存敏感型应用程序可能至关重要。

use lasso::{Key, Rodeo};

// First make our key type, this will be what we use as handles into our interner
#[derive(Copy, Clone, PartialEq, Eq)]
struct NicheKey(u32);

// This will reserve the upper 255 values for us to use as niches
const NICHE: usize = 0xFF000000;

// Implementing `Key` is unsafe and requires that anything given to `try_from_usize` must produce the
// same `usize` when `into_usize` is later called
unsafe impl Key for NicheKey {
    fn into_usize(self) -> usize {
        self.0 as usize
    }

    fn try_from_usize(int: usize) -> Option<Self> {
        if int < NICHE {
            // The value isn't in our niche range, so we're good to go
            Some(Self(int as u32))
        } else {
            // The value interferes with our niche, so we return `None`
            None
        }
    }
}

// To make sure we're upholding `Key`'s safety contract, let's make two small tests
#[test]
fn value_in_range() {
    let key = NicheKey::try_from_usize(0).unwrap();
    assert_eq!(key.into_usize(), 0);

    let key = NicheKey::try_from_usize(NICHE - 1).unwrap();
    assert_eq!(key.into_usize(), NICHE - 1);
}

#[test]
fn value_out_of_range() {
    let key = NicheKey::try_from_usize(NICHE);
    assert!(key.is_none());

    let key = NicheKey::try_from_usize(u32::max_value() as usize);
    assert!(key.is_none());
}

// And now we're done and can make `Rodeo`s or `ThreadedRodeo`s that use our custom key!
let mut rodeo: Rodeo<NicheKey> = Rodeo::new();
let key = rodeo.get_or_intern("It works!");
assert_eq!(rodeo.resolve(&key), "It works!");

示例：使用`FromIterator`创建

use lasso::Rodeo;
use core::iter::FromIterator;

// Works for both `Rodeo` and `ThreadedRodeo`
let rodeo = Rodeo::from_iter(vec![
    "one string",
    "two string",
    "red string",
    "blue string",
]);

assert!(rodeo.contains("one string"));
assert!(rodeo.contains("two string"));
assert!(rodeo.contains("red string"));
assert!(rodeo.contains("blue string"));

use lasso::Rodeo;
use core::iter::FromIterator;

// Works for both `Rodeo` and `ThreadedRodeo`
let rodeo: Rodeo = vec!["one string", "two string", "red string", "blue string"]
    .into_iter()
    .collect();

assert!(rodeo.contains("one string"));
assert!(rodeo.contains("two string"));
assert!(rodeo.contains("red string"));
assert!(rodeo.contains("blue string"));

基准测试

基准测试使用Criterion.rs收集
操作系统：Windows 10
CPU: Ryzen 9 3900X，主频 3800MHz
RAM: 3200MHz
Rustc: 稳定版 1.44.1

Rodeo

STD 随机状态

方法	时间	吞吐量
`解析`	1.9251 μs	13.285 GiB/s
`尝试解析`	1.9214 μs	13.311 GiB/s
`未经检查解析`	1.4356 μs	17.816 GiB/s
`get_or_intern` (空)	60.350 μs	433.96 MiB/s
`get_or_intern` (填充)	57.415 μs	456.15 MiB/s
`try_get_or_intern` (空)	58.978 μs	444.06 MiB/s
`try_get_or_intern(填充)`	57.421 μs	456.10 MiB/s
`get` (空)	37.288 μs	702.37 MiB/s
`get` (填充)	55.095 μs	475.36 MiB/s

AHash

方法	时间	吞吐量
`尝试解析`	1.9282 μs	13.264 GiB/s
`解析`	1.9404 μs	13.181 GiB/s
`未经检查解析`	1.4328 μs	17.851 GiB/s
`get_or_intern` (空)	38.029 μs	688.68 MiB/s
`get_or_intern` (填充)	33.650 μs	778.30 MiB/s
`try_get_or_intern` (空)	39.392 μs	664.84 MiB/s
`try_get_or_intern(填充)`	33.435 μs	783.31 MiB/s
`get` (空)	12.565 μs	2.0356 GiB/s
`get` (填充)	26.545 μs	986.61 MiB/s

FXHash

方法	时间	吞吐量
`解析`	1.9014 μs	13.451 GiB/s
`尝试解析`	1.9278 μs	13.267 GiB/s
`未经检查解析`	1.4449 μs	17.701 GiB/s
`get_or_intern` (空)	32.523 μs	805.27 MiB/s
`get_or_intern` (填充)	30.281 μs	864.88 MiB/s
`try_get_or_intern` (空)	31.630 μs	828.00 MiB/s
`try_get_or_intern(填充)`	31.002 μs	844.78 MiB/s
`get` (空)	12.699 μs	2.0141 GiB/s
`get` (填充)	29.220 μs	896.28 MiB/s

ThreadedRodeo

STD 随机状态

方法	时间（1 线程）	吞吐量（1 线程）	时间（24 线程）	吞吐量（24 线程）
`解析`	54.336 μs	482.00 MiB/s	364.27 μs	71.897 MiB/s
`尝试解析`	54.582 μs	479.82 MiB/s	352.67 μs	74.261 MiB/s
`get_or_intern` (空)	266.03 μs	98.447 MiB/s	N\A	N\A
`get_or_intern` (填充)	103.04 μs	254.17 MiB/s	441.42 μs	59.331 MiB/s
`try_get_or_intern` (空)	261.80 μs	100.04 MiB/s	N\A	N\A
`try_get_or_intern` (填充)	102.61 μs	255.25 MiB/s	447.42 μs	58.535 MiB/s
`get` (空)	80.346 μs	325.96 MiB/s	N\A	N\A
`get` (填充)	92.669 μs	282.62 MiB/s	439.24 μs	59.626 MiB/s

AHash

方法	时间（1 线程）	吞吐量（1 线程）	时间（24 线程）	吞吐量（24 线程）
`解析`	22.261 μs	1.1489 GiB/s	265.46 μs	98.658 MiB/s
`尝试解析`	22.378 μs	1.1429 GiB/s	268.58 μs	97.513 MiB/s
`get_or_intern` (空)	157.86 μs	165.91 MiB/s	N\A	N\A
`get_or_intern` (填充)	56.320 μs	465.02 MiB/s	357.13 μs	73.335 MiB/s
`try_get_or_intern` (空)	161.46 μs	162.21 MiB/s	N\A	N\A
`try_get_or_intern` (填充)	55.874 μs	468.73 MiB/s	360.25 μs	72.698 MiB/s
`get` (空)	43.520 μs	601.79 MiB/s	N\A	N\A
`get` (填充)	53.720 μs	487.52 MiB/s	360.66 μs	72.616 MiB/s

FXHash

方法	时间（1 线程）	吞吐量（1 线程）	时间（24 线程）	吞吐量（24 线程）
`尝试解析`	17.289 μs	1.4794 GiB/s	238.29 μs	109.91 MiB/s
`解析`	19.833 μs	1.2896 GiB/s	237.05 μs	110.48 MiB/s
`get_or_intern` (空)	130.97 μs	199.97 MiB/s	N\A	N\A
`get_or_intern` (填充)	42.630 μs	614.35 MiB/s	301.60 μs	86.837 MiB/s
`try_get_or_intern` (空)	129.30 μs	202.55 MiB/s	N\A	N\A
`try_get_or_intern` (填充)	42.508 μs	616.12 MiB/s	337.29 μs	77.648 MiB/s
`get` (空)	28.001 μs	935.30 MiB/s	N\A	N\A
`get` (填充)	37.700 μs	694.68 MiB/s	292.15 μs	89.645 MiB/s

RodeoReader

STD 随机状态

方法	时间（1 线程）	吞吐量（1 线程）	时间（24 线程）	吞吐量（24 线程）
`解析`	1.9398 μs	13.185 GiB/s	4.3153 μs	5.9269 GiB/s
`尝试解析`	1.9315 μs	13.242 GiB/s	4.1956 μs	6.0959 GiB/s
`未经检查解析`	1.4416 μs	17.741 GiB/s	3.1204 μs	8.1964 GiB/s
`get` (空)	38.886 μs	673.50 MiB/s	N\A	N\A
`get` (填充)	56.271 μs	465.42 MiB/s	105.12 μs	249.14 MiB/s

AHash

方法	时间（1 线程）	吞吐量（1 线程）	时间（24 线程）	吞吐量（24 线程）
`解析`	1.9404 μs	13.181 GiB/s	4.1881 微秒	6.1069 GiB/s
`尝试解析`	1.8932 微秒	13.509 GiB/s	4.2410 微秒	6.0306 GiB/s
`未经检查解析`	1.4128 微秒	18.103 GiB/s	3.1691 微秒	8.0703 GiB/s
`get` (空)	11.952 微秒	2.1399 GiB/s	N\A	N\A
`get` (填充)	27.093 微秒	966.65 MiB/s	56.269 微秒	465.44 MiB/s

FXHash

方法	时间（1 线程）	吞吐量（1 线程）	时间（24 线程）	吞吐量（24 线程）
`解析`	1.8987 微秒	13.471 GiB/s	4.2117 微秒	6.0727 GiB/s
`尝试解析`	1.9103 微秒	13.389 GiB/s	4.2254 微秒	6.0529 GiB/s
`未经检查解析`	1.4469 微秒	17.677 GiB/s	3.0923 微秒	8.2709 GiB/s
`get` (空)	12.994 微秒	1.9682 GiB/s	N\A	N\A
`get` (填充)	29.745 微秒	880.49 MiB/s	52.387 微秒	499.93 MiB/s

RodeoResolver

方法	时间（1 线程）	吞吐量（1 线程）	时间（24 线程）	吞吐量（24 线程）
`解析`	1.9416 微秒	13.172 GiB/s	3.9114 微秒	6.5387 GiB/s
`尝试解析`	1.9264 微秒	13.277 GiB/s	3.9289 微秒	6.5097 GiB/s
`未经检查解析`	1.6638 微秒	15.372 GiB/s	3.1741 微秒	8.0578 GiB/s

依赖项

~1.4–7MB
~29K SLoC

15 个版本

我应该使用哪个池？

Cargo特性

示例：使用Rodeo

示例：使用ThreadedRodeo

示例：创建RodeoReader

示例：创建RodeoResolver

示例：创建自定义范围的键

示例：使用FromIterator创建

基准测试

Rodeo

STD 随机状态

AHash

FXHash

ThreadedRodeo

STD 随机状态

AHash

FXHash

RodeoReader

STD 随机状态

AHash

FXHash

RodeoResolver

依赖项

示例：使用`FromIterator`创建