4 versions (2 breaking)
0.3.0 | Feb 10, 2021 |
---|---|
0.2.0 | Feb 8, 2021 |
0.1.1 | Feb 7, 2021 |
0.1.0 | Feb 7, 2021 |
#763 in Data structures
73KB
600 lines
Membuffer
A Rust library for rapid deserialization of huge datasets with few keys. The library is meant to be used with memory-mapped files. Almost every crate on crates.io that does serialization and deserialization needs to process the whole structure, which makes it unusable with large memory-mapped files. This library instead scans only the header to get the schema of the data structure and leaves all other fields untouched unless it is specifically asked to fetch them.
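Warning: This library uses memory transmutation and pointer arithmetic to improve performance and avoid unnecessary parsing. Although the code is heavily tested, weigh this carefully when deciding whether you need this library. Do not exchange buffers created with this library between little-endian and big-endian systems; it will not work!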
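The endianness caveat can be illustrated with the standard library alone. The sketch below is not membuffer code; it assumes integers are stored in the writer's native byte order (which the warning implies but does not spell out) and uses the hypothetical helpers `encode_le`, `decode_le`, and `decode_be` to show what happens when the same bytes are interpreted with the opposite byte order.

```rust
/// Serialize each value with an explicit little-endian byte order.
/// On a little-endian host this matches the native in-memory layout.
fn encode_le(values: &[u64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Decode 8-byte chunks as little-endian integers (the inverse of `encode_le`).
fn decode_le(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(chunk);
            u64::from_le_bytes(raw)
        })
        .collect()
}

/// Decode the same chunks as big-endian integers, which is how a
/// big-endian host would see the bytes if it reinterpreted them natively.
fn decode_be(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut raw = [0u8; 8];
            raw.copy_from_slice(chunk);
            u64::from_be_bytes(raw)
        })
        .collect()
}
```

With `encode_le(&[1])`, `decode_le` yields `1` again, while `decode_be` yields `1 << 56`: the bytes are intact, but their meaning changes with the host's byte order.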
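Benchmark
Why is this library so fast? The benchmark deserializes a data structure with different payload sizes: 1 MB, 10 MB, or 100 MB. Instead of parsing the whole structure, membuffer loads only the data structure layout and returns slices that point at the strings. This helps when working with memory-mapped structures, for example. As the benchmarks show, membuffer's speed depends only on the number of keys and not on the size of the deserialized data structure, which is good evidence that the deserialization cost is independent of the data size.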
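The idea of reading only the layout and handing out slices can be sketched in a few lines. The layout below is hypothetical and chosen for illustration; it is not membuffer's actual on-disk format, and `write_entries`, `LazyReader`, and `load_bytes` are made-up names. Opening the reader touches only the entry count, and loading an entry touches only its 8-byte index record, so the cost does not grow with the payload size.

```rust
/// Hypothetical layout (NOT membuffer's real format):
/// [count: u32 LE][(offset: u32 LE, len: u32 LE) * count][payload bytes]
fn write_entries(entries: &[&str]) -> Vec<u8> {
    let header_len = 4 + entries.len() * 8;
    let mut buf = Vec::new();
    buf.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    // Index records: where each payload starts and how long it is.
    let mut offset = header_len;
    for entry in entries {
        buf.extend_from_slice(&(offset as u32).to_le_bytes());
        buf.extend_from_slice(&(entry.len() as u32).to_le_bytes());
        offset += entry.len();
    }
    // Payloads, stored back to back after the header.
    for entry in entries {
        buf.extend_from_slice(entry.as_bytes());
    }
    buf
}

/// Read a little-endian u32 at `pos`, or None if the buffer is too short.
fn read_u32(buf: &[u8], pos: usize) -> Option<u32> {
    let bytes = buf.get(pos..pos.checked_add(4)?)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    Some(u32::from_le_bytes(raw))
}

struct LazyReader<'a> {
    buf: &'a [u8],
    count: usize,
}

impl<'a> LazyReader<'a> {
    /// Validates only the header; payload bytes are never read here.
    fn new(buf: &'a [u8]) -> Option<Self> {
        let count = read_u32(buf, 0)? as usize;
        let header_len = count.checked_mul(8)?.checked_add(4)?;
        if buf.len() < header_len {
            return None;
        }
        Some(LazyReader { buf, count })
    }

    /// Returns a slice borrowed from the original buffer, without copying.
    fn load_bytes(&self, index: usize) -> Option<&'a [u8]> {
        if index >= self.count {
            return None;
        }
        let record = 4 + index * 8;
        let offset = read_u32(self.buf, record)? as usize;
        let len = read_u32(self.buf, record + 4)? as usize;
        self.buf.get(offset..offset.checked_add(len)?)
    }
}
```

The reader returns raw bytes rather than `&str` on purpose: validating UTF-8 would scan the whole payload and reintroduce a cost proportional to its size.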
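Use cases
- Handling structures with big fields, because they are parsed lazily
- Saving big datasets with few keys to disk
- Memory-mapped data structures: fields are only read when they are requested, so fields that are never requested do not cause page faults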
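As a rough stand-in for the on-disk use case, the sketch below uses only `std::fs` and `Seek` instead of a real memory map, so it shows the access pattern rather than membuffer's or any mmap crate's API. The helpers `write_fields` and `read_field` are hypothetical. A field's bytes are fetched from the file only when that field is requested, while the other fields stay on disk.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Write the fields back to back and return each field's (offset, length).
/// In a real format this index would live in the file header.
fn write_fields(path: &Path, fields: &[&[u8]]) -> io::Result<Vec<(u64, usize)>> {
    let mut file = File::create(path)?;
    let mut index = Vec::with_capacity(fields.len());
    let mut offset = 0u64;
    for field in fields {
        file.write_all(field)?;
        index.push((offset, field.len()));
        offset += field.len() as u64;
    }
    Ok(index)
}

/// Read exactly one field; the bytes of all other fields are never requested.
fn read_field(path: &Path, (offset, len): (u64, usize)) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(offset))?;
    let mut out = vec![0u8; len];
    file.read_exact(&mut out)?;
    Ok(out)
}
```

Unlike this sketch, a memory map avoids the copy into a new `Vec<u8>`, since the operating system pages the requested range in on first access.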
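Why not to use
- When all fields of the serialized data structure are read, this crate is only marginally faster than serde_json, and it is less safe
- When there are many keys and most of them are read, this library brings almost no benefit for serialization or deserialization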
Example
use membuffer::{MemBufferWriter, MemBufferReader};

fn main() {
    let mut writer = MemBufferWriter::new();
    // Add a vector with some numbers
    let some_bytes: Vec<u64> = vec![100, 200, 100, 200, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    // Write the entry into the memory buffer; an entry cannot be changed after it is written
    writer.add_entry(&some_bytes[..]);
    // Create a Vec<u8> out of all the data
    let result = writer.finalize();
    // Read the data back in again
    let reader = MemBufferReader::new(&result).unwrap();
    // Pass the expected type to enable type checking; an Err is returned if the type does not match
    assert_eq!(reader.load_entry::<&[u64]>(0).unwrap(), vec![100, 200, 100, 200, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
}
Example using serde in the data structure
use membuffer::{MemBufferWriter, MemBufferReader};
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct HeavyStruct {
    vec: Vec<u64>,
    name: String,
    frequency: i32,
    id: i32,
}

fn main() {
    // Create a serde structure
    let value = HeavyStruct {
        vec: vec![100, 20, 1],
        name: String::from("membuffer!"),
        frequency: 10,
        id: 200,
    };
    // Write the data into the memory buffer
    let mut writer = MemBufferWriter::new();
    writer.add_serde_entry(&value);
    // Create a Vec<u8> out of the data
    let result = writer.finalize();
    // Load the entry again
    let reader = MemBufferReader::new(&result).unwrap();
    // Specify the type so serde can do the type checking; internally the serde object is stored as a JSON string
    let struc: HeavyStruct = reader.load_serde_entry(0).unwrap();
    assert_eq!(struc.vec, vec![100, 20, 1]);
    assert_eq!(struc.name, "membuffer!");
    assert_eq!(struc.frequency, 10);
    assert_eq!(struc.id, 200);
}
Benchmark code
// Nightly-only feature! Run this on the nightly toolchain
#![feature(test)]
// The unstable `test` crate has to be linked explicitly
extern crate test;

use test::Bencher;
use membuffer::{MemBufferWriter, MemBufferReader};
use serde::{Serialize, Deserialize};

#[bench]
fn benchmark_few_keys_payload_1mb_times_3(b: &mut Bencher) {
    // 1 MB payload, stored three times
    let huge_string = "a".repeat(1_000_000);
    let mut writer = MemBufferWriter::new();
    writer.add_entry(&huge_string);
    writer.add_entry(&huge_string);
    writer.add_entry(&huge_string);
    let result = writer.finalize();
    assert!(result.len() > 3_000_000);
    // Only creating the reader and loading the entries is measured
    b.iter(|| {
        let reader = MemBufferReader::new(&result).unwrap();
        let string1 = reader.load_entry::<&str>(0).unwrap();
        let string2 = reader.load_entry::<&str>(1).unwrap();
        let string3 = reader.load_entry::<&str>(2).unwrap();
        assert_eq!(string1.len(), 1_000_000);
        assert_eq!(string2.len(), 1_000_000);
        assert_eq!(string3.len(), 1_000_000);
    });
}

#[derive(Serialize, Deserialize)]
struct BenchSerde<'a> {
    one: &'a str,
    two: &'a str,
    three: &'a str,
}

#[bench]
fn benchmark_few_keys_payload_1mb_times_3_serde(b: &mut Bencher) {
    // Same payload as above, serialized as JSON for comparison
    let huge_string = "a".repeat(1_000_000);
    let first = BenchSerde {
        one: &huge_string,
        two: &huge_string,
        three: &huge_string,
    };
    let string = serde_json::to_string(&first).unwrap();
    // serde_json has to parse the whole document on every iteration
    b.iter(|| {
        let reader: BenchSerde = serde_json::from_str(&string).unwrap();
        assert_eq!(reader.one.len(), 1_000_000);
        assert_eq!(reader.two.len(), 1_000_000);
        assert_eq!(reader.three.len(), 1_000_000);
    });
}
Dependencies
~0.7–1.5MB
~34K SLoC