2 unstable releases
0.2.0 | Sep 12, 2022 |
---|---|
0.1.0 | Sep 12, 2022 |
#324 in Compression
58KB
1.5K SLoC
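An LZ compressor based on Relative Lempel-Ziv (RLZ), which compresses input against a large static dictionary.
This code implements the RLZ compressor as described in the following papers: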
@article{DBLP:journals/pvldb/HoobinPZ11,
author = {Christopher Hoobin and
Simon J. Puglisi and
Justin Zobel},
title = {Relative Lempel-Ziv Factorization for Efficient Storage
and Retrieval of Web Collections},
journal = {Proc. {VLDB} Endow.},
volume = {5},
number = {3},
pages = {265--273},
year = {2011},
}
@article{DBLP:journals/corr/PetriMNW16,
author = {Matthias Petri and
Alistair Moffat and
P. C. Nagesh and
Anthony Wirth},
title = {Access Time Tradeoffs in Archive Compression},
journal = {CoRR},
volume = {abs/1602.08829},
year = {2016},
url = {http://arxiv.org/abs/1602.08829},
eprinttype = {arXiv},
}
@inproceedings{DBLP:conf/www/LiaoPMW16,
author = {Kewen Liao and
Matthias Petri and
Alistair Moffat and
Anthony Wirth},
title = {Effective Construction of Relative Lempel-Ziv
Dictionaries},
booktitle = {Proceedings of {WWW}},
pages = {807--816},
publisher = {{ACM}},
year = {2016},
}
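What is RLZ (from the paper)
The Relative Lempel-Ziv (RLZ) scheme is a hybrid of several phrase-based compression mechanisms. Encoding is based on a fixed-text dictionary, with all substrings of the dictionary available for use as factors, in the style of LZ77. But the dictionary is constructed in a semi-static manner, and hence needs to be representative of the entire text being coded if compression effectiveness is not to be compromised. Furthermore, because RLZ is intended for large web-based archives, it is not possible for the whole input text to be in memory when the dictionary is constructed.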
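To make the idea of factors concrete, here is a minimal sketch of greedy factorization against a fixed dictionary. It is not this crate's implementation: the names `Factor`, `longest_match`, `factorize`, and `expand` are hypothetical, and the brute-force search stands in for the index (for example, a suffix array over the dictionary) that a practical compressor would likely use.

```rust
/// One factor of the text: a copy from the dictionary or a single literal byte.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone, PartialEq, Eq)]
enum Factor {
    /// Copy `len` bytes starting at `offset` in the dictionary.
    Copy { offset: usize, len: usize },
    /// A byte for which no dictionary match exists.
    Literal(u8),
}

/// Find the longest prefix of `text` that occurs in `dict`, as `(offset, len)`.
/// Brute force over every dictionary position; ties keep the earliest offset.
fn longest_match(dict: &[u8], text: &[u8]) -> (usize, usize) {
    let mut best = (0, 0);
    for start in 0..dict.len() {
        let len = dict[start..]
            .iter()
            .zip(text)
            .take_while(|(a, b)| a == b)
            .count();
        if len > best.1 {
            best = (start, len);
        }
    }
    best
}

/// Greedily split `text` into factors relative to `dict`.
fn factorize(dict: &[u8], text: &[u8]) -> Vec<Factor> {
    let mut factors = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let (offset, len) = longest_match(dict, &text[pos..]);
        if len == 0 {
            // The byte does not occur in the dictionary at all.
            factors.push(Factor::Literal(text[pos]));
            pos += 1;
        } else {
            factors.push(Factor::Copy { offset, len });
            pos += len;
        }
    }
    factors
}

/// Rebuild the text from its factors; only the dictionary is needed.
fn expand(dict: &[u8], factors: &[Factor]) -> Vec<u8> {
    let mut out = Vec::new();
    for factor in factors {
        match *factor {
            Factor::Copy { offset, len } => out.extend_from_slice(&dict[offset..offset + len]),
            Factor::Literal(byte) => out.push(byte),
        }
    }
    out
}
```

With the dictionary `banana` and the text `banana$aba`, this sketch produces four factors: a copy of all six dictionary bytes, a literal `$`, a one-byte copy for `a`, and a two-byte copy for `ba`. Because every factor refers only to the dictionary, any part of the encoded text can be decoded without the parts before it.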
Usage
use rlz::RlzCompressor;
use rlz::Dictionary;

// Build a compressor from a static dictionary.
let dict = Dictionary::from(&b"banana"[..]);
let rlz_compressor = RlzCompressor::builder().build_from_dict(dict);

// Encode the text relative to the dictionary.
let mut output = Vec::new();
let text = b"banana$aba";
let encoded_len = rlz_compressor.encode(&text[..], &mut output).unwrap();
assert_eq!(encoded_len, output.len());

// Store the compressor, then load it back from the stored bytes.
let mut stored_decoder = Vec::new();
rlz_compressor.store(&mut stored_decoder).unwrap();
let loaded_decoder = RlzCompressor::load(&stored_decoder[..]).unwrap();

// Decode with the loaded compressor and check the round trip.
let mut recovered = Vec::new();
loaded_decoder.decode(&output[..], &mut recovered).unwrap();
assert_eq!(recovered, text);
License
MIT
Dependencies
~5–11MB
~111K SLoC