#dictionary #relative #compressor #factor #text #static #input

rlz

Relative Lempel-Ziv (RLZ): an LZ-based compressor that compresses against a large static dictionary

2 unstable releases

0.2.0 Sep 12, 2022
0.1.0 Sep 12, 2022

#324 in Compression

MIT license

58KB
1.5K SLoC

A Relative Lempel-Ziv (RLZ) based LZ compressor that compresses against a large static dictionary.

This code implements the RLZ compressor described in the following papers:

@article{DBLP:journals/pvldb/HoobinPZ11,
  author    = {Christopher Hoobin and
               Simon J. Puglisi and
               Justin Zobel},
  title     = {Relative Lempel-Ziv Factorization for Efficient Storage
               and Retrieval of Web Collections},
  journal   = {Proc. {VLDB} Endow.},
  volume    = {5},
  number    = {3},
  pages     = {265--273},
  year      = {2011},
}
@article{DBLP:journals/corr/PetriMNW16,
  author    = {Matthias Petri and
               Alistair Moffat and
               P. C. Nagesh and
               Anthony Wirth},
  title     = {Access Time Tradeoffs in Archive Compression},
  journal   = {CoRR},
  volume    = {abs/1602.08829},
  year      = {2016},
  url       = {http://arxiv.org/abs/1602.08829},
  eprinttype = {arXiv},
}
@inproceedings{DBLP:conf/www/LiaoPMW16,
  author    = {Kewen Liao and
               Matthias Petri and
               Alistair Moffat and
               Anthony Wirth},
  title     = {Effective Construction of Relative Lempel-Ziv 
               Dictionaries},
  booktitle = {Proceedings of {WWW}},
  pages     = {807--816},
  publisher = {{ACM}},
  year      = {2016},
}

What is RLZ (from the paper)

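The Relative Lempel-Ziv (RLZ) scheme is a hybrid of several phrase-based compression mechanisms. Encoding is based on a fixed-text dictionary, and every substring of the dictionary is available for use as a factor, in the style of LZ77. The dictionary is constructed in a semi-static manner, however, so it must be representative of the entire text being encoded if compression effectiveness is not to suffer. Furthermore, because RLZ is designed for large web-based archives, it is not possible to hold the whole input text in memory while the dictionary is being built.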
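
The factorization idea in this paragraph can be illustrated with a minimal sketch. This is not the crate's implementation: the `Factor` type and the `longest_match`, `factorize`, and `decode` functions are hypothetical names introduced here for illustration, and the naive quadratic search stands in for whatever index structure a practical compressor would build over the dictionary. It also assumes that bytes absent from the dictionary are emitted as literals, which is only one possible way to handle them.

```rust
/// One factor of the parse (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Factor {
    /// Copy `len` bytes starting at `offset` in the dictionary.
    Copy { offset: usize, len: usize },
    /// A byte that does not occur anywhere in the dictionary.
    Literal(u8),
}

/// Returns (offset, len) of the longest prefix of `text` found in `dict`.
/// Naive scan over every dictionary position; `len` is 0 if nothing matches.
fn longest_match(dict: &[u8], text: &[u8]) -> (usize, usize) {
    let mut best = (0, 0);
    for start in 0..dict.len() {
        let len = dict[start..]
            .iter()
            .zip(text)
            .take_while(|(a, b)| a == b)
            .count();
        if len > best.1 {
            best = (start, len);
        }
    }
    best
}

/// Greedy left-to-right parse of `text` into factors relative to `dict`.
fn factorize(dict: &[u8], text: &[u8]) -> Vec<Factor> {
    let mut factors = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let (offset, len) = longest_match(dict, &text[pos..]);
        if len == 0 {
            factors.push(Factor::Literal(text[pos]));
            pos += 1;
        } else {
            factors.push(Factor::Copy { offset, len });
            pos += len;
        }
    }
    factors
}

/// Rebuilds the text from the factors; only the dictionary is needed.
fn decode(dict: &[u8], factors: &[Factor]) -> Vec<u8> {
    let mut out = Vec::new();
    for factor in factors {
        match *factor {
            Factor::Copy { offset, len } => out.extend_from_slice(&dict[offset..offset + len]),
            Factor::Literal(byte) => out.push(byte),
        }
    }
    out
}
```

With the dictionary `banana` and the text `banana$aba`, this sketch produces four factors: a copy of the whole dictionary, a literal `$`, a one-byte copy of `a`, and a two-byte copy of `ba`. Because every factor refers only to the dictionary and never to previously decoded output, any part of the encoded text can be decoded independently once the dictionary is in memory.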

Usage

use rlz::RlzCompressor;
use rlz::Dictionary;

// Build the static dictionary that the text is encoded against.
let dict = Dictionary::from(&b"banana"[..]);
let rlz_compressor = RlzCompressor::builder().build_from_dict(dict);

let mut output = Vec::new();

// Encode the text; the returned length matches the bytes written to `output`.
let text = b"banana$aba";
let encoded_len = rlz_compressor.encode(&text[..], &mut output).unwrap();
assert_eq!(encoded_len, output.len());

// Serialize the compressor, then load it back.
let mut stored_decoder = Vec::new();
rlz_compressor.store(&mut stored_decoder).unwrap();
let loaded_decoder = RlzCompressor::load(&stored_decoder[..]).unwrap();

// Decode with the reloaded compressor and check the round trip.
let mut recovered = Vec::new();
loaded_decoder.decode(&output[..], &mut recovered).unwrap();
assert_eq!(recovered, text);

License

MIT

Dependencies

~5–11MB
~111K SLoC