#smaz #dictionary

bin+lib tinystring

String compression utility with dictionary generation

1 unstable release

0.1.0 Jan 12, 2020

#504 in Compression

MIT license

7KB
63

tiny-string-rs

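tiny-string-rs is a Rust library for compressing strings using a generated dictionary. To improve performance, generate the dictionary from training data that resembles the kind of data you want to compress.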

Dictionary slot length

When generating a dictionary, you can specify the slot length as the second argument, for example:

let dict: Vec<String> = generate_dictionary(contents, 5); // slot length of 5

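Note: the larger the slot length, the more computationally expensive it is to generate the dictionary. If you choose a slot size greater than 6, it is recommended that you cache the dictionary for reuse.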
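One way to cache a dictionary for reuse is to write it to disk once and load it on later runs. The following is a minimal sketch using only the standard library. The helpers `encode_dictionary`, `decode_dictionary`, and `load_or_build`, along with the cache file name, are hypothetical and not part of tinystring; the closure passed to `load_or_build` stands in for a call such as `generate_dictionary(contents, 7)`. Entries are length-prefixed rather than newline-separated, on the assumption that dictionary entries may themselves contain newlines.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Serialize a dictionary as length-prefixed entries (u32 little-endian
/// byte length, then the UTF-8 bytes) so any entry content round-trips.
fn encode_dictionary(dict: &[String]) -> Vec<u8> {
    let mut out = Vec::new();
    for entry in dict {
        out.extend_from_slice(&(entry.len() as u32).to_le_bytes());
        out.extend_from_slice(entry.as_bytes());
    }
    out
}

/// Parse the format written by `encode_dictionary`.
fn decode_dictionary(bytes: &[u8]) -> io::Result<Vec<String>> {
    let bad = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());
    let mut dict = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let header = bytes.get(pos..pos + 4).ok_or_else(|| bad("truncated length"))?;
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        pos += 4;
        let body = bytes.get(pos..pos + len).ok_or_else(|| bad("truncated entry"))?;
        let entry = String::from_utf8(body.to_vec()).map_err(|_| bad("invalid UTF-8"))?;
        dict.push(entry);
        pos += len;
    }
    Ok(dict)
}

/// Load the dictionary from `cache` if it exists; otherwise build it
/// with `build`, store it, and return it.
fn load_or_build<F>(cache: &Path, build: F) -> io::Result<Vec<String>>
where
    F: FnOnce() -> Vec<String>,
{
    if cache.exists() {
        return decode_dictionary(&fs::read(cache)?);
    }
    let dict = build();
    fs::write(cache, encode_dictionary(&dict))?;
    Ok(dict)
}

fn main() -> io::Result<()> {
    // Hypothetical cache location.
    let cache = std::env::temp_dir().join("tinystring_dict.cache");
    // Placeholder builder; in real use this would call the expensive
    // dictionary generation instead.
    let dict = load_or_build(&cache, || vec!["the ".to_string(), "ing\n".to_string()])?;
    println!("{} entries", dict.len());
    Ok(())
}
```

A cached dictionary only stays valid while the training data and slot length are unchanged, so it may be worth including both in the cache file name.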

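Dictionary size

The dictionary size is currently fixed at 896 entries. The first 128 ASCII characters are reserved as the standard character set. Compression can reduce string length by 40-65%. Note that the saving in actual size (the string's total byte count) will be far smaller than the saving in string length.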
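The difference between string length and byte size can be illustrated with plain Rust strings. The sketch below assumes, without confirmation from the crate, that each dictionary slot is written as a single character just above the 128 reserved ASCII values, so that 128 + 896 = 1024 values cover U+0000 to U+03FF. Characters in U+0080 to U+03FF take two bytes each in UTF-8, which is why a byte count can shrink less than a character count. The helper `char_and_byte_len` is hypothetical.

```rust
/// Returns (character count, UTF-8 byte count) for a string.
fn char_and_byte_len(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

fn main() {
    // 128 reserved ASCII values plus 896 dictionary slots.
    const ASCII_RESERVED: u32 = 128;
    const DICT_SLOTS: u32 = 896;
    assert_eq!(ASCII_RESERVED + DICT_SLOTS, 1024);

    // Assumption: a slot index maps to one non-ASCII character.
    let slot = char::from_u32(ASCII_RESERVED + 5).unwrap();
    let compressed: String = ['I', ' ', slot, slot].iter().collect();

    let (chars, bytes) = char_and_byte_len(&compressed);
    println!("{} chars, {} bytes", chars, bytes);
}
```

In Rust, `String::len` returns the byte count, while `chars().count()` returns the number of characters, so the two should be compared separately when measuring a compression ratio.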

Example usage

extern crate testmark;
use testmark::Testmark;
use testmark::Timer;

use tinystring::{ generate_dictionary, tiny_string_deflate, tiny_string_inflate };
use std::fs;

fn main() {
    let mut cbench: Testmark = Timer::new("TinyString Compression Test");

    let contents: String = fs::read_to_string("sample.txt").unwrap();
    let data: String = "I just spent about $3000 surgically removing a big ball of WTF from my Maine coon! Came home with a dozen staples down his belly and immediately started trying to eat the plastic wrap I just pulled off his medication bottles. Moron. I'm sorry your kitty didn't make it. Being stupidly suicidal seems to be a breed characteristic!".to_string();
    let dict: Vec<String> = generate_dictionary(contents, 5);

    // Time one deflate/inflate round trip.
    cbench.start();
    let result: String = tiny_string_deflate(data, dict.clone());
    let inflated: String = tiny_string_inflate(result.clone(), dict.clone());
    cbench.stop();

    // `len()` reports the size in UTF-8 bytes, not the character count.
    println!("{} {}", result, result.len());
    println!("{} {}", inflated, inflated.len());
    fs::write("result.txt", result).unwrap();

    cbench.print();
}

Dependencies