2个版本

0.1.2 2023年3月29日
0.1.1 2023年3月27日

#220 in 压缩

Download history 944/week @ 2024-03-13 939/week @ 2024-03-20 1128/week @ 2024-03-27 674/week @ 2024-04-03 748/week @ 2024-04-10 1241/week @ 2024-04-17 1002/week @ 2024-04-24 688/week @ 2024-05-01 698/week @ 2024-05-08 774/week @ 2024-05-15 950/week @ 2024-05-22 648/week @ 2024-05-29 695/week @ 2024-06-05 642/week @ 2024-06-12 1148/week @ 2024-06-19 1569/week @ 2024-06-26

4,176 每月下载量

MIT/Apache

1MB
672

trie-rs

基于LOUDS的KV功能前缀Trie库。

这是https://laysakura.github.io/trie-rs/trie_rs的分支。

快速入门

要使用trie-rs,请在您的Cargo.toml文件中添加以下内容

[dependencies]
trie-rs = "0.1"  # NOTE: Replace to latest minor version.

使用概述

use std::str;
use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::new();  // Inferred `TrieBuilder<u8>` automatically
builder.push("すし", 1);
builder.push("すしや", 2);
builder.push("すしだね", 3);
builder.push("すしづめ", 4);
builder.push("すしめし", 5);
builder.push("すしをにぎる", 6);
builder.push("すし", 7);  // Word `push`ed twice is just ignored.
builder.push("🍣", 8);

let mut trie = builder.build();

// exact_match(): Find a word exactly match to query.
assert_eq!(trie.exact_match("すし"), true);
assert_eq!(trie.exact_match("🍣"), true);
assert_eq!(trie.exact_match("🍜"), false);

// predictive_search(): Find words which include `query` as their prefix.
let results_in_u8s: Vec<Vec<u8>> = trie.predictive_search("すし");
let results_in_str: Vec<&str> = results_in_u8s
    .iter()
    .map(|u8s| str::from_utf8(u8s).unwrap())
    .collect();
assert_eq!(
    results_in_str,
    vec![
        "すし",
        "すしだね",
        "すしづめ",
        "すしめし",
        "すしや",
        "すしをにぎる"
    ]  // Sorted by `Vec<u8>`'s order
);

// common_prefix_search(): Find words which is included in `query`'s prefix.
let results_in_u8s: Vec<Vec<u8>> = trie.common_prefix_search("すしや");
let results_in_str: Vec<&str> = results_in_u8s
    .iter()
    .map(|u8s| str::from_utf8(u8s).unwrap())
    .collect();
assert_eq!(
    results_in_str,
    vec![
        "すし",
        "すしや",
    ]  // Sorted by `Vec<u8>`'s order
);

// common_prefix_search_with_value(): Find words which is included in `query`'s prefix and return their values.
let results_in_u8s: Vec<(Vec<u8>, u8)> = trie.common_prefix_search_with_values("すしや");
let results_in_str: Vec<(&str, u8)> = results_in_u8s
   .iter()
   .map(|(u8s, v)| (str::from_utf8(u8s).unwrap(), *v))
   .collect();

   assert_eq!(
   results_in_str,
   vec![
   ("すし", 1),
   ("すしや", 2),
   ]  // Sorted by `Vec<u8>`'s order
   );

// get_value(): Get value of a word.
assert_eq!(trie.get("すし"), Some(&1));

// set value in a built trie.
trie.set("すし", 9);
assert_eq!(trie.get("すし"), Some(&9));

与各种数据类型一起使用

TrieBuilder使用如下泛型类型实现

impl<Label: Ord + Clone> TrieBuilder<Label> {
    ...
    pub fn push<Arr: AsRef<[Label]>>(&mut self, word: Arr) { ... }
    ...
}

在上面的使用概述示例中,我们使用了Label=u8, Arr=&str

这里显示了其他LabelArr类型示例。

Label=&str,Arr=Vec<&str>

假设Label是英文单词,Arr是英文短语。

use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::new();
builder.push(vec!["a", "woman"], 0 );
builder.push(vec!["a", "woman", "on", "the", "beach"], 1);
builder.push(vec!["a", "woman", "on", "the", "run"], 2);

let trie = builder.build();

assert_eq!(
    trie.exact_match(vec!["a", "woman", "on", "the", "beach"]),
    true
);
assert_eq!(
    trie.predictive_search(vec!["a", "woman", "on"]),
    vec![
        ["a", "woman", "on", "the", "beach"],
        ["a", "woman", "on", "the", "run"],
    ],
);
assert_eq!(
    trie.common_prefix_search(vec!["a", "woman", "on", "the", "beach"]),
    vec![vec!["a", "woman"], vec!["a", "woman", "on", "the", "beach"]],
);

Label=u8,Arr=[u8;n]

假设Label是π(=3.14...)中的数字,而Arr是用于按10个数字分开π的窗口。

use trie_rs::TrieBuilder;

let mut builder = TrieBuilder::<u8, u8>::new(); // Pi = 3.14...

builder.push([1, 4, 1, 5, 9, 2, 6, 5, 3, 5], 1);
builder.push([8, 9, 7, 9, 3, 2, 3, 8, 4, 6], 2);
builder.push([2, 6, 4, 3, 3, 8, 3, 2, 7, 9], 3);
builder.push([6, 9, 3, 9, 9, 3, 7, 5, 1, 0], 4);
builder.push([5, 8, 2, 0, 9, 7, 4, 9, 4, 4], 5);
builder.push([5, 9, 2, 3, 0, 7, 8, 1, 6, 4], 6);
builder.push([0, 6, 2, 8, 6, 2, 0, 8, 9, 9], 7);
builder.push([8, 6, 2, 8, 0, 3, 4, 8, 2, 5], 8);
builder.push([3, 4, 2, 1, 1, 7, 0, 6, 7, 9], 9);
builder.push([8, 2, 1, 4, 8, 0, 8, 6, 5, 1], 10);
builder.push([3, 2, 8, 2, 3, 0, 6, 6, 4, 7], 11);
builder.push([0, 9, 3, 8, 4, 4, 6, 0, 9, 5], 12);
builder.push([5, 0, 5, 8, 2, 2, 3, 1, 7, 2], 13);
builder.push([5, 3, 5, 9, 4, 0, 8, 1, 2, 8], 14);

let trie = builder.build();

assert_eq!(trie.exact_match([5, 3, 5, 9, 4, 0, 8, 1, 2, 8]), true);
assert_eq!(
    trie.predictive_search([3]),
    vec![
        [3, 2, 8, 2, 3, 0, 6, 6, 4, 7],
        [3, 4, 2, 1, 1, 7, 0, 6, 7, 9],
    ],
);
assert_eq!(
    trie.common_prefix_search([1, 4, 1, 5, 9, 2, 6, 5, 3, 5]),
    vec![[1, 4, 1, 5, 9, 2, 6, 5, 3, 5]],
);

特性

  • 泛型类型支持:如上述示例所示,trie-rs不仅可以用于搜索UTF-8字符串,还可以用于其他数据类型。
  • 基于louds-rs,它快速、并行且内存高效。
  • 最新的基准测试结果始终可访问:trie-rs在Travis CI上持续使用Criterion.rs进行基准测试。图形基准测试结果在此发布。

致谢

edict.furigana用于基准测试。此文件按以下步骤构建

  1. EDICT下载edict.gz
  2. 将其从原始EUC转换为UTF-8。
  3. 使用edict-to-csv将其转换为CSV文件。
  4. 提取字段$1用于平假名/片假名单词,字段$3用于其他(如汉字)单词。
  5. 使用kana2hira将片假名转换为平假名。

非常感谢这些字典和工具。

版本

trie-rs使用语义版本化

当前主版本为0,次要版本更新可能会涉及破坏公共API的变更(尽管会尽量避免)。

支持的Rust版本

trie-rs与以下Rust版本在Travis CI中持续进行测试

  • 1.33.0
  • 最新稳定版本

因此,它应该可以与Rust 1.33.0及更高版本兼容。

较旧版本也可能兼容,但未经测试且无法保证。

贡献

欢迎任何形式的pull request。

指南

  • README.md$ cargo readme命令生成。请不要手动更新README.md,而是编辑src/lib.rs,然后执行$ cargo readme > README.md
  • Travis CI会自动执行以下提交和推送到您的pull-requests
    • $ cargo readme > README.md
    • $cargo fmt --all

许可证

MIT或Apache-2.0

依赖项