1 stable release

1.0.0	April 11, 2020

#1000 in Text processing

Apache-2.0

12KB
214 lines

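Find all common substrings

A method for finding all common substrings, especially fast on large samples of strings. It uses only Rust's std library.

The algorithm uses a two-dimensional trie to collect all fragments. The vertical dimension is a standard suffix trie, but all nodes belonging to the last word of each suffix are linked to one another; I call these virtual horizontal links.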
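To make the vertical dimension concrete, here is a minimal sketch of a plain suffix trie shared by several input strings, using only std. It is not the crate's implementation: the names `Node`, `build_suffix_trie`, and `sources_of` are hypothetical, and the virtual horizontal links are not reproduced. Each node simply records which input strings pass through it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One node of a plain suffix trie (hypothetical type, not the crate's).
#[derive(Default)]
struct Node {
    children: BTreeMap<char, Node>,
    /// Indices of the input strings that contain the substring
    /// spelled by the path from the root to this node.
    sources: BTreeSet<usize>,
}

/// Insert every suffix of every input string into one shared trie.
fn build_suffix_trie(inputs: &[&str]) -> Node {
    let mut root = Node::default();
    for (index, text) in inputs.iter().enumerate() {
        let chars: Vec<char> = text.chars().collect();
        for start in 0..chars.len() {
            // Walk down from the root, creating nodes as needed.
            let mut node = &mut root;
            for &c in &chars[start..] {
                node = node.children.entry(c).or_default();
                node.sources.insert(index);
            }
        }
    }
    root
}

/// Return the indices of the inputs that contain `needle` as a substring.
fn sources_of(root: &Node, needle: &str) -> BTreeSet<usize> {
    let mut node = root;
    for c in needle.chars() {
        match node.children.get(&c) {
            Some(next) => node = next,
            None => return BTreeSet::new(),
        }
    }
    node.sources.clone()
}
```

Because every substring is a prefix of some suffix, any node reached by two or more inputs spells a common substring. This naive construction takes time quadratic in the length of each string.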

Usage

Use the function get_substrings to get all common substrings in a list of strings.

Example

use common_substrings::get_substrings;
let input_strings = vec!["java", "javascript", "typescript", "coffeescript", "coffee"];
let result_substrings = get_substrings(input_strings, 2, 3);

This gives the following list of results:

Substring(sources: {2, 3}, name: escript, weight: 14)
Substring(sources: {1, 0}, name: java, weight: 8)
Substring(sources: {4, 3}, name: coffee, weight: 12)

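Parameters

input - the vector of input strings to search.
min_occurrences - the minimum number of occurrences a common substring must have to be captured.
min_length - the minimum length a common substring must have to be captured.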
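The effect of the two thresholds can be illustrated with a brute-force sketch. It assumes that "occurrences" means the number of distinct input strings containing the substring, and that length is counted in characters; both are assumptions, not confirmed by the crate. The function `candidate_substrings` is hypothetical. It lists every candidate that passes the thresholds, so it returns more entries than get_substrings, which reports a reduced result list.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical brute-force helper: maps every substring of at least
/// `min_length` characters to the indices of the inputs containing it,
/// keeping only substrings found in at least `min_occurrences` inputs.
fn candidate_substrings(
    inputs: &[&str],
    min_occurrences: usize,
    min_length: usize,
) -> BTreeMap<String, BTreeSet<usize>> {
    let mut found: BTreeMap<String, BTreeSet<usize>> = BTreeMap::new();
    for (index, text) in inputs.iter().enumerate() {
        let chars: Vec<char> = text.chars().collect();
        for start in 0..chars.len() {
            // Only consider pieces that meet the minimum length.
            for end in (start + min_length.max(1))..=chars.len() {
                let piece: String = chars[start..end].iter().collect();
                found.entry(piece).or_default().insert(index);
            }
        }
    }
    // Drop substrings that appear in too few input strings.
    found.retain(|_, sources| sources.len() >= min_occurrences);
    found
}
```

With the example inputs above, thresholds of 2 and 3 keep "java", "escript", and "coffee" along with many shorter fragments such as "script", while raising min_occurrences to 3 drops "java" and keeps "script".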

Algorithm

Explanation here

Other implementations

JavaScript

License

Apache-2.0


No runtime dependencies