4 versions

0.3.1	~~Feb 12, 2021~~
0.3.0	~~Feb 12, 2021~~
0.2.2	Feb 8, 2021
0.2.1	Jan 14, 2021
0.1.2	Jan 8, 2021

#1732 in Text processing

43 downloads per month
Used in vextractor-cli

AGPL-3.0

16KB
141 lines

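vextractor

vextractor is a simple library for extracting the vocabulary of a text file.

About

vextractor supports every language that Unicode supports, as long as the language separates words with the Unicode space character ' ' (U+0020).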
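To make the separator rule concrete, here is a minimal sketch that is not taken from vextractor's source. It splits text on U+0020 only and collects the distinct words. The function name `split_vocab` is hypothetical and exists only for this illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper, not part of vextractor: splits `text` on the
/// space character U+0020 only and returns the distinct, non-empty words.
fn split_vocab(text: &str) -> BTreeSet<String> {
    text.split(' ')
        .filter(|w| !w.is_empty()) // consecutive spaces yield empty pieces
        .map(|w| w.to_string())
        .collect()
}

fn main() {
    // Any script works, provided words are separated by U+0020.
    let vocab = split_vocab("der Hund und  die Katze und der Vogel");
    for word in &vocab {
        println!("{}", word);
    }
}
```

Under this sketch, text in a language written without spaces between words would come back as a single long "word", which is why the space-separator condition matters.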

Quick Example

extern crate vextractor;
use vextractor::vex::Vextract;

fn main() {
    let x = Vextract::new(
        "somepath/somefile.txt", // File containing the text to be processed
        vec!["EU", "etc.", "i.e.", "e.g."], // Acronyms
        vec!["Germany", "France", "Belgium", "Italy"], // Proper nouns
    );
    println!("{}", x.get_pretty_vocab()); // Prints the vocabulary
    println!("{}", x.get_sorted_pretty_vocab()); // Sorts, then prints
    x.write_to_file("somepath/somefile.txt"); // Writes the vocabulary to a text file
}
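The example passes two word lists without saying how they are used. One plausible reading, and it is only an assumption, is that listed words keep their original spelling while all other words are normalized. The sketch below illustrates that idea with the hypothetical functions `is_listed` and `normalize`. It does not reflect vextractor's actual behavior or API.

```rust
/// Hypothetical check, not vextractor's API: is `word` in either list?
fn is_listed(word: &str, acronyms: &[&str], proper_nouns: &[&str]) -> bool {
    acronyms.contains(&word) || proper_nouns.contains(&word)
}

/// Hypothetical normalization step: listed words are kept verbatim,
/// everything else is stripped of surrounding punctuation and lowercased.
fn normalize(word: &str, acronyms: &[&str], proper_nouns: &[&str]) -> String {
    if is_listed(word, acronyms, proper_nouns) {
        return word.to_string(); // e.g. "etc." keeps its trailing dot
    }
    let trimmed = word.trim_matches(|c: char| !c.is_alphanumeric());
    if is_listed(trimmed, acronyms, proper_nouns) {
        return trimmed.to_string(); // e.g. "Germany," becomes "Germany"
    }
    trimmed.to_lowercase()
}

fn main() {
    let acronyms = ["EU", "etc.", "i.e.", "e.g."];
    let proper_nouns = ["Germany", "France", "Belgium", "Italy"];
    for word in "The EU includes Germany, France, etc.".split(' ') {
        println!("{}", normalize(word, &acronyms, &proper_nouns));
    }
}
```

Checking the untrimmed word first matters for entries such as "etc.", whose trailing dot is part of the acronym and not sentence punctuation.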

License

vextractor is licensed under the GNU Affero General Public License version 3. See the LICENSE.md file for more information.

No runtime dependencies