3 versions
Uses the old Rust 2015 edition
Version | Released |
---|---|
0.1.2 | Jul 17, 2019 |
0.1.1 | May 6, 2016 |
0.1.0 | May 6, 2016 |
Ranked #1191 in Text processing
23 downloads per month
Used in 2 crates (via zoea)
125KB
668 lines
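Porter Stemmer
An implementation of the Porter stemming algorithm in Rust. It operates on grapheme clusters rather than chars, so the input stream can contain mixed content.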
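To illustrate why the distinction between chars and grapheme clusters matters, here is a minimal sketch using only the standard library. It shows that one user-perceived character can span several `char` values, so slicing by `char` could split it. The helper name `char_count` is hypothetical and not part of this crate; actual grapheme segmentation needs a crate such as `unicode_segmentation` (its `graphemes(true)` method), and how this crate uses it internally is not shown here.

```rust
/// Hypothetical helper: counts Unicode scalar values (Rust `char`s) in a string.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "é" as a single precomposed code point (U+00E9).
    let precomposed = "\u{00E9}";
    // "é" as "e" followed by a combining acute accent (U+0301).
    let decomposed = "e\u{0301}";

    // Both render as one user-perceived character (one grapheme cluster),
    // but they differ in how many chars and bytes they occupy.
    println!("precomposed: {} char(s), {} byte(s)", char_count(precomposed), precomposed.len());
    println!("decomposed:  {} char(s), {} byte(s)", char_count(decomposed), decomposed.len());
}
```

A stemmer that compared or removed suffixes one `char` at a time could separate the base letter from its combining mark in the decomposed form; working on grapheme clusters avoids that.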
Example
use unicode_segmentation::UnicodeSegmentation;
use porter_stemmer::stem;
let original = "Almost forty years later, these fair information practices have become the standard for privacy protection around the world. And yet, over that same time period, we have seen an exponential growth in the use of surveillance technologies, and our daily interactions are now routinely captured, recorded, and manipulated by small and large institutions alike.";
// Split on Unicode word boundaries; punctuation is dropped. No clone is needed for a &str.
let tokenised_sentence = original.unicode_words();
println!("Original:\n{}", original);
// Stem each word and append it, followed by a space, to the output string.
let stemmed = tokenised_sentence.map(stem).fold(String::new(), |last, next| format!("{}{} ", last, next));
println!("Stemmed:\n{}", stemmed);
Original:
Almost forty years later, these fair information practices have become
the standard for privacy protection around the world. And yet, over
that same time period, we have seen an exponential growth in the use
of surveillance technologies, and our daily interactions are now
routinely captured, recorded, and manipulated by small and large institutions
alike.
Stemmed:
Almost forti year later these fair inform practic have becom the standard for
privaci protect around the world And yet over that same time period we have
seen an exponenti growth in the us of surveil technologi and our daili interact
ar now routin captur record and manipul by small and larg institut alik
The sample text is excerpted from Lessons from the Identity Trail and is used as an example only. License: https://creativecommons.org/licenses/by-nc-nd/2.5/ca/
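The suffix rewriting visible in the stemmed output (for instance "technologies" becoming "technologi") can be sketched with Step 1a of Porter's published algorithm, which handles plural endings. This is an illustrative sketch only, not this crate's code: the function name `step_1a` is hypothetical, it assumes lowercase ASCII input, and it omits all later steps of the algorithm.

```rust
/// Hypothetical sketch of Porter Step 1a (plural suffixes).
/// Rules are tried longest suffix first:
///   sses -> ss, ies -> i, ss -> ss, s -> (removed)
/// Assumes lowercase ASCII input.
fn step_1a(word: &str) -> String {
    if let Some(stem) = word.strip_suffix("sses") {
        format!("{}ss", stem)
    } else if let Some(stem) = word.strip_suffix("ies") {
        format!("{}i", stem)
    } else if word.ends_with("ss") {
        // "ss" is left unchanged so that words like "caress" keep both letters.
        word.to_string()
    } else if let Some(stem) = word.strip_suffix('s') {
        stem.to_string()
    } else {
        word.to_string()
    }
}

fn main() {
    for word in ["caresses", "ponies", "caress", "cats", "technologies"] {
        println!("{} -> {}", word, step_1a(word));
    }
}
```

Step 1a alone does not produce final stems: "practices" only becomes "practice" here, and the trailing "e" is removed by a later step of the full algorithm.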
License
MPL-2.0
Dependencies
~555KB