14 stable releases
new 2.2.1 | Aug 11, 2024 |
---|---|
2.2.0 | Apr 5, 2024 |
2.1.5 | Feb 28, 2024 |
2.1.2 | Nov 26, 2023 |
1.2.0 | Jul 31, 2022 |
#70 in Text processing
426 downloads per month
Used in tantivy-analysis-contrib
450KB
10K SLoC
Rust phonetic
This is a Rust port of the phonetic algorithms from Apache commons-codec v1.15.
Algorithms
The following algorithms are currently available:
- Beider-Morse
- Caverphone 1
- Caverphone 2
- Cologne
- Daitch Mokotoff Soundex
- Match Rating Approach
- Metaphone
- Metaphone (Double)
- NYSIIS
- Phonex
- Soundex
- Soundex (Refined)
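Please note that most of these algorithms were designed for the Latin alphabet, and usually for a specific use case (for example English names or English dictionary words).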
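To make this caveat concrete, here is a minimal, self-contained sketch of a simplified American Soundex. It is not the crate's implementation, and the names `soundex_digit` and `simple_soundex` are hypothetical. The sketch assumes the common rule set (keep the first letter, map consonants to digits, collapse adjacent duplicates, treat H and W as transparent, pad to four characters) and only understands ASCII letters, so characters outside the basic Latin alphabet are silently dropped.

```rust
/// Maps an uppercase ASCII consonant to its Soundex digit.
/// Vowels, H, W and Y have no digit and return `None`.
fn soundex_digit(c: char) -> Option<char> {
    match c {
        'B' | 'F' | 'P' | 'V' => Some('1'),
        'C' | 'G' | 'J' | 'K' | 'Q' | 'S' | 'X' | 'Z' => Some('2'),
        'D' | 'T' => Some('3'),
        'L' => Some('4'),
        'M' | 'N' => Some('5'),
        'R' => Some('6'),
        _ => None,
    }
}

/// Simplified American Soundex (illustrative only, hypothetical helper).
fn simple_soundex(input: &str) -> String {
    // Keep ASCII letters only: this is where non-Latin input is lost.
    let letters: Vec<char> = input
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_uppercase())
        .collect();

    let first = match letters.first() {
        Some(&c) => c,
        None => return String::new(),
    };

    let mut code = String::new();
    code.push(first);
    // Digit of the previous letter, used to collapse adjacent duplicates.
    let mut last = soundex_digit(first);

    for &c in &letters[1..] {
        if code.len() == 4 {
            break;
        }
        // H and W are transparent: they do not separate equal digits.
        if c == 'H' || c == 'W' {
            continue;
        }
        let digit = soundex_digit(c);
        if let Some(d) = digit {
            if digit != last {
                code.push(d);
            }
        }
        // A vowel resets `last`, so the same digit may repeat after it.
        last = digit;
    }

    // Pad short codes with zeros up to four characters.
    while code.len() < 4 {
        code.push('0');
    }
    code
}
```

With this sketch, `simple_soundex("jumped")` gives `J513`, while a Cyrillic spelling such as `Мюллер` yields an empty string because no character survives the ASCII filter.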
Examples
Beider-Morse
fn main() -> Result<(), rphonetic::PhoneticError> {
    use std::path::PathBuf;
    use rphonetic::{BeiderMorseBuilder, ConfigFiles, Encoder};

    let config_files = ConfigFiles::new(&PathBuf::from("./test_assets/cc-rules/"))?;
    let builder = BeiderMorseBuilder::new(&config_files);
    let beider_morse = builder.build();

    assert_eq!(
        beider_morse.encode("Van Helsing"),
        "(Ylznk|ilzn|ilznk|xilzn|xilznk)-(banilznk|bonilznk|fYnYlznk|fYnilznk|fanYlznk|fanilznk|fonYlznk|fonilznk|vYnYlznk|vYnilznk|vanYlznk|vaniilznk|vanilzn|vanilznk|vonYlznk|voniilznk|vonilzn|vonilznk)"
    );
    Ok(())
}
Caverphone 1 & 2
fn main() {
    use rphonetic::{Caverphone1, Encoder};

    let caverphone = Caverphone1;
    assert_eq!(caverphone.encode("Thompson"), "TMPSN1");
}

fn main() {
    use rphonetic::{Caverphone2, Encoder};

    let caverphone = Caverphone2;
    assert_eq!(caverphone.encode("Thompson"), "TMPSN11111");
}
Cologne
fn main() {
    use rphonetic::{Cologne, Encoder};

    let cologne = Cologne;
    assert_eq!(cologne.encode("m\u{00FC}ller"), "657");
}
Daitch-Mokotoff
fn main() -> Result<(), rphonetic::PhoneticError> {
    use rphonetic::{DaitchMokotoffSoundex, DaitchMokotoffSoundexBuilder, Encoder};

    const COMMONS_CODEC_RULES: &str = include_str!("./rules/dmrules.txt");

    let encoder = DaitchMokotoffSoundexBuilder::with_rules(COMMONS_CODEC_RULES).build()?;
    assert_eq!(
        encoder.soundex("Rosochowaciec"),
        "944744|944745|944754|944755|945744|945745|945754|945755"
    );
    Ok(())
}
Match Rating Approach
fn main() {
    use rphonetic::{Encoder, MatchRatingApproach};

    let match_rating = MatchRatingApproach;
    assert_eq!(match_rating.encode("Smith"), "SMTH");
}
Metaphone
fn main() {
    use rphonetic::{Encoder, Metaphone};

    let metaphone = Metaphone::default();
    assert_eq!(metaphone.encode("Joanne"), "JN");
}
Metaphone (Double)
fn main() {
    use rphonetic::{DoubleMetaphone, Encoder};

    let double_metaphone = DoubleMetaphone::default();
    assert_eq!(double_metaphone.encode("jumped"), "JMPT");
    assert_eq!(double_metaphone.encode_alternate("jumped"), "AMPT");
}
Phonex
fn main() {
    use rphonetic::{Phonex, Encoder};

    // Strict
    let phonex = Phonex::default();
    assert_eq!(phonex.encode("William"), "W450");
}
Nysiis
fn main() {
    use rphonetic::{Nysiis, Encoder};

    // Strict
    let nysiis = Nysiis::default();
    assert_eq!(nysiis.encode("WESTERLUND"), "WASTAR");

    // Not strict
    let nysiis = Nysiis::new(false);
    assert_eq!(nysiis.encode("WESTERLUND"), "WASTARLAD");
}
Soundex
fn main() {
    use rphonetic::{Encoder, Soundex};

    let soundex = Soundex::default();
    assert_eq!(soundex.encode("jumped"), "J513");
}
Soundex (Refined)
fn main() {
    use rphonetic::{Encoder, RefinedSoundex};

    let refined_soundex = RefinedSoundex::default();
    assert_eq!(refined_soundex.encode("jumped"), "J408106");
}
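The examples above all follow the same pattern: build an encoder, then call `encode` on a string. One possible way to put the resulting codes to work is to bucket names that share a code. The sketch below is an assumption about usage rather than part of the crate: `group_by_code` is a hypothetical helper, and it takes a plain closure so that it stays self-contained and independent of any particular encoder.

```rust
use std::collections::BTreeMap;

/// Groups names by the key that `encode` returns for each of them.
/// `encode` stands in for any phonetic encoder (hypothetical helper).
fn group_by_code<F>(names: &[&str], encode: F) -> BTreeMap<String, Vec<String>>
where
    F: Fn(&str) -> String,
{
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &name in names {
        // Names that produce the same code end up in the same bucket,
        // in the order in which they appear in the input.
        groups.entry(encode(name)).or_default().push(name.to_string());
    }
    groups
}
```

Assuming rphonetic is a dependency and `encode` returns a `String` as the assertions above suggest, a closure such as `|s| soundex.encode(s)` could be passed as the second argument.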
Benchmarking
Benchmarks use criterion.
They were run on a machine with an Intel® Core™ i7-4720HQ CPU and 16GB of RAM.
To run the benchmarks against the main baseline:
cargo bench --bench benchmark -- --baseline main
To replace the main baseline:
cargo bench --bench benchmark -- --save-baseline main
Do not run Criterion benchmarks in CI.
Dependencies
~3.5–5.5MB
~99K SLoC