3 versions

Uses old Rust 2015

0.7.4	November 23, 2019
0.7.3	November 23, 2019
0.7.2	November 23, 2019

in Text processing

22 downloads per month
Used in spandex

Apache-2.0/MIT

3.5MB
758 lines

`hyphenation`

Hyphenation for UTF-8 strings in a variety of languages.

[dependencies]
hyphenation = "0.7.1"

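Two strategies are available:

Standard Knuth–Liang hyphenation, with dictionaries built from the TeX UTF-8 patterns.
Extended (“non-standard”) hyphenation based on László Németh's Automatic non-standard hyphenation in OpenOffice.org, with dictionaries built from Libre/OpenOffice patterns.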
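To make the standard strategy concrete, here is a minimal sketch of Knuth–Liang pattern matching in plain Rust. It is not the crate's implementation: the names `parse_pattern`, `liang_breaks` and `PATTERNS` are hypothetical, the nine patterns are a tiny illustrative subset rather than a real dictionary, and the code assumes ASCII words so that character positions and byte offsets coincide. The minimum fragment lengths (2 on the left, 3 on the right) follow TeX's usual English defaults and are likewise an assumption here.

```rust
/// Illustrative subset of Liang-style patterns (hypothetical, not a real dictionary).
const PATTERNS: [&str; 9] = [
    "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n",
];

/// Split a pattern such as "hy3ph" into its letters and the score of every gap.
/// The returned score vector always has one more entry than the letter vector.
fn parse_pattern(pattern: &str) -> (Vec<char>, Vec<u8>) {
    let mut letters = Vec::new();
    let mut scores = vec![0u8];
    for c in pattern.chars() {
        if let Some(digit) = c.to_digit(10) {
            // A digit scores the gap that precedes the next letter.
            *scores.last_mut().unwrap() = digit as u8;
        } else {
            letters.push(c);
            scores.push(0);
        }
    }
    (letters, scores)
}

/// Return the byte offsets at which `word` may be broken (ASCII input assumed).
fn liang_breaks(word: &str, patterns: &[&str], left_min: usize, right_min: usize) -> Vec<usize> {
    let lowered = word.to_ascii_lowercase();
    // Pad the word with '.' so that patterns can anchor to word boundaries.
    let mut padded: Vec<char> = vec!['.'];
    padded.extend(lowered.chars());
    padded.push('.');
    // points[i] holds the highest score seen for the gap before padded[i].
    let mut points = vec![0u8; padded.len() + 1];

    for pattern in patterns {
        let (letters, scores) = parse_pattern(pattern);
        if letters.is_empty() || letters.len() > padded.len() {
            continue;
        }
        for start in 0..=(padded.len() - letters.len()) {
            if padded[start..start + letters.len()] == letters[..] {
                for (k, &score) in scores.iter().enumerate() {
                    points[start + k] = points[start + k].max(score);
                }
            }
        }
    }

    // Odd scores allow a break, even scores forbid it.
    let len = lowered.len();
    (1..len)
        .filter(|&offset| offset >= left_min && offset + right_min <= len)
        .filter(|&offset| points[offset + 1] % 2 == 1)
        .collect()
}

fn main() {
    let breaks = liang_breaks("hyphenation", &PATTERNS, 2, 3);
    assert_eq!(breaks, vec![2, 6]);
    println!("{:?}", breaks);
}
```

Where several patterns score the same gap, the highest score wins, which is how a more specific pattern can override a shorter, more general one.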

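Documentation

Docs.rs

Usage

Quickstart

The hyphenation library relies on hyphenation dictionaries, external files that must be loaded into memory. To start with, however, it can be more convenient to embed them in the compiled artifact.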

[dependencies]
hyphenation = { version = "0.7.1", features = ["embed_all"] }

The topmost module of hyphenation offers a small prelude that can be imported to expose the most common functionality.

use hyphenation::*;

// Retrieve the embedded American English dictionary for `Standard` hyphenation.
let en_us = Standard::from_embedded(Language::EnglishUS) ?;

// Identify valid breaks in the given word.
let hyphenated = en_us.hyphenate("hyphenation");

// Word breaks are represented as byte indices into the string.
let break_indices = &hyphenated.breaks;
assert_eq!(break_indices, &[2, 6]);

// The segments of a hyphenated word can be iterated over.
let segments = hyphenated.into_iter();
let collected : Vec<String> = segments.collect();
assert_eq!(collected, vec!["hy", "phen", "ation"]);

// `hyphenate()` is case-insensitive.
let uppercase : Vec<_> = en_us.hyphenate("CAPITAL").into_iter().collect();
assert_eq!(uppercase, vec!["CAP", "I", "TAL"]);
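Because breaks are byte indices, they can be turned into segments with ordinary string slicing. The following is a small standard-library sketch of that relationship. The helper name `segments` is hypothetical and not part of the crate, and it assumes the indices are ascending and fall on character boundaries (slicing panics otherwise).

```rust
/// Slice `word` at each byte offset in `breaks`, returning the pieces in order.
/// Assumes `breaks` is sorted and every offset lies on a char boundary.
fn segments<'a>(word: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    let mut pieces = Vec::with_capacity(breaks.len() + 1);
    let mut start = 0;
    for &end in breaks {
        pieces.push(&word[start..end]);
        start = end;
    }
    // Whatever follows the last break is the final segment.
    pieces.push(&word[start..]);
    pieces
}

fn main() {
    let pieces = segments("hyphenation", &[2, 6]);
    assert_eq!(pieces, vec!["hy", "phen", "ation"]);
    println!("{}", pieces.join("-"));
}
```

Returning borrowed slices avoids allocating a new `String` per segment, which can matter when processing long texts.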

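Loading dictionaries at runtime

The current set of available dictionaries amounts to about 7MB of data, and embedding all of it is seldom desirable. Most applications should prefer to load individual dictionaries at runtime, like so: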

let path_to_dict = "/path/to/en-us.bincode";
let english_us = Standard::from_path(Language::EnglishUS, path_to_dict) ?;

The dictionaries bundled with hyphenation can be retrieved from the build folder under target, and packaged with the final application as desired.

$ find target -name "dictionaries"
target/debug/build/hyphenation-33034db3e3b5f3ce/out/dictionaries
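The same lookup can be done from Rust, for instance in a packaging script. The sketch below mirrors the `find` invocation above using only the standard library. The function name `find_dictionary_dirs` is hypothetical, and the code assumes nothing about the files inside the folder.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every directory named `dictionaries` below `root`.
/// Symbolic links are not followed.
fn find_dictionary_dirs(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            if entry.file_type()?.is_dir() {
                let path = entry.path();
                if entry.file_name() == "dictionaries" {
                    found.push(path.clone());
                }
                // Keep descending so that nested build folders are searched too.
                pending.push(path);
            }
        }
    }
    Ok(found)
}

fn main() {
    match find_dictionary_dirs(Path::new("target")) {
        Ok(dirs) => {
            for dir in dirs {
                println!("{}", dir.display());
            }
        }
        Err(err) => eprintln!("could not scan target: {}", err),
    }
}
```

Since the hash in the build folder name changes between builds, searching by directory name is more robust than hard-coding the full path.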

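Segmentation

Dictionaries can be used in conjunction with text segmentation to hyphenate words within a text run. The following short example uses the unicode-segmentation crate for untailored Unicode segmentation.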

use unicode_segmentation::UnicodeSegmentation;

let hyphenate_text = |text : &str| -> String {
    // Split the text on word boundaries—
    text.split_word_bounds()
        // —and hyphenate each word individually.
        .flat_map(|word| en_us.hyphenate(word).into_iter())
        .collect()
};

let excerpt = "I know noble accents / And lucid, inescapable rhythms; […]";
assert_eq!("I know no-ble ac-cents / And lu-cid, in-escapable rhythms; […]"
          , hyphenate_text(excerpt));
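For readers who want to see the mechanics without any external crate, the following sketch does a much cruder segmentation: it treats every maximal run of alphabetic characters as a word and inserts a hyphen at each break offset. All names are hypothetical, and `demo_breaks` is a hard-coded stand-in for a real dictionary lookup. It is not equivalent to proper Unicode word segmentation.

```rust
/// Stand-in for a dictionary lookup: byte offsets of the breaks in `word`.
/// The entries are hard-coded for illustration only.
fn demo_breaks(word: &str) -> Vec<usize> {
    match word {
        "noble" | "accents" | "lucid" | "inescapable" => vec![2],
        _ => Vec::new(),
    }
}

/// Append `word` to `out`, inserting '-' at every break offset.
fn push_hyphenated<F: Fn(&str) -> Vec<usize>>(out: &mut String, word: &str, breaks_for: &F) {
    let mut start = 0;
    for end in breaks_for(word) {
        out.push_str(&word[start..end]);
        out.push('-');
        start = end;
    }
    out.push_str(&word[start..]);
}

/// Hyphenate every alphabetic run in `text`, copying all other characters as-is.
fn hyphenate_text<F: Fn(&str) -> Vec<usize>>(text: &str, breaks_for: F) -> String {
    let mut out = String::with_capacity(text.len());
    let mut word_start: Option<usize> = None;
    for (index, c) in text.char_indices() {
        if c.is_alphabetic() {
            if word_start.is_none() {
                word_start = Some(index);
            }
        } else {
            if let Some(start) = word_start.take() {
                push_hyphenated(&mut out, &text[start..index], &breaks_for);
            }
            out.push(c);
        }
    }
    // Flush a word that runs to the end of the text.
    if let Some(start) = word_start {
        push_hyphenated(&mut out, &text[start..], &breaks_for);
    }
    out
}

fn main() {
    let excerpt = "I know noble accents / And lucid, inescapable rhythms;";
    println!("{}", hyphenate_text(excerpt, demo_breaks));
}
```

Passing the lookup as a closure keeps the text-walking logic independent of where the breaks come from, so a real dictionary could be swapped in without touching `hyphenate_text`.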

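Normalization

Hyphenation patterns for languages affected by normalization occasionally cover multiple forms, at the discretion of their authors, but most often they don’t. If you require hyphenation to operate strictly on strings in a known normalization form, as described by Unicode Standard Annex #15 and provided by the unicode-normalization crate, you may specify it in your Cargo manifest, like so: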

[dependencies.hyphenation]
version = "0.7.1"
features = ["nfc"]

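The features field may contain exactly one of the following normalization options:

"nfc", for canonical composition;
"nfd", for canonical decomposition;
"nfkc", for compatibility composition;
"nfkd", for compatibility decomposition.

It is recommended to build hyphenation in release mode if normalization is enabled, since the bundled hyphenation patterns will need to be reprocessed into dictionaries.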
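To illustrate why the normalization form matters, the sketch below compares the same visible word in composed (NFC) and decomposed (NFD) form using only the standard library. The helper `shape` is hypothetical. The point is that the two forms differ byte for byte, so a pattern written against one form does not match text in the other.

```rust
/// Return (number of code points, number of UTF-8 bytes) for `s`.
fn shape(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

fn main() {
    // "café" with é as the single code point U+00E9 (composed form).
    let composed = "caf\u{00E9}";
    // "café" with e followed by the combining acute accent U+0301 (decomposed form).
    let decomposed = "cafe\u{0301}";

    // Both render identically, yet they are different strings.
    assert_ne!(composed, decomposed);
    assert_eq!(shape(composed), (4, 5));
    assert_eq!(shape(decomposed), (5, 6));

    // A pattern fragment written in composed form is not found in decomposed text.
    assert!(composed.contains("f\u{00E9}"));
    assert!(!decomposed.contains("f\u{00E9}"));

    println!("{:?} vs {:?}", shape(composed), shape(decomposed));
}
```

Byte offsets of any breaks would differ between the two forms as well, which is one reason to settle on a single form for both dictionaries and input.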

License

Apache License, Version 2.0
MIT license

texhyphen and other hyphenation patterns belong to their respective owners; see the patterns/*.lic.txt files for licensing information.

spandex-hyphenation
