22 stable releases

1.11.0	Feb 7, 2024
1.10.1	Jan 31, 2023
1.10.0	Sep 13, 2022
1.9.0	Feb 7, 2022
0.1.1	Jul 9, 2015

#9 in Text processing

5,265,786 downloads per month
Used in 13,695 crates (571 directly)

MIT/Apache

620KB
8K SLoC

Iterators which split strings on Grapheme Cluster or Word boundaries, according to the Unicode Standard Annex #29 rules

Documentation

use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let s = "a̐éö̲\r\n";
    let g = s.graphemes(true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);

    let s = "The quick (\"brown\") fox can't jump 32.3 feet, right?";
    let w = s.unicode_words().collect::<Vec<&str>>();
    let b: &[_] = &["The", "quick", "brown", "fox", "can't", "jump", "32.3", "feet", "right"];
    assert_eq!(w, b);

    let s = "The quick (\"brown\")  fox";
    let w = s.split_word_bounds().collect::<Vec<&str>>();
    let b: &[_] = &["The", " ", "quick", " ", "(", "\"", "brown", "\"", ")", "  ", "fox"];
    assert_eq!(w, b);
}
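
For contrast, the sketch below uses only the standard library to show why grapheme-aware iteration matters: `chars()` yields Unicode scalar values, so a base letter and its combining mark come out as two separate items. The helper name `scalar_values` is hypothetical and is not part of this crate.

```rust
/// Hypothetical helper for illustration: splits a string into Unicode scalar
/// values using only the standard library (no grapheme awareness).
fn scalar_values(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    // 'a' followed by U+0310 COMBINING CANDRABINDU forms one grapheme
    // cluster, yet it is two scalar values and three UTF-8 bytes.
    let s = "a\u{310}";
    let chars = scalar_values(s);
    println!("{} scalar values, {} bytes", chars.len(), s.len());
}
```

With `graphemes(true)` from the example above, the same input is returned as a single `&str` item.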

no_std

unicode-segmentation does not depend on libstd, so it can be used in crates with the #![no_std] attribute.

crates.io

You can use this package in your project by adding the following to your Cargo.toml:

[dependencies]
unicode-segmentation = "1.10.1"

Change Log

1.11.0

#124 Update data to Unicode 15.1
#128 Add size_hint to iterators

1.10.1

#113 Use criterion.rs for word benchmarks
#112 Improve table search speed through lookups

1.10.0

#107 Upgrade to Unicode 15.0.0
#104 Supersede and fix #75

1.9.0

#101 Upgrade to Unicode 14.0.0

1.8.0

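#100 Increase #[inline] opportunities, resulting in a 15-40% performance improvement
#95 Implement Debug for Graphemes
#94 Add oss-fuzz integration with an initial fuzzer
#93 Fix unused import and deprecated pattern warnings
#91 Move a local variable into the loop so it can be immutable
#91 Add new iterator UnicodeWordIndices and unicode_word_indices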

1.7.1

Update the version number in the documentation

1.7.0

#87 Upgrade to Unicode 13
#79 Implement a special-case lookup for ASCII grapheme categories
#77 Optimization for grapheme iteration

1.6.0

#72 Upgrade to Unicode 12

1.5.0

#68 Upgrade to Unicode 11

1.4.0

#56 Upgrade to Unicode 10

1.3.0

#24 Add support for sentence boundaries
#44 Treat gc=No as a subset of gc=N

1.2.1

#37: Fix panic in provide_context.
#40: Fix crash in prev_boundary.

1.2.0

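New GraphemeCursor API allows random access and bidirectional iteration.
Fixed incorrect splitting of certain grapheme clusters containing Prepend characters.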

1.1.0

Add as_str methods to the iterator types.

1.0.3

Code cleanup and additional tests.

1.0.1

Fix a bug affecting some grapheme clusters containing Prepend characters.

1.0.0

Upgrade to Unicode 9.0.0.
