10 stable releases

1.0.9	Mar 15, 2024
1.0.8	Dec 18, 2018
1.0.7	Aug 24, 2018
1.0.6	Jan 23, 2018
1.0.3	Dec 27, 2016

#209 in Text processing

54,547 downloads per month
Used in 54 crates (via polars-ops)

MIT/Apache

10KB
63 lines

unicode-reverse

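Unicode-aware in-place string reversal for Rust UTF-8 strings.

The reverse_grapheme_clusters_in_place function reverses a string slice in place without allocating any memory on the heap. It correctly handles multi-byte UTF-8 sequences and grapheme clusters, including combining marks and astral characters such as emoji.

Example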

use unicode_reverse::reverse_grapheme_clusters_in_place;

let mut x = "man\u{0303}ana".to_string();
println!("{}", x); // prints "mañana"

reverse_grapheme_clusters_in_place(&mut x);
println!("{}", x); // prints "anañam"

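Background

As described in this article by Mathias Bynens, naively reversing a Unicode string can go wrong in several ways. For example, reversing only the chars (Unicode scalar values) of a string can leave combining marks attached to the wrong characters: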

let x = "man\u{0303}ana";
println!("{}", x); // prints "mañana"

let y: String = x.chars().rev().collect();
println!("{}", y); // prints "anãnam": Oops! The '~' is now applied to the 'a'.

Reversing the string's grapheme clusters instead fixes this problem:

extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let x = "man\u{0303}ana";
    let y: String = x.graphemes(true).rev().collect();
    println!("{}", y); // prints "anañam"
}

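The reverse_grapheme_clusters_in_place function in this crate performs the same operation, but reverses the string in place rather than allocating a new one.

Note: even grapheme-level reversal may produce unexpected output if the input string contains certain non-printable control codes, such as directional formatting characters. Handling such characters is outside the scope of this crate.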
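To make the note above concrete, here is a small standard-library sketch. It assumes an input in which every scalar value is its own grapheme cluster, so a plain chars().rev() stands in for cluster-level reversal; reverse_scalars is a hypothetical helper, not part of this crate. The input wraps "abc" in U+202E (RIGHT-TO-LEFT OVERRIDE) and U+202C (POP DIRECTIONAL FORMATTING).

```rust
/// Reverses a string one Unicode scalar value at a time.
/// Assumption for this sketch: each char of the input is its own
/// grapheme cluster, so this matches cluster-level reversal here.
fn reverse_scalars(s: &str) -> String {
    s.chars().rev().collect()
}

/// Returns the byte offset of the first occurrence of `c`, if any.
fn position_of(s: &str, c: char) -> Option<usize> {
    s.find(c)
}
```

After reversal the POP character comes first and the OVERRIDE comes last, so the override no longer encloses the text it was meant to affect. How the result is displayed then depends on the renderer.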

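Algorithm

The implementation is very simple. It makes two passes over the contents of the string:

1. For each grapheme cluster, reverse the bytes within that cluster in place.
2. Reverse the bytes of the entire string in place.

After the second pass, each grapheme cluster has been reversed twice, so its bytes are back in their original order, but the clusters now appear in the opposite order within the string.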
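The two passes can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual implementation: reverse_clusters, cluster_ranges, and is_combining_mark are hypothetical names, and the cluster rule used here (one char plus any following combining marks in U+0300..=U+036F) is a deliberately simplified stand-in for real grapheme segmentation. Unlike the crate, the sketch also allocates a small Vec of cluster boundaries, and it works on the raw byte buffer because the state between the two passes is not valid UTF-8.

```rust
/// Simplified cluster rule (an assumption of this sketch, not real
/// grapheme segmentation): marks in U+0300..=U+036F attach to the
/// preceding char.
fn is_combining_mark(c: char) -> bool {
    ('\u{0300}'..='\u{036F}').contains(&c)
}

/// Byte ranges (start, end) of each simplified cluster in `s`.
fn cluster_ranges(s: &str) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    let mut start = 0;
    for (i, c) in s.char_indices() {
        // A new cluster starts at every char that is not a combining mark.
        if i != 0 && !is_combining_mark(c) {
            ranges.push((start, i));
            start = i;
        }
    }
    if !s.is_empty() {
        ranges.push((start, s.len()));
    }
    ranges
}

/// Reverses the order of the simplified clusters, reusing the
/// string's own byte buffer for both passes.
fn reverse_clusters(s: String) -> String {
    let ranges = cluster_ranges(&s);
    let mut bytes = s.into_bytes();

    // Pass 1: reverse the bytes inside each cluster.
    for (start, end) in ranges {
        bytes[start..end].reverse();
    }

    // Pass 2: reverse the whole buffer. Each cluster's bytes are now
    // back in their original order, but the clusters are reversed.
    bytes.reverse();

    String::from_utf8(bytes).expect("whole clusters were reversed twice, so the bytes are valid UTF-8")
}
```

Under these assumptions, reverse_clusters("man\u{0303}ana".to_string()) yields "anan\u{0303}am", which displays as "anañam".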

no_std

This crate does not depend on libstd, so it can be used in no_std projects.

Dependencies

~550KB