3 versions

0.1.2	Apr 1, 2024
0.1.1	Jun 5, 2023
0.1.0	Jun 4, 2023

#365 in Text processing

21 downloads per month
Used in 5 crates (via text-scanner)

MIT license

31KB
448 lines

char-ranges

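Similar to the standard library's .char_indices(), but instead of producing only the start byte position, this library implements .char_ranges(), which produces both the start and end byte positions.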

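Note that simply using .char_indices() and mapping each returned index i to i..(i + 1) is not guaranteed to produce a valid range, because a UTF-8 encoded character can be up to 4 bytes long.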

Char	Bytes	Range
`'O'`	1	`0..1`
`'Ø'`	2	`0..2`
`'∈'`	3	`0..3`
`'🌏'`	4	`0..4`

Assumes the text is encoded in UTF-8.
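
The point about i..(i + 1) can be checked with the standard library alone. The following is a minimal sketch, not the crate's implementation: `std_char_ranges` and `naive_ranges_are_valid` are hypothetical helper names introduced here for illustration. It shows that the naive range stops being a valid slice as soon as a character is longer than one byte, while adding `char::len_utf8()` to the start index gives the correct end.

```rust
use std::ops::Range;

/// Hypothetical std-only helper (not part of char-ranges):
/// pairs each char with its full byte range.
fn std_char_ranges(text: &str) -> Vec<(Range<usize>, char)> {
    text.char_indices()
        .map(|(i, c)| (i..i + c.len_utf8(), c))
        .collect()
}

/// Hypothetical helper: true when the naive range `i..(i + 1)`
/// is a valid slice of `text` for every char.
fn naive_ranges_are_valid(text: &str) -> bool {
    // `str::get` returns `None` when a bound is not on a char boundary.
    text.char_indices()
        .all(|(i, _)| text.get(i..i + 1).is_some())
}

fn main() {
    // ASCII only: every char is 1 byte, so the naive ranges happen to work.
    assert!(naive_ranges_are_valid("Hello"));

    // Multi-byte chars: the naive ranges cut through a char.
    assert!(!naive_ranges_are_valid("OØ∈🌏"));

    // Using the encoded length gives valid ranges.
    let ranges = std_char_ranges("OØ∈🌏");
    assert_eq!(
        ranges,
        vec![(0..1, 'O'), (1..3, 'Ø'), (3..6, '∈'), (6..10, '🌏')]
    );
}
```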

The implementation specializes last(), nth(), next_back(), and nth_back(), so that the lengths of intermediate characters are not wastefully calculated.
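
To illustrate the idea behind such specializations, here is a std-only sketch. It is an assumption about the general technique, not a copy of the crate's code, and `last_char_range` and `nth_char_range` are hypothetical names. The range of the last char can be derived from the end of the string, and the range of the nth char only needs the encoded length of that one char.

```rust
use std::ops::Range;

/// Hypothetical helper: range of the last char, found from the back
/// of the string rather than by walking every char from the front.
fn last_char_range(text: &str) -> Option<(Range<usize>, char)> {
    let c = text.chars().next_back()?;
    let end = text.len();
    Some((end - c.len_utf8()..end, c))
}

/// Hypothetical helper: range of the nth char (0-based). Only the
/// target char's encoded length is computed to form the range.
fn nth_char_range(text: &str, n: usize) -> Option<(Range<usize>, char)> {
    let (start, c) = text.char_indices().nth(n)?;
    Some((start..start + c.len_utf8(), c))
}

fn main() {
    let text = "Hello 🗻∈🌏";

    assert_eq!(last_char_range(text), Some((13..17, '🌏')));
    assert_eq!(nth_char_range(text, 7), Some((10..13, '∈')));

    // Out of bounds and empty input yield `None`.
    assert_eq!(nth_char_range(text, 9), None);
    assert_eq!(last_char_range(""), None);
}
```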

Example

use char_ranges::CharRangesExt;

let text = "Hello 🗻∈🌏";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");

assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));

// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");

assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);

`DoubleEndedIterator`

CharRanges also implements DoubleEndedIterator, which makes it possible to iterate backwards.

use char_ranges::CharRangesExt;

let text = "ABCDE";

let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");

assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");

assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");

assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");

assert_eq!(chars.next(), None);

Offset Ranges

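If the input text is a substring of some original text, and the produced ranges should be offset by the substring's position so that they index into the original text, then instead of .char_ranges() use .char_ranges_offset(offset) or .char_ranges().offset(offset).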

use char_ranges::CharRangesExt;

let text = "Hello 👋 World 🌏";

let start = 11; // Start index of 'W'
let text = &text[start..]; // "World 🌏"

let mut chars = text.char_ranges_offset(start);
// or
// let mut chars = text.char_ranges().offset(start);

assert_eq!(chars.next(), Some((11..12, 'W'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((12..13, 'o')));
assert_eq!(chars.next(), Some((13..14, 'r')));

assert_eq!(chars.next_back(), Some((17..21, '🌏'))); // This char is 4 bytes
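
The reason for offsetting is that the resulting ranges can be used directly on the original text. The following std-only sketch mirrors that behaviour under stated assumptions: `offset_char_ranges` is a hypothetical helper written for this illustration, not the crate's API. It checks that every offset range slices the original string back to the same char.

```rust
use std::ops::Range;

/// Hypothetical std-only helper (not part of char-ranges):
/// byte ranges of `sub`'s chars, shifted by `offset`.
fn offset_char_ranges(sub: &str, offset: usize) -> Vec<(Range<usize>, char)> {
    sub.char_indices()
        .map(|(i, c)| {
            let start = offset + i;
            (start..start + c.len_utf8(), c)
        })
        .collect()
}

fn main() {
    let original = "Hello 👋 World 🌏";

    let start = 11; // Start index of 'W'
    let sub = &original[start..]; // "World 🌏"

    for (range, c) in offset_char_ranges(sub, start) {
        // Each offset range is valid in the original text
        // and selects exactly the char it was paired with.
        assert_eq!(original[range].chars().next(), Some(c));
    }
}
```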

no-std char-ranges

No runtime dependencies