3 unstable releases

0.2.0	Feb 26, 2024
0.1.1	Feb 25, 2024
0.1.0	Feb 19, 2024

#484 in Parser implementations

Used in spex

MPL-2.0 license

135KB
2K SLoC

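Simple parser package.

This crate provides a Parser that lets you peek at the next character in a character stream and read the characters you expect. Its methods include, for example: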

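peek() - returns the next character in the stream without removing it;
require(&str) - returns an error if the given sequence is not found next in the stream;
skip_while(Fn(char)->bool) - keeps removing characters for as long as the predicate holds;
read_up_to(char) - reads characters from the stream up to (but not including) the given character;
accept(char) - skips the given character if it is found next in the stream.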
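
To make the semantics of these methods concrete, here is a minimal stand-in built only on the standard library. It is not sipp's Parser: the MiniParser name, the infallible signatures, the bool returned by accept, and the choice that require consumes the sequence on success are all assumptions made for illustration.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical stand-in for illustration only; this is NOT sipp's Parser.
pub struct MiniParser<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> MiniParser<'a> {
    pub fn new(input: &'a str) -> Self {
        MiniParser { chars: input.chars().peekable() }
    }

    /// Returns the next character without consuming it.
    pub fn peek(&mut self) -> Option<char> {
        self.chars.peek().copied()
    }

    /// Consumes `expected` if the input starts with it, otherwise errors.
    pub fn require(&mut self, expected: &str) -> Result<(), String> {
        // Match against a clone first so a failed match consumes nothing.
        let mut lookahead = self.chars.clone();
        for want in expected.chars() {
            if lookahead.next() != Some(want) {
                return Err(format!("expected {:?}", expected));
            }
        }
        self.chars = lookahead;
        Ok(())
    }

    /// Consumes characters for as long as the predicate holds.
    pub fn skip_while(&mut self, predicate: impl Fn(char) -> bool) {
        while let Some(&c) = self.chars.peek() {
            if !predicate(c) {
                break;
            }
            self.chars.next();
        }
    }

    /// Reads up to, but not including, `stop`; None if nothing was read.
    pub fn read_up_to(&mut self, stop: char) -> Option<String> {
        let mut out = String::new();
        while let Some(&c) = self.chars.peek() {
            if c == stop {
                break;
            }
            out.push(c);
            self.chars.next();
        }
        if out.is_empty() { None } else { Some(out) }
    }

    /// Consumes `wanted` if it is next, and reports whether it did.
    pub fn accept(&mut self, wanted: char) -> bool {
        if self.chars.peek() == Some(&wanted) {
            self.chars.next();
            true
        } else {
            false
        }
    }
}
```

The real Parser reads from a decoder rather than a string slice and returns a Result from its fallible methods, so treat this only as a model of what each method does to the stream.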

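Every method that can fail returns a Result, and no method in this crate should ever panic.

Parsing relies on a ByteBuffer that wraps a byte stream, and on a decoder, such as Utf8Decoder, that wraps the ByteBuffer and decodes its bytes into characters. Finally, a Parser is created by wrapping a decoder. The end result is a Parser that lets you peek at and read characters.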
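
The layering described above (byte stream, then buffer, then decoder, then a character-level reader) can be sketched with the standard library alone. The MiniUtf8Decoder and MiniCharStream types below are hypothetical stand-ins, not sipp's ByteBuffer, Utf8Decoder or Parser, and std::io::BufReader plays the role of the buffering layer purely as an assumption for this sketch.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical decoding layer: pulls bytes from a buffered reader and
/// yields chars. Not sipp's Utf8Decoder.
pub struct MiniUtf8Decoder<R: Read> {
    reader: BufReader<R>,
}

fn invalid() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8")
}

impl<R: Read> MiniUtf8Decoder<R> {
    pub fn wrap(source: R) -> Self {
        MiniUtf8Decoder { reader: BufReader::new(source) }
    }

    /// Reads one byte, or Ok(None) at end of input.
    fn read_byte(&mut self) -> io::Result<Option<u8>> {
        self.reader.by_ref().bytes().next().transpose()
    }

    /// Decodes the next char, or returns Ok(None) at end of input.
    pub fn next_char(&mut self) -> io::Result<Option<char>> {
        let first = match self.read_byte()? {
            Some(b) => b,
            None => return Ok(None),
        };
        // The lead byte determines the length of the UTF-8 sequence.
        let len = match first {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF7 => 4,
            _ => return Err(invalid()),
        };
        let mut seq = [first, 0, 0, 0];
        for slot in seq.iter_mut().take(len).skip(1) {
            *slot = self.read_byte()?.ok_or_else(invalid)?;
        }
        // Let the standard library validate the full sequence.
        let text = std::str::from_utf8(&seq[..len]).map_err(|_| invalid())?;
        Ok(text.chars().next())
    }
}

/// Hypothetical top layer: adds one character of lookahead over the decoder.
pub struct MiniCharStream<R: Read> {
    decoder: MiniUtf8Decoder<R>,
    lookahead: Option<char>,
}

impl<R: Read> MiniCharStream<R> {
    pub fn wrap(decoder: MiniUtf8Decoder<R>) -> Self {
        MiniCharStream { decoder, lookahead: None }
    }

    /// Returns the next char without consuming it.
    pub fn peek(&mut self) -> io::Result<Option<char>> {
        if self.lookahead.is_none() {
            self.lookahead = self.decoder.next_char()?;
        }
        Ok(self.lookahead)
    }

    /// Returns the next char and consumes it.
    pub fn read(&mut self) -> io::Result<Option<char>> {
        let next = self.peek()?;
        self.lookahead = None;
        Ok(next)
    }
}
```

Each layer only knows about the one beneath it, which is the design the paragraph above describes: swapping the decoder changes the encoding without touching the character-level code.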

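Examples

Acornsoft Logo parser

Suppose you want to parse a (simplified) set of Acornsoft Logo instructions. You only want to accept the "FORWARD", "LEFT" and "RIGHT" instructions; each instruction must appear on its own line (separated by a newline character), and each must be followed by any number of space characters and then a numeric amount. Example input might look like this: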

FORWARD 10
RIGHT 45
FORWARD 20
RIGHT 10
FORWARD 5
LEFT 3

You could use sipp to parse these instructions with the following code:

let input =
  "FORWARD 10\nRIGHT 45\nFORWARD 20\nRIGHT 10\nFORWARD 5\nLEFT 3";
// We know that Rust strings are UTF-8 encoded, so wrap the input
// bytes with a Utf8Decoder.
let decoder = Utf8Decoder::wrap(input.as_bytes());
// Now wrap the decoder with a Parser to give us useful methods
// for reading through the input.
let mut parser = Parser::wrap(decoder);
// Keep reading while there is still input available.
while parser.has_more()? {
    // Read the command by reading everything up to (but not
    // including) the next space.
    let command = parser.read_up_to(' ')?;
    // Skip past the (one or more) space characters.
    parser.skip_while(|c| c == ' ')?;
    // Read until the next newline (or the end of input, whichever
    // comes first).
    let number = parser.read_up_to('\n')?;
    // Now either there is no further input, or the next character
    // must be a newline. If the next character is a newline, skip
    // past it.
    parser.accept('\n')?;
}
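
The loop above reads command and number but leaves them unused. One way to turn them into typed values is sketched below. The Instruction enum and the to_instruction function are hypothetical names that are not part of sipp, and the sketch assumes both pieces arrive as optional strings, in line with the 0.2.0 release note about read_up_to.

```rust
/// Hypothetical instruction type for the simplified Logo dialect; not part of sipp.
#[derive(Debug, PartialEq)]
pub enum Instruction {
    Forward(u32),
    Left(u32),
    Right(u32),
}

/// Builds an Instruction from the two pieces read off one line.
pub fn to_instruction(
    command: Option<&str>,
    number: Option<&str>,
) -> Result<Instruction, String> {
    let command = command.ok_or("missing command")?;
    let number = number.ok_or("missing number")?;
    // Reject anything that is not a non-negative integer.
    let amount = number
        .trim()
        .parse::<u32>()
        .map_err(|e| format!("bad number {:?}: {}", number, e))?;
    match command {
        "FORWARD" => Ok(Instruction::Forward(amount)),
        "LEFT" => Ok(Instruction::Left(amount)),
        "RIGHT" => Ok(Instruction::Right(amount)),
        other => Err(format!("unknown command {:?}", other)),
    }
}
```

Inside the loop, and assuming both variables are Option<String>, the call would look something like to_instruction(command.as_deref(), number.as_deref()).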

Comma-separated list parser

Given a hardcoded string representing a comma-separated list, you could use this crate to parse it as follows:

let input = "first value,second value,third,fourth,fifth,etc";
let buffer = ByteBuffer::wrap(input.as_bytes());
let decoder = Utf8Decoder::wrap_buffer(buffer);
let mut parser = Parser::wrap(decoder);
let mut value_list = Vec::new();
// Keep reading while input is available.
while parser.has_more()? {
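    // Read up to the next comma, or until the end of input
    // (whichever comes first). As of 0.2.0, read_up_to returns
    // None rather than an empty String, so only keep actual values.
    if let Some(value) = parser.read_up_to(',')? {
        value_list.push(value);
    }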
    // Now either there is no further input, or the next character
    // must be a comma. If the next character is a comma, skip
    // past it.
    parser.accept(',')?;
}

assert_eq!(value_list
    .iter()
    .map(|s| s.to_string())
    .collect::<Vec<String>>(),
    vec!["first value",
        "second value",
        "third",
        "fourth",
        "fifth",
        "etc"]);

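Release notes

0.1.0

Initial release.

0.1.1

Added the has_more method to Parser.
Adjusted rustdoc in line with advice found in the Rust API Guidelines, mainly by separating error descriptions out of the introductory text and moving them into a dedicated "Errors" section in each method's rustdoc comment.

0.2.0

Changed the return type of the public method Parser.read_up_to(char), which now returns None instead of an empty String. Adjusted examples and unit tests accordingly.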