

pest_consume

pest_consume extends pest to make it easy to consume a pest parse tree.

Motivation

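When using pest to write a parser, one has to traverse the resulting untyped parse tree by hand to extract the data that the rest of the application will use. This usually makes the code error-prone and hard to read, and it often breaks when the grammar is updated.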

pest_consume strives to make this phase of parsing simpler, cleaner and more robust.

Features of pest_consume include:

strong types;
consuming parse nodes with an intuitive syntax;
easy error handling;
you won't ever need to write .into_inner().next().unwrap() again.

Implementing a parser

Let's start with a pest grammar for parsing CSV files:

field = { (ASCII_DIGIT | "." | "-")+ }
record = { field ~ ("," ~ field)* }
file = { SOI ~ (record ~ ("\r\n" | "\n"))* ~ EOI }
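To make the grammar concrete, here is a hand-written, std-only recognizer that accepts the same inputs as the `file` rule above. It only illustrates the language the grammar describes: the function names `is_field_char`, `is_record` and `is_file` are our own, and this is not the parser pest generates from the grammar.

```rust
/// A `field` character: ASCII_DIGIT | "." | "-".
fn is_field_char(c: char) -> bool {
    c.is_ascii_digit() || c == '.' || c == '-'
}

/// `record = { field ~ ("," ~ field)* }`:
/// one or more non-empty fields separated by commas, with no whitespace.
fn is_record(line: &str) -> bool {
    line.split(',')
        .all(|field| !field.is_empty() && field.chars().all(is_field_char))
}

/// `file = { SOI ~ (record ~ ("\r\n" | "\n"))* ~ EOI }`:
/// zero or more records, each terminated by "\r\n" or "\n".
fn is_file(input: &str) -> bool {
    let mut rest = input;
    while !rest.is_empty() {
        // Every record must end with a newline, including the last one.
        let newline = match rest.find('\n') {
            Some(i) => i,
            None => return false,
        };
        // Treat "\r\n" as a single terminator by dropping one trailing '\r'.
        let line = &rest[..newline];
        let line = line.strip_suffix('\r').unwrap_or(line);
        if !is_record(line) {
            return false;
        }
        rest = &rest[newline + 1..];
    }
    true
}
```

Note that an empty input is accepted, a final record without a trailing newline is rejected, and fields such as `1.2.3` or `--` pass the grammar even though they are not valid numbers; catching those is left to the code that converts each field.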

Next comes the corresponding pest parser:

use pest_consume::Parser;
// Construct the first half of the parser using pest as usual.
#[derive(Parser)]
#[grammar = "../examples/csv/csv.pest"]
struct CSVParser;

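To complete the parser, define an impl block with the pest_consume::parser attribute and, for each (non-silent) rule of the grammar, a method with the same name. Note how we choose an output type for each rule.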

use pest_consume::Error;
type Result<T> = std::result::Result<T, Error<Rule>>;
type Node<'i> = pest_consume::Node<'i, Rule, ()>;

// This is the other half of the parser, using pest_consume.
#[pest_consume::parser]
impl CSVParser {
    fn EOI(_input: Node) -> Result<()> {
        Ok(())
    }
    fn field(input: Node) -> Result<f64> {
        ...
    }
    fn record(input: Node) -> Result<Vec<f64>> {
        ...
    }
    fn file(input: Node) -> Result<Vec<Vec<f64>>> {
        ...
    }
}

This implements Parser for your type, so that Parser::parse can be called on it. We can now define a complete parser that returns a structured result:

fn parse_csv(input_str: &str) -> Result<Vec<Vec<f64>>> {
    // Parse the input into `Nodes`
    let inputs = CSVParser::parse(Rule::file, input_str)?;
    // There should be a single root node in the parsed tree
    let input = inputs.single()?;
    // Consume the `Node` recursively into the final value
    CSVParser::file(input)
}
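The call to `inputs.single()?` only succeeds when parsing produced exactly one top-level node. To illustrate that check, here is a hypothetical std-only helper named `single` that works on any iterator and reports failures as a `String`; pest_consume's own `Nodes::single` reports failures through the crate's `Error` type instead.

```rust
/// Returns the only item of `iter`, or an error if it yields zero or several items.
fn single<I: Iterator>(mut iter: I) -> Result<I::Item, String> {
    // No item at all: the parse produced no root node.
    let first = iter
        .next()
        .ok_or_else(|| "expected one node, found none".to_string())?;
    // A second item means there was more than one root node.
    if iter.next().is_some() {
        return Err("expected one node, found several".to_string());
    }
    Ok(first)
}
```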

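All that remains is to implement the parsing of each rule. Rules without children are the simple case: there we usually only care about the captured string, which can be accessed with Node::as_str.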

    fn field(input: Node) -> Result<f64> {
        // Get the string captured by this node
        input.as_str()
            // Convert it into the type we want
            .parse::<f64>()
            // In case of an error, we use `Node::error` to link the error
            // with the part of the input that caused it
            .map_err(|e| input.error(e))
    }
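The grammar lets through fields such as `1.2.3` or `--` that are not valid `f64` literals, which is why `field` must handle a conversion error. The std-only sketch below mimics what linking an error to the input gives you: a hypothetical `SpanError` type (our own, standing in for the error that `Node::error` builds) records the byte range of the offending field together with the message.

```rust
/// Hypothetical error type: a message plus the byte range of the input it refers to.
#[derive(Debug, PartialEq)]
struct SpanError {
    start: usize,
    end: usize,
    message: String,
}

/// Converts the text of a field starting at byte offset `start` into an `f64`,
/// attaching the field's position to any parse error.
fn parse_field(text: &str, start: usize) -> Result<f64, SpanError> {
    text.parse::<f64>().map_err(|e| SpanError {
        start,
        end: start + text.len(),
        message: e.to_string(),
    })
}
```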

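When a rule has children, the match_nodes macro provides a typed way to parse them. match_nodes uses a syntax similar to slice patterns and allows several branches, like a match expression.

For each branch, we specify the expected rules of the children; the macro recursively consumes the children and makes the results available to the body of the branch. The special .. syntax denotes a variable-length pattern: it matches zero or more children with the given rule and provides an iterator over the results.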

use pest_consume::match_nodes;
...
    fn record(input: Node) -> Result<Vec<f64>> {
        // Checks that the children all match the rule `field`, and captures
        // the parsed children in an iterator. `fds` implements
        // `Iterator<Item=f64>` here.
        Ok(match_nodes!(input.into_children();
            [field(fds)..] => fds.collect(),
        ))
    }
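Conceptually, the `[field(fds)..]` branch checks that every child node has the rule `field`, consumes each child with the `field` method, and hands the results to the branch body. The std-only sketch below writes those steps out by hand on a simplified model in which a hypothetical `Child` struct (a rule tag plus the matched text) stands in for a pest_consume `Node`. It approximates the behavior described above; it is not the code `match_nodes` actually generates.

```rust
/// Simplified, hypothetical stand-in for the rules of the CSV grammar.
#[derive(Debug, PartialEq)]
enum Rule {
    Field,
    Record,
}

/// Hypothetical model of a child node: its rule plus the text it matched.
struct Child<'i> {
    rule: Rule,
    text: &'i str,
}

/// Consumes a `field` child into an `f64`, like the `field` method.
fn field(child: &Child<'_>) -> Result<f64, String> {
    child.text.parse::<f64>().map_err(|e| e.to_string())
}

/// Hand-written equivalent of `[field(fds)..] => fds.collect()`:
/// every child must be a `field`; each one is consumed and the results collected.
fn record(children: &[Child<'_>]) -> Result<Vec<f64>, String> {
    // Pattern check: a child with any other rule means the branch does not match.
    if children.iter().any(|c| c.rule != Rule::Field) {
        return Err("children did not match [field(fds)..]".to_string());
    }
    // Consume every child; the first conversion error is returned.
    children.iter().map(field).collect()
}
```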

The file rule is handled similarly to record.
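The README leaves the body of `file` out. Because the grammar's `file` rule ends in `EOI` (and the impl block above also defines an `EOI` method), one plausible branch is `[record(records).., EOI(_)] => records.collect()`; treat that pattern as our assumption, not as code quoted from the crate. The std-only sketch below models what such a pattern accepts, using a hypothetical `FileChild` enum for children that have already been consumed: any number of records followed by exactly one EOI.

```rust
/// Hypothetical model of a consumed child of the `file` node:
/// either a parsed record or the end-of-input marker.
#[derive(Debug, PartialEq)]
enum FileChild {
    Record(Vec<f64>),
    Eoi,
}

/// Hand-written equivalent of a `[record(records).., EOI(_)]` pattern:
/// zero or more records, then exactly one EOI as the last child.
fn file(mut children: Vec<FileChild>) -> Result<Vec<Vec<f64>>, String> {
    // The last child must be EOI.
    match children.pop() {
        Some(FileChild::Eoi) => {}
        _ => return Err("expected EOI as the last child".to_string()),
    }
    // Every remaining child must be a record.
    children
        .into_iter()
        .map(|child| match child {
            FileChild::Record(values) => Ok(values),
            FileChild::Eoi => Err("unexpected EOI before the last child".to_string()),
        })
        .collect()
}
```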

Examples

Some toy examples can be found in the examples/ directory. A real-world example can be found in dhall-rust.

How it works

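The main types of this crate (Node, Nodes and Parser) are mostly wrappers around the corresponding pest types, respectively Pair, Pairs and Parser. The wrapped types can be accessed if needed, but that should rarely be necessary.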

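The pest_consume::parser macro implements the Parser trait for your type and enables some advanced features, such as precedence climbing and rule aliasing. Most of the magic actually happens in match_nodes; see there for details.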

Advanced features

For precedence climbing, passing custom data through the parser, and more, see here.

Compatibility

Works with rust >= 1.45 (because it exports a proc-macro in expression position).

License

Licensed under either of

Apache License, Version 2.0 (https://apache.ac.cn/licenses/LICENSE-2.0)
MIT license (https://open-source.org.cn/licenses/MIT)

at your option.

Contribution

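Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.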

License: MIT OR Apache-2.0
