
2 unstable releases

Uses old Rust 2015

0.3.0	May 23, 2015
0.2.0	May 9, 2015

#244 in Parser tooling

MIT license

25KB
547 lines

#Peruse

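Peruse is a small parser-combinator library for Rust. The goal is to make it possible to write clean, efficient parsers that are powerful enough to handle most grammars. This project is my first attempt at Rust and is very much a work in progress. Comments, suggestions, and PRs are welcome.

A parser is an object that turns some input type into an output type. Although these parsers can work with any input and output types, they focus mainly on turning a sequence of items into a more structured form, such as an abstract syntax tree. Every parser returns an output value along with another input value; for slices, this is the remainder of the input. Parsers can therefore be chained together so that the result of one is handed to the next.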
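The idea of returning the output together with the remaining input can be sketched without the library. The snippet below is a minimal illustration written for current Rust; `ParseResult`, `literal`, and `pair` are hypothetical names chosen for this sketch and are not Peruse's actual API.

```rust
// Hypothetical names throughout: this is an illustration, not Peruse's API.

/// Result of a parse: the output value plus the unconsumed rest of the input.
type ParseResult<'a, I, O> = Result<(O, &'a [I]), String>;

/// Matches one specific integer at the front of the slice.
fn literal<'a>(expected: i32, input: &'a [i32]) -> ParseResult<'a, i32, i32> {
    match input.split_first() {
        Some((first, rest)) if *first == expected => Ok((*first, rest)),
        _ => Err(format!("expected {}", expected)),
    }
}

/// Chains two literals: the remainder left by the first parse
/// becomes the input of the second.
fn pair<'a>(a: i32, b: i32, input: &'a [i32]) -> ParseResult<'a, i32, (i32, i32)> {
    let (x, rest) = literal(a, input)?;
    let (y, rest) = literal(b, rest)?;
    Ok(((x, y), rest))
}
```

Calling `pair(3, 1, &[3, 1, 2])` would consume the first two items and hand back the slice `[2]` as the remainder, which is what lets a further parser pick up where this one stopped.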

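Peruse contains two kinds of parsers:

Recursive-descent parsers - These are the more typical parsers, generally used to build recursive data structures such as ASTs or JSON.
Stream parsers (coming soon) - These are stateful parsers that can accept input data in chunks. They are very useful for network protocols.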
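Since stream parsers are not available yet, the following is only a rough sketch of what "stateful" and "chunked" could mean in practice. It assumes a made-up wire format (one length byte followed by that many payload bytes), and `FrameParser` and `feed` are hypothetical names, not part of Peruse.

```rust
/// Hypothetical stateful parser for length-prefixed frames:
/// one length byte, then that many payload bytes.
struct FrameParser {
    /// Bytes received so far that do not yet form a complete frame.
    buffer: Vec<u8>,
}

impl FrameParser {
    fn new() -> Self {
        FrameParser { buffer: Vec::new() }
    }

    /// Feeds one chunk and returns every frame completed by it.
    /// Leftover bytes stay buffered until the next call.
    fn feed(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.buffer.extend_from_slice(chunk);
        let mut frames = Vec::new();
        loop {
            // The first buffered byte is the payload length.
            let len = match self.buffer.first() {
                Some(&n) => n as usize,
                None => break,
            };
            // Stop if the payload has not fully arrived yet.
            if self.buffer.len() < 1 + len {
                break;
            }
            // Remove the frame from the buffer, dropping the length byte.
            let frame: Vec<u8> = self.buffer.drain(..1 + len).skip(1).collect();
            frames.push(frame);
        }
        frames
    }
}
```

The point of the design is that a frame split across two network reads is only emitted once the second read arrives, which a plain slice parser cannot express because it sees its whole input at once.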

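Examples

Slice parsers

Slice parsers expect the input to be a slice of some type T. The parser consumes one or more elements from the start of the slice and returns an output value along with the rest of the slice.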

use peruse::*;
use peruse::slice_parsers::*;

//let's start with something simple, a parser that looks for one particular
//integer as the first element of a given slice

let p1 = lit(3);

//calling parse will return a ParseResult, containing the parsed value along
//with a slice of any unparsed data

println!("{:?}", p1.parse(&[3, 1, 2]) );
//Ok((3, [1, 2]))

println!("{:?}", p1.parse(&[4, 1, 2]) );
//Err("Literal mismatch")

//now we can start to chain parsers together

let p2 = lit(3).or(lit(4));

println!("{:?}", p2.parse(&[4, 1, 2]) );
//Ok((4, [1, 2]))

//and turn the parsed items into other types

let p3 = lit(3).or(lit(4)).then(lit(1)).map(|(a, b)| a + b);

println!("{:?}", p3.parse(&[4, 1, 2]) );
//Ok((5, [2]))


//let's say we have the following array
let arr = [1, 0, 1, 0, 1, 0];

//how about we write a parser to count the number of sequences of 1, 0

let p4 = lit(1).then(lit(0)).repeat().map(|v| v.len());

println!("{:?}", p4.parse(&arr)); 
//Ok((3, []))

//lastly we can define a recursive parser in a static function; this one
//counts the zeros that precede a terminating 1
fn count_zeros() -> Box<SliceParser<i32, i32>> {
  let end = lit(1).map(|_| 0);
  let rec = lit(0).then_r(recursive(|| count_zeros())).map(|t| t + 1);
  Box::new(end.or(rec))
}

println!("{:?}", count_zeros().parse(&[0,0,0,0,0,1]));
//Ok((5, []))

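The included tests provide basic examples of all the existing parsers, as well as some more complex ones.

For a more practical example, have a look at Coki, a very simple programming language I am working on. Peruse is used for both the lexer and the AST parser.

Other notes

For the most part, constructed parsers use static dispatch wherever possible. My eventual goal is to use static dispatch everywhere, and I am still working toward that.

Note that, due to an ongoing issue in rustc, compile times for your code will grow exponentially with the complexity of your parsers. In practice I have found that things get pretty bad after about 10 or so combinations. You can work around this by boxing parsers: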

let parser = lit(1).or(lit(2)).or(lit(3)).repeat().then(opt(lit(4).then(lit(5))));
let boxed = boxed(parser);  //creates a BoxedParser
let full_parser = boxed.or(lit(3));

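This "flattens" the parser's type signature into a trait object, which improves compile times at the cost of some runtime performance (due to dynamic dispatch). In most cases, though, since you only do this for roughly 1 in 10 parsers, the performance hit should not be too bad (in theory; I have not benchmarked any of this yet).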
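To make the "flattening" concrete, here is a self-contained sketch in current Rust of how combinator types nest and how a trait object hides that nesting. The `Parser` trait, `Lit`, `Or`, `boxed`, and `build` below are simplified stand-ins written for this illustration; they mirror the names used above but are assumptions, not Peruse's real definitions.

```rust
/// Hypothetical minimal parser trait; not Peruse's actual definition.
trait Parser {
    fn parse<'a>(&self, input: &'a [i32]) -> Result<(i32, &'a [i32]), String>;
}

/// Matches one specific integer at the front of the slice.
struct Lit(i32);

impl Parser for Lit {
    fn parse<'a>(&self, input: &'a [i32]) -> Result<(i32, &'a [i32]), String> {
        match input.split_first() {
            Some((first, rest)) if *first == self.0 => Ok((*first, rest)),
            _ => Err("Literal mismatch".to_string()),
        }
    }
}

/// Tries `A`, then falls back to `B`. Every use adds one more level
/// of nesting to the static type.
struct Or<A, B>(A, B);

impl<A: Parser, B: Parser> Parser for Or<A, B> {
    fn parse<'a>(&self, input: &'a [i32]) -> Result<(i32, &'a [i32]), String> {
        self.0.parse(input).or_else(|_| self.1.parse(input))
    }
}

/// A boxed trait object is itself a parser, dispatched dynamically.
impl Parser for Box<dyn Parser> {
    fn parse<'a>(&self, input: &'a [i32]) -> Result<(i32, &'a [i32]), String> {
        (**self).parse(input)
    }
}

/// Erases the concrete combinator type behind a trait object.
fn boxed<P: Parser + 'static>(parser: P) -> Box<dyn Parser> {
    Box::new(parser)
}

fn build() -> Or<Box<dyn Parser>, Lit> {
    // Static type before boxing: Or<Or<Lit, Lit>, Lit>.
    let nested = Or(Or(Lit(1), Lit(2)), Lit(3));
    // After boxing, later combinators only see Box<dyn Parser>,
    // however deep the original nesting was.
    Or(boxed(nested), Lit(4))
}
```

Without the call to `boxed`, the return type of `build` would have to spell out every nested `Or`; with it, the type stops growing at the box, and the price is one virtual call per parse through that box.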

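Currently, slice parsers cannot return pointers to the input data. I am looking into whether this is feasible, but I think we may need to wait for higher-kinded types. The upcoming StreamParsers may allow it.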

Dependencies

~3.5MB
~75K SLoC