10 个版本

0.4.0 2022 年 10 月 2 日
0.3.0 2020 年 10 月 24 日
0.2.2 2020 年 5 月 31 日
0.1.4 2020 年 4 月 6 日

#37解析工具

Download history 6090/week @ 2024-03-14 6379/week @ 2024-03-21 5695/week @ 2024-03-28 3333/week @ 2024-04-04 3441/week @ 2024-04-11 4019/week @ 2024-04-18 4095/week @ 2024-04-25 3273/week @ 2024-05-02 6978/week @ 2024-05-09 5159/week @ 2024-05-16 4483/week @ 2024-05-23 4405/week @ 2024-05-30 4390/week @ 2024-06-06 3023/week @ 2024-06-13 4813/week @ 2024-06-20 4118/week @ 2024-06-27

17,330 每月下载量
11 个包中使用 (直接使用 8 个)

MIT 许可证

18KB
194

pratt - Rust 通用 Pratt 解析器

github crates.io crates.io

该包利用高级接口在 Rust 中实现 Pratt 解析器。

在计算机科学中,Pratt 解析器是一种改进的递归下降解析器,它将语义与标记相关联,而不是与语法规则相关联。

换句话说,您可以使用 Pratt 解析器解析可能包含不同 优先级结合性一元二元 运算符的表达式树。

示例

假设我们想要解析一个表达式 !1?*-3+3/!2^4?-1(((((!(1))?)*(-(3)))+((3)/((!((2)^(4)))?)))-(1))

我们的策略是实现一个解析器,将源代码解析成标记树,然后将标记树解析成表达式树。完整的实现可以在这里查看。这个例子使用了LALRPOP。一个使用pest解析器的完整实现可以在这里找到。

// From this
#[derive(Debug)]
pub enum TokenTree {
    Prefix(char),
    Postfix(char),
    Infix(char),
    Primary(i32),
    Group(Vec<TokenTree>),
}

// To this
#[derive(Debug)]
pub enum Expr {
    BinOp(Box<Expr>, BinOpKind, Box<Expr>),
    UnOp(UnOpKind, Box<Expr>),
    Int(i32),
}

#[derive(Debug)]
pub enum BinOpKind {
    Add, // +
    Sub, // -
    Mul, // *
    Div, // /
    Pow, // ^
    Eq,  // =
}

#[derive(Debug)]
pub enum UnOp {
    Not, // !
    Neg, // -
    Try, // ?
}

我们使用LALRPOP将源代码解析成标记树。

LALRPOP 语法

use crate::TokenTree;

grammar;

pub TokenTree = Group;

Group: Vec<TokenTree> = <prefix:Prefix*> <primary:Primary> <mut postfix:Postfix*>
                   <rest:(Infix Prefix* Primary Postfix*)*> => {
    let mut group = prefix;
    group.push(primary);
    group.append(&mut postfix);
    for (infix, mut prefix, primary, mut postfix) in rest {
        group.push(infix);
        group.append(&mut prefix);
        group.push(primary);
        group.append(&mut postfix);
    }
    group
};

Primary: TokenTree = {
    "(" <Group> ")" => TokenTree::Group(<>),
    r"[0-9]+"       => TokenTree::Primary(<>.parse::<i32>().unwrap()),
}

Infix: TokenTree = {
    "+" => TokenTree::Infix('+'),
    "-" => TokenTree::Infix('-'),
    "*" => TokenTree::Infix('*'),
    "/" => TokenTree::Infix('/'),
    "=" => TokenTree::Infix('='),
    "^" => TokenTree::Infix('^'),
}

Prefix: TokenTree = {
    "-" => TokenTree::Prefix('-'),
    "!" => TokenTree::Prefix('!'),
}

Postfix: TokenTree = {
    "?" => TokenTree::Postfix('?'),
}

然后,对于Pratt解析器,我们定义了一个struct ExprParser并为其实现了pratt::ExprParser

use pratt::{Affix, Associativity, PrattParser, Precedence, Result};

struct ExprParser;

impl<I> PrattParser<I> for ExprParser
where
    I: Iterator<Item = TokenTree>,
{
    type Error = pratt::NoError;
    type Input = TokenTree;
    type Output = Expr;

    // Query information about an operator (Affix, Precedence, Associativity)
    fn query(&mut self, tree: &TokenTree) -> Result<Affix> {
        let affix = match tree {
            TokenTree::Infix('=') => Affix::Infix(Precedence(2), Associativity::Neither),
            TokenTree::Infix('+') => Affix::Infix(Precedence(3), Associativity::Left),
            TokenTree::Infix('-') => Affix::Infix(Precedence(3), Associativity::Left),
            TokenTree::Infix('*') => Affix::Infix(Precedence(4), Associativity::Left),
            TokenTree::Infix('/') => Affix::Infix(Precedence(4), Associativity::Left),
            TokenTree::Postfix('?') => Affix::Postfix(Precedence(5)),
            TokenTree::Prefix('-') => Affix::Prefix(Precedence(6)),
            TokenTree::Prefix('!') => Affix::Prefix(Precedence(6)),
            TokenTree::Infix('^') => Affix::Infix(Precedence(7), Associativity::Right),
            TokenTree::Group(_) => Affix::Nilfix,
            TokenTree::Primary(_) => Affix::Nilfix,
            _ => unreachable!(),
        };
        Ok(affix)
    }

    // Construct a primary expression, e.g. a number
    fn primary(&mut self, tree: TokenTree) -> Result<Expr> {
        let expr = match tree {
            TokenTree::Primary(num) => Expr::Int(num),
            TokenTree::Group(group) => self.parse(&mut group.into_iter()).unwrap(),
            _ => unreachable!(),
        };
        Ok(expr)
    }

    // Construct a binary infix expression, e.g. 1+1
    fn infix(&mut self, lhs: Expr, tree: TokenTree, rhs: Expr) -> Result<Expr> {
        let op = match tree {
            TokenTree::Infix('+') => BinOpKind::Add,
            TokenTree::Infix('-') => BinOpKind::Sub,
            TokenTree::Infix('*') => BinOpKind::Mul,
            TokenTree::Infix('/') => BinOpKind::Div,
            TokenTree::Infix('^') => BinOpKind::Pow,
            TokenTree::Infix('=') => BinOpKind::Eq,
            _ => unreachable!(),
        };
        Ok(Expr::BinOp(Box::new(lhs), op, Box::new(rhs)))
    }

    // Construct a unary prefix expression, e.g. !1
    fn prefix(&mut self, tree: TokenTree, rhs: Expr) -> Result<Expr> {
        let op = match tree {
            TokenTree::Prefix('!') => UnOpKind::Not,
            TokenTree::Prefix('-') => UnOpKind::Neg,
            _ => unreachable!(),
        };
        Ok(Expr::UnOp(op, Box::new(rhs)))
    }

    // Construct a unary postfix expression, e.g. 1?
    fn postfix(&mut self, lhs: Expr, tree: TokenTree) -> Result<Expr> {
        let op = match tree {
            TokenTree::Postfix('?') => UnOpKind::Try,
            _ => unreachable!(),
        };
        Ok(Expr::UnOp(op, Box::new(lhs)))
    }
}

注意,方法接受&mut self作为参数,这使得解析器在解析时能够存储状态,例如累积错误并保持优先级/结合性信息。

要运行解析器

fn main() {
    let mut args = std::env::args();
    let _ = args.next();

    let input = args.next().expect("Expected input string");
    println!("Code: {}", input);

    let tt = grammar::TokenTreeParser::new().parse(&input).unwrap();
    println!("TokenTree: {:?}", tt);

    let expr = ExprParser.parse(&mut tt.into_iter()).unwrap();
    println!("Expression: {:?}", expr);
}

还有一些测试

#[cfg(test)]
mod test {
    fn parse(input: &str) -> Expr {
        let tt = grammar::TokenTreeParser::new()
            .parse(input)
            .unwrap()
            .into_iter();
        ExprParser.parse(&mut tt.into_iter()).unwrap()
    }
    use super::BinOpKind::*;
    use super::Expr::*;
    use super::UnOpKind::*;
    use super::*;

    #[test]
    fn test1() {
        assert_eq!(
            parse("1=2=3"),
            BinOp(Box::new(Int(1)), Eq, Box::new(Int(2)))
        );
    }

    #[test]
    fn test2() {
        assert_eq!(
            parse("1*2+3"),
            BinOp(
                Box::new(BinOp(Box::new(Int(1)), Mul, Box::new(Int(2)))),
                Add,
                Box::new(Int(3))
            )
        );
    }

    #[test]
    fn test3() {
        assert_eq!(
            parse("1*!2^3"),
            BinOp(
                Box::new(Int(1)),
                Mul,
                Box::new(UnOp(
                    Not,
                    Box::new(BinOp(Box::new(Int(2)), Pow, Box::new(Int(3))))
                ))
            )
        );
    }

    #[test]
    fn test4() {
        assert_eq!(
            parse("-1?*!2^3+3/2?-1"),
            BinOp(
                Box::new(BinOp(
                    Box::new(BinOp(
                        Box::new(UnOp(Try, Box::new(UnOp(Neg, Box::new(Int(1)))))),
                        Mul,
                        Box::new(UnOp(
                            Not,
                            Box::new(BinOp(Box::new(Int(2)), Pow, Box::new(Int(3))))
                        ))
                    )),
                    Add,
                    Box::new(BinOp(
                        Box::new(Int(3)),
                        Div,
                        Box::new(UnOp(Try, Box::new(Int(2))))
                    ))
                )),
                Sub,
                Box::new(Int(1))
            )
        );
    }
}

无运行时依赖