#search-query #query-parser #condition #operator #elasticsearch #complex #dsl

search-query-parser

Parses complex search queries into layered search conditions, making it easy to build Elasticsearch query DSL or other query formats.

5 releases

0.1.4 Jul 14, 2023
0.1.3 Oct 3, 2022
0.1.2 Sep 24, 2022
0.1.1 Sep 20, 2022
0.1.0 Sep 20, 2022

#11 in #elasticsearch

MIT license

140KB
3K SLoC

search-query-parser


What this library does

search-query-parser is designed to parse complex search queries into layered search conditions, making it easy to build Elasticsearch query DSL or other query formats.

For example, the following complex search query: ↓↓↓

(word1 and -word2) or (("phrase word 1" or -"phrase word 2") and -(" a long phrase word " or word3))

will be parsed into the following layered search conditions: ↓↓↓

Condition::Operator(
    Operator::Or,
    vec![
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Keyword("word1".into()),
                Condition::Not(Box::new(Condition::Keyword("word2".into()))),
            ]
        ),
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Operator(
                    Operator::Or,
                    vec![
                        Condition::PhraseKeyword("phrase word 1".into()),
                        Condition::Not(Box::new(Condition::PhraseKeyword(
                            "phrase word 2".into()
                        )))
                    ]
                ),
                Condition::Not(Box::new(Condition::Operator(
                    Operator::Or,
                    vec![
                        Condition::PhraseKeyword(" a long phrase word ".into()),
                        Condition::Keyword("word3".into())
                    ]
                )))
            ]
        ),
    ]
)

The conditions are built from enum Condition and enum Operator.

#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Condition {
    None,
    Keyword(String),
    PhraseKeyword(String),
    Not(Box<Condition>),
    Operator(Operator, Vec<Condition>),
}

#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Operator {
    And,
    Or,
}
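
As an illustration of how such a condition tree might be turned into Elasticsearch query DSL, here is a minimal sketch using only the standard library. The function names to_es_query and json_escape, the single target field, and the mapping (And to bool.must, Or to bool.should, Not to bool.must_not, Keyword to match, PhraseKeyword to match_phrase, None to match_all) are assumptions of this sketch and are not part of the crate. The two enums are local copies so the example compiles on its own.

```rust
// Local copies of the enums shown above, so the sketch is self-contained.
#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Condition {
    None,
    Keyword(String),
    PhraseKeyword(String),
    Not(Box<Condition>),
    Operator(Operator, Vec<Condition>),
}

#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Operator {
    And,
    Or,
}

/// Escape quotes, backslashes, and control characters for a JSON string.
pub fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper (not provided by the crate): render a condition as an
/// Elasticsearch query clause against one text field.
pub fn to_es_query(cond: &Condition, field: &str) -> String {
    let field = json_escape(field);
    match cond {
        Condition::None => r#"{"match_all":{}}"#.to_string(),
        Condition::Keyword(k) => {
            format!(r#"{{"match":{{"{}":"{}"}}}}"#, field, json_escape(k))
        }
        Condition::PhraseKeyword(p) => {
            format!(r#"{{"match_phrase":{{"{}":"{}"}}}}"#, field, json_escape(p))
        }
        Condition::Not(inner) => {
            format!(r#"{{"bool":{{"must_not":[{}]}}}}"#, to_es_query(inner, &field))
        }
        Condition::Operator(op, children) => {
            // And maps to "must", Or maps to "should".
            let key = match op {
                Operator::And => "must",
                Operator::Or => "should",
            };
            let clauses: Vec<String> =
                children.iter().map(|c| to_es_query(c, &field)).collect();
            format!(r#"{{"bool":{{"{}":[{}]}}}}"#, key, clauses.join(","))
        }
    }
}
```

Calling to_es_query(&condition, "content") returns a JSON string that could be placed under the "query" key of a search request. A real integration would more likely build the JSON with a serialization library and choose fields and query types to suit the index mapping.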

Usage

1. For Rust projects

[dependencies]
search-query-parser = "0.1.4"

use search_query_parser::parse_query_to_condition;

let condition = parse_query_to_condition("any query string you like")?;

2. For REST API

See the search-query-parser-api repository

3. For JVM languages via JNI

See the search-query-parser-cdylib repository

Parsing rules

1. A space {\u0020} or a full-width space {\u3000} is recognized as the AND operator

fn test_keywords_concat_with_spaces() {
    let actual = parse_query_to_condition("word1 word2").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Keyword("word1".into()),
                Condition::Keyword("word2".into())
            ]
        )
    )
}

2. The AND operator has higher precedence than the OR operator

fn test_keywords_concat_with_and_or() {
    let actual =
        parse_query_to_condition("word1 OR word2 AND word3").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::Or,
            vec![
                Condition::Keyword("word1".into()),
                Condition::Operator(
                    Operator::And,
                    vec![
                        Condition::Keyword("word2".into()),
                        Condition::Keyword("word3".into()),
                    ]
                )
            ]
        )
    )
}
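
To make rules 1 and 2 concrete, the following is a minimal sketch of a two-level recursive descent parser over bare keywords: OR separates groups, while AND (explicit, or implied by whitespace) joins keywords inside a group, so AND binds tighter. It is not the crate's implementation. The names parse_simple, parse_and, and group are hypothetical, brackets, quotes, and minus signs are deliberately ignored, and the enums are local copies.

```rust
use std::iter::Peekable;
use std::str::SplitWhitespace;

// Local copies of the crate's enums, so the sketch is self-contained.
#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Condition {
    None,
    Keyword(String),
    PhraseKeyword(String),
    Not(Box<Condition>),
    Operator(Operator, Vec<Condition>),
}

#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Operator {
    And,
    Or,
}

/// Wrap items in an operator node; zero items give None, one item is returned as-is.
fn group(op: Operator, mut items: Vec<Condition>) -> Condition {
    match items.len() {
        0 => Condition::None,
        1 => items.remove(0),
        _ => Condition::Operator(op, items),
    }
}

/// and_expr := word (("AND")? word)*, stopping at "OR" or at the end of input.
fn parse_and(tokens: &mut Peekable<SplitWhitespace<'_>>) -> Condition {
    let mut items = Vec::new();
    while let Some(&tok) = tokens.peek() {
        if tok.eq_ignore_ascii_case("or") {
            break;
        }
        tokens.next();
        // An explicit AND is equivalent to the implicit one, so it is skipped.
        if !tok.eq_ignore_ascii_case("and") {
            items.push(Condition::Keyword(tok.to_string()));
        }
    }
    group(Operator::And, items)
}

/// or_expr := and_expr ("OR" and_expr)*
pub fn parse_simple(query: &str) -> Condition {
    // split_whitespace splits on Unicode whitespace, which covers both
    // U+0020 and the full-width space U+3000.
    let mut tokens = query.split_whitespace().peekable();
    let mut groups = Vec::new();
    loop {
        let g = parse_and(&mut tokens);
        if g != Condition::None {
            groups.push(g);
        }
        // parse_and stopped either at an "OR" token (consume it) or at the end.
        if tokens.next().is_none() {
            break;
        }
    }
    group(Operator::Or, groups)
}
```

Because the OR level calls the AND level for each operand, "word1 OR word2 AND word3" groups word2 and word3 first, which matches the structure asserted in the test above.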

3. Conditions inside brackets take higher precedence

fn test_brackets() {
    let actual =
        parse_query_to_condition("word1 AND (word2 OR word3)")
            .unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Keyword("word1".into()),
                Condition::Operator(
                    Operator::Or,
                    vec![
                        Condition::Keyword("word2".into()),
                        Condition::Keyword("word3".into()),
                    ]
                )
            ]
        )
    )
}

4. Double quotes are used to parse phrase keywords

fn test_double_quote() {
    let actual = parse_query_to_condition(
        "\"word1 AND (word2 OR word3)\" word4",
    )
    .unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::PhraseKeyword(
                    "word1 AND (word2 OR word3)".into()
                ),
                Condition::Keyword("word4".into()),
            ]
        )
    )
}

5. A minus sign (hyphen) is used to parse negative conditions

※ It can be placed before a keyword, a phrase keyword, or a bracket

fn test_minus() {
    let actual = parse_query_to_condition(
        "-word1 -\"word2\" -(word3 OR word4)",
    )
    .unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Not(Box::new(Condition::Keyword("word1".into()))),
                Condition::Not(Box::new(Condition::PhraseKeyword("word2".into()))),
                Condition::Not(Box::new(Condition::Operator(
                    Operator::Or,
                    vec![
                        Condition::Keyword("word3".into()),
                        Condition::Keyword("word4".into())
                    ]
                ))),
            ]
        )
    )
}
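
Beyond Elasticsearch, a parsed condition can also be evaluated directly, which shows what a negative condition means in practice. The sketch below assumes a hypothetical matches function that does case-sensitive substring matching on a plain string; neither the function nor this matching strategy comes from the crate, and the enums are local copies.

```rust
// Local copies of the crate's enums, so the sketch is self-contained.
#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Condition {
    None,
    Keyword(String),
    PhraseKeyword(String),
    Not(Box<Condition>),
    Operator(Operator, Vec<Condition>),
}

#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Operator {
    And,
    Or,
}

/// Hypothetical evaluator: does `text` satisfy `cond`?
/// Keywords and phrases are treated as case-sensitive substrings.
pub fn matches(cond: &Condition, text: &str) -> bool {
    match cond {
        // An empty condition places no restriction on the text.
        Condition::None => true,
        Condition::Keyword(k) => text.contains(k.as_str()),
        Condition::PhraseKeyword(p) => text.contains(p.as_str()),
        // A negative condition holds exactly when the inner one does not.
        Condition::Not(inner) => !matches(inner, text),
        Condition::Operator(Operator::And, children) => {
            children.iter().all(|c| matches(c, text))
        }
        Condition::Operator(Operator::Or, children) => {
            children.iter().any(|c| matches(c, text))
        }
    }
}
```

With the condition And[Not(Keyword("word1")), Keyword("word2")], a text containing only word2 is accepted, while a text containing both words is rejected.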

6. Correcting malformed search queries

  1. Empty brackets
fn test_empty_brackets() {
    let actual = parse_query_to_condition("A AND () AND B").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Keyword("A".into()),
                Condition::Keyword("B".into()),
            ]
        )
    )
}
  2. Reversed brackets
fn test_reverse_brackets() {
    let actual = parse_query_to_condition("A OR B) AND (C OR D").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::Or,
            vec![
                Condition::Keyword("A".into()),
                Condition::Operator(
                    Operator::And,
                    vec![
                        Condition::Keyword("B".into()),
                        Condition::Keyword("C".into()),
                    ]
                ),
                Condition::Keyword("D".into()),
            ]
        )
    )
}
  3. Wrong number of brackets
fn test_missing_brackets() {
    let actual = parse_query_to_condition("(A OR B) AND (C").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Operator(
                    Operator::Or,
                    vec![
                        Condition::Keyword("A".into()),
                        Condition::Keyword("B".into()),
                    ]
                ),
                Condition::Keyword("C".into()),
            ]
        )
    )
}
  4. Empty phrase keywords
fn test_empty_phrase_keywords() {
    let actual = parse_query_to_condition("A AND \"\" AND B").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Keyword("A".into()),
                Condition::Keyword("B".into()),
            ]
        )
    )
}
  5. Wrong number of double quotes
fn test_invalid_double_quote() {
    let actual = parse_query_to_condition("\"A\" OR \"B OR C").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::Or,
            vec![
                Condition::PhraseKeyword("A".into()),
                Condition::Keyword("B".into()),
                Condition::Keyword("C".into()),
            ]
        )
    )
}
  6. AND and OR next to each other
fn test_invalid_and_or() {
    let actual = parse_query_to_condition("A AND OR B").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::Or,
            vec![
                Condition::Keyword("A".into()),
                Condition::Keyword("B".into()),
            ]
        )
    )
}

7. Search query optimization

fn test_unnecessary_nest_brackets() {
    let actual = parse_query_to_condition("(A OR (B OR C)) AND D").unwrap();
    assert_eq!(
        actual,
        Condition::Operator(
            Operator::And,
            vec![
                Condition::Operator(
                    Operator::Or,
                    vec![
                        Condition::Keyword("A".into()),
                        Condition::Keyword("B".into()),
                        Condition::Keyword("C".into()),
                    ]
                ),
                Condition::Keyword("D".into()),
            ]
        )
    )
}
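
The optimization tested above, where a nested group with the same operator as its parent is merged into the parent, can be sketched as a recursive pass over the tree. This is an illustration only: the function name flatten is hypothetical, the crate may perform the optimization differently (for example during parsing), and the enums are local copies.

```rust
// Local copies of the crate's enums, so the sketch is self-contained.
#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Condition {
    None,
    Keyword(String),
    PhraseKeyword(String),
    Not(Box<Condition>),
    Operator(Operator, Vec<Condition>),
}

#[derive(Debug, Clone, Eq, PartialEq)]
pub enum Operator {
    And,
    Or,
}

/// Hypothetical pass: merge children that use the same operator as their parent.
pub fn flatten(cond: Condition) -> Condition {
    match cond {
        Condition::Operator(op, children) => {
            let mut flat = Vec::with_capacity(children.len());
            for child in children {
                // Flatten bottom-up so deeper nesting is merged first.
                match flatten(child) {
                    Condition::Operator(child_op, grandchildren) if child_op == op => {
                        flat.extend(grandchildren)
                    }
                    other => flat.push(other),
                }
            }
            Condition::Operator(op, flat)
        }
        Condition::Not(inner) => Condition::Not(Box::new(flatten(*inner))),
        other => other,
    }
}
```

Merging is safe because AND and OR are associative: A OR (B OR C) accepts the same documents as A OR B OR C. Groups under a different operator, and anything wrapped in Not, keep their own level.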

Dependencies

~2.7–4.5MB
~78K SLoC