5 releases

0.0.6	Nov 17, 2023
0.0.5	Nov 15, 2023
0.0.0	~~Mar 24, 2022~~

#380 in Text processing

AGPL-3.0

52KB
1K SLoC

pink accents

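Lets you define a set of patterns to be replaced in a string. It is a glorified regex replace, or rather a sequence of them. The primary use case is simulating silly speech accents.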

Originally based on python pink-accents and primarily developed for the ssnt game.

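Currently unusable on its own, because you cannot construct an Accent from the internal structures, but there are plans to support programmatic definition.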

Types of replacements

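An accent is a sequence of rules that are applied in order. Each rule consists of a regex pattern and a replacement. When the regex matches, the replacement is called and decides what to put in place of the match, if anything.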

Possible replacements are:

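Original: do not replace
Simple: output the string as is (supports templating)
Any (recursive): pick one of the replacements at random, with equal weights
Weights (recursive): pick a replacement according to relative weights
Uppercase (recursive): convert the inner result to uppercase
Lowercase (recursive): convert the inner result to lowercase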
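
The replacement kinds above can be sketched as a recursive enum. This is a minimal illustration using only the standard library, not the crate's actual types: the `Replacement` enum, its `apply` method and the `roll` callback are hypothetical names. Regex matching and `$group` templating are left out, and randomness is injected by the caller so that results are reproducible.

```rust
/// Hypothetical mirror of the replacement kinds listed above.
/// These are NOT the crate's real types, only a sketch of the idea.
#[derive(Debug, Clone)]
enum Replacement {
    Original,
    Simple(String),
    Any(Vec<Replacement>),
    Weights(Vec<(u64, Replacement)>),
    Uppercase(Box<Replacement>),
    Lowercase(Box<Replacement>),
}

impl Replacement {
    /// Resolves the replacement text for one match.
    /// `matched` is the text the regex matched; `roll(n)` is an assumed
    /// caller-supplied random source that must return a value in `0..n`.
    fn apply(&self, matched: &str, roll: &mut dyn FnMut(u64) -> u64) -> String {
        match self {
            // Keep the matched text unchanged.
            Replacement::Original => matched.to_string(),
            // Emit the string as is (templating is omitted in this sketch).
            Replacement::Simple(text) => text.clone(),
            // Uniform choice among the nested replacements.
            Replacement::Any(items) => {
                if items.is_empty() {
                    return matched.to_string();
                }
                let index = roll(items.len() as u64) as usize;
                items[index].apply(matched, roll)
            }
            // Choice proportional to the relative weights.
            Replacement::Weights(items) => {
                let total: u64 = items.iter().map(|(weight, _)| *weight).sum();
                if total == 0 {
                    return matched.to_string();
                }
                let mut pick = roll(total);
                for (weight, replacement) in items {
                    if pick < *weight {
                        return replacement.apply(matched, roll);
                    }
                    pick -= *weight;
                }
                unreachable!("roll must return a value below the total weight")
            }
            // Resolve the inner replacement first, then change its case.
            Replacement::Uppercase(inner) => inner.apply(matched, roll).to_uppercase(),
            Replacement::Lowercase(inner) => inner.apply(matched, roll).to_lowercase(),
        }
    }
}
```

With weights 3 and 1, rolls 0 through 2 select the first entry and a roll of 3 selects the second, so the second entry is chosen a quarter of the time under a uniform `roll`.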

Serialized format

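The deserialize feature provides an opinionated way of defining rules, designed specifically for speech accents. Deserialization was primarily developed to support the ron format, which has its quirks, but it should also work with json and possibly other formats.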

Full reference

(
    // on by default, tries to match input case with output after each rule
    // for example, if you replaced "HELLO" with "bye", it would use "BYE" instead
    normalize_case: true,

    // pairs of (regex, replacement)
    // this is same as `patterns` except that each regex is surrounded with \b to avoid copypasting.
    // `words` are applied before `patterns`
    words: [
        // this is the simplest rule to replace all "windows" words (separated by regex \b)
        // occurrences with "linux", case sensitive
        ("windows", Simple("linux")),
        // this replaces the word "os" with one of the replacements, with equal probability
        ("os", Any([
            Simple("Ubuntu"),
            Simple("Arch"),
            Simple("Gentoo"),
        ])),
        // `Simple` supports regex templating: https://docs.rs/regex/latest/regex/struct.Regex.html#example-9
        // this will swap "a" and "b": "ab" -> "ba"
        (r"(a)(?P<b_group>b)", Simple("$b_group$a")),
    ],

    // pairs of (regex, replacement)
    // this is same as `words` except these are used as is, without \b
    patterns: [
        // inserts one of the honks. first value of `Weights` is relative weight. higher means more likely
        ("$", Weights([
            (32, Simple(" HONK!")),
            (16, Simple(" HONK HONK!")),
            (08, Simple(" HONK HONK HONK!")),
            // ultra rare sigma honk - 1 / 57 (weights sum to 32 + 16 + 8 + 1)
            (01, Simple(" HONK HONK HONK HONK!!!!!!!!!!!!!!!")),
        ])),
        // lowercases all `p` letters (use "p" match from `Original`, then lowercase)
        ("p", Lowercase(Original)),
        // uppercases all `p` letters, undoing previous operation
        ("p", Uppercase(Original)),
    ],

    // accent can be used with intensity (non negative value). higher intensities can either extend
    // lower level or completely replace it.
    // default intensity is 0. higher ones are defined here
    intensities: {
        // extends previous intensity (level 0, base one in this case), adding additional rules
        // below existing ones. words and patterns keep their relative order though - words are
        // processed first
        1: Extend(
            (
                words: [
                    // even though we are extending, defining same rule will overwrite result.
                    // relative order of rules remains the same: "windows" will remain first
                    ("windows", Simple("windoos")),
                ],

                // extend patterns, adding 1 more rule
                patterns: [
                    // replacements can be nested arbitrarily
                    ("[A-Z]", Weights([
                        // 50% to replace capital letter with one of the Es
                        (1, Any([
                            Simple("E"),
                            Simple("Ē"),
                            Simple("Ê"),
                            Simple("Ë"),
                            Simple("È"),
                            Simple("É"),
                        ])),
                        // 50% to do nothing, no replacement
                        (1, Original),
                    ])),
                ],
            ),
        ),

        // replace intensity 1 entirely. in this case with nothing. remove all rules on intensity 2+
        2: Replace(()),
    },
)
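
The `Extend` and `Replace` behaviour described in the comments above can be sketched as a merge over an ordered rule list. This is only an illustration built from those comments, not the crate's implementation: `Rule`, `Intensity`, `rule` and `next_level` are hypothetical names, rules are reduced to plain string pairs, and `words` and `patterns` would each be merged separately in the same way.

```rust
/// In this sketch a rule is just (pattern, replacement text).
type Rule = (String, String);

/// Hypothetical model of one entry in `intensities`.
enum Intensity {
    /// Keep the previous level's rules, overriding or appending.
    Extend(Vec<Rule>),
    /// Discard the previous level's rules and use only these.
    Replace(Vec<Rule>),
}

/// Small helper for building rules from string literals.
fn rule(pattern: &str, replacement: &str) -> Rule {
    (pattern.to_string(), replacement.to_string())
}

/// Builds the rule list of a level from the previous level's rules.
fn next_level(previous: &[Rule], level: &Intensity) -> Vec<Rule> {
    match level {
        Intensity::Replace(rules) => rules.clone(),
        Intensity::Extend(rules) => {
            let mut merged = previous.to_vec();
            for (pattern, replacement) in rules {
                match merged.iter().position(|existing| &existing.0 == pattern) {
                    // Same pattern: overwrite the result, keep its position.
                    Some(index) => merged[index].1 = replacement.clone(),
                    // New pattern: add it below the existing rules.
                    None => merged.push((pattern.clone(), replacement.clone())),
                }
            }
            merged
        }
    }
}
```

Extending `[("windows", "linux"), ("os", "Arch")]` with `[("windows", "windoos"), ("mac", "bsd")]` keeps "windows" first with its new replacement and appends "mac" at the end, while `Replace` with an empty list leaves no rules at all.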

See more examples in the examples folder.

Dependencies

~2.4–4MB
~69K SLoC