#speech #pattern #string #regex #replace #diacritics #accents

bin+lib pink_accents

Replacement of patterns in strings to simulate speech accents

5 releases

0.0.6 Nov 17, 2023
0.0.5 Nov 15, 2023
0.0.0 Mar 24, 2022

#380 in Text processing

AGPL-3.0

52KB
1K SLoC

pink accents

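Lets you define sets of patterns to be replaced in a string. This is a glorified regex replace, or rather a sequence of them. The primary use case is simulating silly speech accents.

Originally based on the Python pink-accents and developed primarily for the ssnt game.

Currently unusable on its own because you cannot construct an Accent from its internal structures, but support for programmatic definition is planned.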

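Types of replacements

An accent is a sequence of rules, applied in order. Each rule consists of a regex pattern and a replacement. When a regex match occurs, the replacement is called. It then decides what to put in place of the match, if anything.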
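Because rules run in order, each rule sees the output of the one before it. The following is a minimal std-only sketch of that behaviour, not the crate's implementation: the `Rule` struct and `apply_rules` function are hypothetical names, and literal substrings stand in for regex patterns so that no external crate is needed.

```rust
/// A rule pairs a pattern with its replacement text.
/// Assumption: literal substrings stand in for regexes to keep this std-only.
struct Rule {
    pattern: &'static str,
    replacement: &'static str,
}

/// Applies every rule in order; each rule sees the output of the previous one.
fn apply_rules(rules: &[Rule], input: &str) -> String {
    let mut text = input.to_string();
    for rule in rules {
        text = text.replace(rule.pattern, rule.replacement);
    }
    text
}
```

With the rules `windows` -> `linux` followed by `linux` -> `bsd`, the input "i use windows" comes out as "i use bsd", because the second rule also rewrites what the first one produced. Swapping the two rules would give "i use linux" instead.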

Possible replacements are:

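  • Original: do not replace
  • Simple: output the string as is (supports templating)
  • Any (recursive): pick one of the replacements at random, with equal weights
  • Weights (recursive): pick a replacement according to relative weights
  • Uppercase (recursive): convert the inner result to uppercase
  • Lowercase (recursive): convert the inner result to lowercase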
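The recursive replacement kinds listed above can be pictured as a small tree that is evaluated for every match. The sketch below is one possible std-only model and is not taken from the crate: the `Replacement` enum, its `apply` method and the `rng` parameter are hypothetical, regex templating in `Simple` is left out, and randomness is supplied by the caller as a closure returning raw numbers.

```rust
/// Replacement kinds modeled as a recursive enum.
/// Hypothetical type for illustration, not the crate's actual definition.
enum Replacement {
    Original,
    Simple(String),
    Any(Vec<Replacement>),
    Weights(Vec<(u64, Replacement)>),
    Uppercase(Box<Replacement>),
    Lowercase(Box<Replacement>),
}

impl Replacement {
    /// Produces the text that replaces `matched`.
    /// `rng` returns raw random numbers; any source can be plugged in.
    fn apply<R: FnMut() -> u64>(&self, matched: &str, rng: &mut R) -> String {
        match self {
            Replacement::Original => matched.to_string(),
            // Regex templating ($1, $name) is left out of this sketch.
            Replacement::Simple(text) => text.clone(),
            Replacement::Any(options) => {
                if options.is_empty() {
                    return matched.to_string();
                }
                // Every option is equally likely.
                let index = (rng() % options.len() as u64) as usize;
                options[index].apply(matched, rng)
            }
            Replacement::Weights(options) => {
                let total: u64 = options.iter().map(|(weight, _)| *weight).sum();
                if total == 0 {
                    return matched.to_string();
                }
                // Walk the options, subtracting weights until the roll fits.
                let mut roll = rng() % total;
                for (weight, option) in options {
                    if roll < *weight {
                        return option.apply(matched, rng);
                    }
                    roll -= *weight;
                }
                unreachable!("roll is always below the total weight")
            }
            Replacement::Uppercase(inner) => inner.apply(matched, rng).to_uppercase(),
            Replacement::Lowercase(inner) => inner.apply(matched, rng).to_lowercase(),
        }
    }
}
```

In this model an option with weight `w` is chosen with probability `w / total`, so `Any` behaves like `Weights` with every weight set to 1. Passing the random source in as a closure keeps the sketch deterministic in tests; a real implementation would more likely use a proper random number generator.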

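Serialized format

The deserialize feature provides an opinionated way of defining rules, designed specifically for speech accents. Deserialization was developed primarily to support the ron format, which has its quirks, but it should also work with json and possibly other formats.

Full reference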

(
    // on by default, tries to match input case with output after each rule
    // for example, if you replaced "HELLO" with "bye", it would use "BYE" instead
    normalize_case: true,

    // pairs of (regex, replacement)
    // this is same as `patterns` except that each regex is surrounded with \b to avoid copypasting.
    // `words` are applied before `patterns`
    words: [
        // this is the simplest rule: it replaces all occurrences of the word "windows"
        // (delimited by regex \b) with "linux", case sensitive
        ("windows", Simple("linux")),
        // this replaces word "OS" with one of replacements, with equal probability
        ("os", Any([
            Simple("Ubuntu"),
            Simple("Arch"),
            Simple("Gentoo"),
        ])),
        // `Simple` supports regex templating: https://docs.rs/regex/latest/regex/struct.Regex.html#example-9
        // this will swap "a" and "b": "ab" -> "ba"
        (r"(a)(?P<b_group>b)", Simple("${b_group}${1}")),
    ],

    // pairs of (regex, replacement)
    // this is same as `words` except these are used as is, without \b
    patterns: [
        // inserts one of the honks. first value of `Weights` is the relative weight: higher is picked more often
        ("$", Weights([
            (32, Simple(" HONK!")),
            (16, Simple(" HONK HONK!")),
            (08, Simple(" HONK HONK HONK!")),
            // ultra rare sigma honk - 1 / 57
            (01, Simple(" HONK HONK HONK HONK!!!!!!!!!!!!!!!")),
        ])),
        // lowercases all `p` letters (use "p" match from `Original`, then lowercase)
        ("p", Lowercase(Original)),
        // uppercases all `p` letters, undoing previous operation
        ("p", Uppercase(Original)),
    ],

    // accent can be used with an intensity (non-negative value). higher intensities can either
    // extend the lower level or completely replace it.
    // default intensity is 0. higher ones are defined here
    intensities: {
        // extends previous intensity (level 0, base one in this case), adding additional rules
        // below existing ones. words and patterns keep their relative order though - words are
        // processed first
        1: Extend(
            (
                words: [
                    // even though we are extending, defining the same rule will overwrite the result.
                    // relative order of rules remains the same: "windows" will remain first
                    ("windows", Simple("windoos")),
                ],

                // extend patterns, adding 1 more rule
                patterns: [
                    // replacements can be nested arbitrarily
                    ("[A-Z]", Weights([
                        // 50% to replace capital letter with one of the Es
                        (1, Any([
                            Simple("E"),
                            Simple("Ē"),
                            Simple("Ê"),
                            Simple("Ë"),
                            Simple("È"),
                            Simple("É"),
                        ])),
                        // 50% to do nothing, no replacement
                        (1, Original),
                    ])),
                ],
            ),
        ),

        // replace intensity 1 entirely. in this case with nothing. remove all rules on intensity 2+
        2: Replace(()),
    },
)
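The comments on `intensities` in the reference describe two ways a level can build on the one below it. The sketch below is one reading of those comments rather than the crate's code: `Rules`, `Level`, `merge` and `resolve` are hypothetical names, replacements are simplified to plain strings, and the caller is assumed to pass only the levels up to the requested intensity, in ascending order.

```rust
/// A (regex, replacement) pair; replacements are plain strings in this sketch.
type Rule = (String, String);

/// Hypothetical mirror of one rule set from the reference above.
#[derive(Clone, Default, Debug, PartialEq)]
struct Rules {
    words: Vec<Rule>,
    patterns: Vec<Rule>,
}

/// How an intensity level relates to the level below it.
enum Level {
    Extend(Rules),
    Replace(Rules),
}

/// Overwrites rules whose regex already exists (keeping their position)
/// and appends the rest below the existing ones.
fn merge(base: &mut Vec<Rule>, extra: &[Rule]) {
    for (regex, replacement) in extra {
        match base.iter().position(|(existing, _)| existing == regex) {
            Some(index) => base[index].1 = replacement.clone(),
            None => base.push((regex.clone(), replacement.clone())),
        }
    }
}

/// Builds the effective rules by folding levels in ascending order,
/// starting from the base (intensity 0) rules.
fn resolve(base: &Rules, levels: &[Level]) -> Rules {
    let mut current = base.clone();
    for level in levels {
        match level {
            Level::Extend(extra) => {
                // words and patterns are merged separately, so words stay first.
                merge(&mut current.words, &extra.words);
                merge(&mut current.patterns, &extra.patterns);
            }
            Level::Replace(rules) => current = rules.clone(),
        }
    }
    current
}
```

Applied to the reference, intensity 1 would keep "windows" as the first word rule but with the "windoos" replacement, and would add the "[A-Z]" pattern after "$". Intensity 2, being `Replace(())`, would leave no rules at all.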

See more examples in the examples folder.

Dependencies

~2.4–4MB
~69K SLoC