#regex #string #replace #rules #tags #case #traits

bin+lib sayit

String replacements using regex

7 unstable releases (3 breaking)

0.3.0 Mar 3, 2024
0.2.0 Feb 24, 2024
0.1.2 Feb 17, 2024
0.0.8 Feb 10, 2024
0.0.7 Jan 27, 2024

#503 in Text processing


360 downloads per month

AGPL-3.0

79KB
2K SLoC

Say It!

String replacements using regex.

Crates.io Documentation

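Originally based on the Python project pink-accents and developed primarily for the ssnt game.

Overview

Provides a way to define a set of rules for replacing text in a string. Each rule consists of a regex pattern and a Tag trait object. The original use case is simulating the mispronunciations of spoken accents in text.

See the docs.rs documentation for an API overview.

Serialized format

Full reference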
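The pairing of a pattern with a trait object can be sketched in a few lines of std-only Rust. This is a simplified illustration, not sayit's API: the `Tag` trait, `Literal`, `Upper`, `Rule`, and `apply_rules` below are hypothetical names, plain substrings stand in for regex patterns, and rules are applied one after another, which may differ from how the crate matches rules within a pass.

```rust
/// Hypothetical stand-in for a replacement tag; NOT sayit's actual `Tag` trait.
trait Tag {
    /// Produce the replacement for one matched piece of text.
    fn generate(&self, matched: &str) -> String;
}

/// Always replaces the match with a fixed string.
struct Literal(String);

impl Tag for Literal {
    fn generate(&self, _matched: &str) -> String {
        self.0.clone()
    }
}

/// Uppercases whatever was matched.
struct Upper;

impl Tag for Upper {
    fn generate(&self, matched: &str) -> String {
        matched.to_uppercase()
    }
}

/// A rule pairs a pattern with a boxed tag.
/// Assumption: a plain substring is used here instead of a regex.
struct Rule {
    pattern: String,
    tag: Box<dyn Tag>,
}

/// Apply every rule in order, replacing each non-overlapping
/// occurrence of its pattern with the tag's output.
fn apply_rules(rules: &[Rule], input: &str) -> String {
    let mut text = input.to_string();
    for rule in rules {
        // An empty pattern would match forever, so skip it.
        if rule.pattern.is_empty() {
            continue;
        }
        let mut out = String::new();
        let mut rest = text.as_str();
        while let Some(pos) = rest.find(rule.pattern.as_str()) {
            let end = pos + rule.pattern.len();
            out.push_str(&rest[..pos]);
            out.push_str(&rule.tag.generate(&rest[pos..end]));
            rest = &rest[end..];
        }
        out.push_str(rest);
        text = out;
    }
    text
}
```

With a `Literal` rule for "windows" and an `Upper` rule for "os", calling `apply_rules` on "windows os" yields "spyware OS". Keeping the tag behind `Box<dyn Tag>` lets new replacement behaviours be added without changing the rule type.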

(
    // Consists of named blocks called "passes" that are applied in top-to-bottom order.
    // Pass names must be unique; they are used when extending the accent
    accent: {
        // First pass
        "words": (
            // This optional field instructs all regexes inside this pass to be wrapped in
            // regex word boundaries
            format: r"\<{}\>",

            // Pairs of (regex, tag)
            rules: {
                // Simplest rule: replace all occurrences of the word "windows" with "spyware"
                "windows": {"Literal": "spyware"},

                // Replaces the word "os" with one of the listed tags, chosen with equal probability
                "os": {"Any": [
                    {"Literal": "Ubuntu"},
                    {"Literal": "Arch"},
                    {"Literal": "Gentoo"},
                ]},

                // `Literal` supports regex templating:
                // https://docs.rs/regex/latest/regex/struct.Regex.html#example-9
                // This will swap "a" and "b" using named and numbered groups
                r"(a)(?P<b_group>b)": {"Literal": "$b_group$1"},
            },
        ),

        // Second pass
        "patterns": (
            // Both rules use "(?-i)", which opts out of case insensitivity
            rules: {
                // Lowercases all `P` letters
                "(?-i)P": {"Lower": {"Original": ()}},

                // Uppercases all `m` letters
                "(?-i)m": {"Upper": {"Original": ()}},
            },
        ),

        // Third pass. note that ^ and $ may overlap with words at beginning and
        // end of strings. These should be defined separately
        "ending": (
            rules: {
                // Selects honks using relative weights. A higher weight is more likely
                "$": {"Weights": {
                    32: {"Literal": " HONK!"},
                    16: {"Literal": " HONK HONK!"},
                    08: {"Literal": " HONK HONK HONK!"},
                    // Ultra rare sigma honk - 1 / 57 chance
                    01: {"Literal": " HONK HONK HONK HONK!!!!!!!!!!!!!!!"},
                }},
            },
        ),
    },

    // Accent can be used with an intensity (non-negative value). Higher
    // intensities can either extend the lower level or completely replace it.
    // Default intensity (rules above) is 0. Higher ones are defined here
    intensities: {
        // Extends the previous intensity (the base one in this case), adding additional
        // rules and overwriting passes that have the same names.
        1: Extend({
            "words": (
                format: r"\<{}\>",
                rules: {
                    // Will overwrite the "windows" pattern in the "words" pass
                    "windows": {"Literal": "bloatware"},
                },
            ),

            // Extend "patterns", adding 1 more rule with new pattern
            "patterns": (
                name: "patterns",
                rules: {
                    "(?-i)[A-Z]": {"Weights": {
                        // 50% to replace capital letter with one of the Es
                        1: {"Any": [
                            {"Literal": "E"},
                            {"Literal": "Ē"},
                            {"Literal": "Ê"},
                            {"Literal": "Ë"},
                            {"Literal": "È"},
                            {"Literal": "É"},
                        ]},
                        // 50% to do nothing, no replacement
                        1: {"Original": ()},
                    }},
                },
            ),
        }),

        // Replace intensity 1 entirely. In this case with nothing
        2: Replace({}),
    },
)
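The relative weights in the "ending" pass (32, 16, 8 and 1) sum to 57, so each honk is chosen with probability weight / 57. The sketch below shows that arithmetic together with one common way to pick an entry from cumulative weights. The function names are hypothetical, and whether sayit selects entries exactly this way is an assumption.

```rust
/// Convert relative weights into selection probabilities
/// (each weight divided by the sum of all weights).
fn weight_probabilities(weights: &[u32]) -> Vec<f64> {
    let total: u32 = weights.iter().sum();
    if total == 0 {
        // Nothing can be selected when every weight is zero.
        return vec![0.0; weights.len()];
    }
    weights
        .iter()
        .map(|&w| f64::from(w) / f64::from(total))
        .collect()
}

/// Pick the index whose cumulative weight range contains `roll`.
/// `roll` is expected to be a random number in `0..total`;
/// anything outside that range yields `None`.
fn pick_index(weights: &[u32], roll: u32) -> Option<usize> {
    let mut upper = 0u32;
    for (i, &w) in weights.iter().enumerate() {
        upper += w;
        if roll < upper {
            return Some(i);
        }
    }
    None
}
```

For the weights above, rolls 0 to 31 select the first honk, 32 to 47 the second, 48 to 55 the third, and only roll 56 selects the rarest one.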

See more examples in the examples folder.

CLI tool

This library comes with a simple command-line tool that you can install with:

cargo install sayit --features=cli

Interactive session

sayit --accent examples/scotsman.ron

Apply to file

cat filename.txt | sayit --accent examples/french.ron > newfile.txt

Dependencies

~2.6–4.5MB
~78K SLoC