7 unstable releases (3 breaking)
Version | Date
---|---
0.3.0 | Mar 3, 2024
0.2.0 | Feb 24, 2024
0.1.2 | Feb 17, 2024
0.0.8 | Feb 10, 2024
0.0.7 | Jan 27, 2024
#503 in Text processing
360 downloads per month
79KB
2K SLoC
sayit
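String replacements using regex.
Originally based on the Python pink-accents project and primarily developed for the ssnt game.
Overview
Provides a way to define a set of rules for replacing text in strings. Each rule consists of a regex pattern and a Tag trait object. The original use case is simulating mispronunciations of a spoken accent in text.
See the docs.rs documentation for an API overview.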
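To make "a regex pattern plus a Tag trait object" concrete, here is a minimal Rust sketch of the idea. The `Tag` trait, its `generate` method, and the `Literal`, `Original`, `Upper`, `Lower` and `Rule` types below are hypothetical stand-ins named after the serialized tags, not sayit's actual API (the real trait signatures are on docs.rs). The pattern is kept as a plain string so the sketch does not depend on the `regex` crate.

```rust
// Hypothetical model of a rule = pattern + Tag trait object.
// These names only mirror the serialized format; they are NOT sayit's API.

/// A replacement strategy applied to the text a pattern matched.
trait Tag {
    fn generate(&self, matched: &str) -> String;
}

/// Always emits fixed text, like {"Literal": "spyware"}.
struct Literal(String);

/// Emits the matched text unchanged, like {"Original": ()}.
struct Original;

/// Uppercases whatever the inner tag produces, like {"Upper": ...}.
struct Upper(Box<dyn Tag>);

/// Lowercases whatever the inner tag produces, like {"Lower": ...}.
struct Lower(Box<dyn Tag>);

impl Tag for Literal {
    fn generate(&self, _matched: &str) -> String {
        self.0.clone()
    }
}

impl Tag for Original {
    fn generate(&self, matched: &str) -> String {
        matched.to_string()
    }
}

impl Tag for Upper {
    fn generate(&self, matched: &str) -> String {
        self.0.generate(matched).to_uppercase()
    }
}

impl Tag for Lower {
    fn generate(&self, matched: &str) -> String {
        self.0.generate(matched).to_lowercase()
    }
}

/// A rule pairs a pattern (a plain string in this sketch; a compiled
/// regex in a real implementation) with a boxed Tag trait object.
struct Rule {
    pattern: String,
    tag: Box<dyn Tag>,
}
```

Nesting `Upper` around `Original` mirrors `{"Upper": {"Original": ()}}` in the serialized format: because tags are trait objects, any tag can wrap any other.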
Serialized format
Full reference
```ron
(
    // Consists of named blocks called "passes" that are applied in top-to-bottom order.
    // Pass names must be unique; they are used when you want to extend an accent
    accent: {
        // First pass
        "words": (
            // This optional field instructs all regexes inside this pass to be wrapped in
            // regex word boundaries
            format: r"\<{}\>",
            // Pairs of (regex, tag)
            rules: {
                // Simplest rule: replace all occurrences of the word "windows" with "spyware"
                "windows": {"Literal": "spyware"},
                // This replaces the word "os" with one of the tags, with equal probability
                "os": {"Any": [
                    {"Literal": "Ubuntu"},
                    {"Literal": "Arch"},
                    {"Literal": "Gentoo"},
                ]},
                // `Literal` supports regex templating:
                // https://docs.rs/regex/latest/regex/struct.Regex.html#example-9
                // This swaps "a" and "b" ("ab" becomes "ba") using named and numbered groups
                r"(a)(?P<b_group>b)": {"Literal": "$b_group$1"},
            },
        ),
        // Second pass
        "patterns": (
            // Both rules use "(?-i)", which opts out of case insensitivity
            rules: {
                // Lowercases all `P` letters
                "(?-i)P": {"Lower": {"Original": ()}},
                // Uppercases all `m` letters
                "(?-i)m": {"Upper": {"Original": ()}},
            },
        ),
        // Third pass. Note that ^ and $ may overlap with words at the beginning and
        // end of strings, so rules using them should be defined in a separate pass
        "ending": (
            rules: {
                // Selects a honk using relative weights. A higher weight means a higher chance
                "$": {"Weights": {
                    32: {"Literal": " HONK!"},
                    16: {"Literal": " HONK HONK!"},
                    08: {"Literal": " HONK HONK HONK!"},
                    // Ultra rare sigma honk: 1 / 57 chance (total weight is 32 + 16 + 8 + 1 = 57)
                    01: {"Literal": " HONK HONK HONK HONK!!!!!!!!!!!!!!!"},
                }},
            },
        ),
    },
    // An accent can be used with an intensity (a non-negative value). Higher
    // intensities can either extend the lower level or completely replace it.
    // The default intensity (the rules above) is 0. Higher ones are defined here
    intensities: {
        // Extends the previous intensity (the base one in this case), adding additional
        // rules and overwriting passes that have the same names.
        1: Extend({
            "words": (
                format: r"\<{}\>",
                rules: {
                    // Will overwrite the "windows" pattern in the "words" pass
                    "windows": {"Literal": "bloatware"},
                },
            ),
            // Extends "patterns", adding 1 more rule with a new pattern
            "patterns": (
                rules: {
                    "(?-i)[A-Z]": {"Weights": {
                        // 50% chance to replace a capital letter with one of the Es
                        1: {"Any": [
                            {"Literal": "E"},
                            {"Literal": "Ē"},
                            {"Literal": "Ê"},
                            {"Literal": "Ë"},
                            {"Literal": "È"},
                            {"Literal": "É"},
                        ]},
                        // 50% chance to do nothing (no replacement)
                        1: {"Original": ()},
                    }},
                },
            ),
        }),
        // Replaces intensity 1 entirely, in this case with nothing
        2: Replace({}),
    },
)
```
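The `Weights` tag in the "ending" pass picks one option with probability proportional to its weight. The Rust sketch below shows that arithmetic with a cumulative-weight lookup. `total_weight`, `pick_weighted` and the explicit `roll` parameter are hypothetical helpers for illustration; sayit draws its random number internally and its exact selection code may differ.

```rust
// Minimal sketch of relative-weight selection (not sayit's implementation).
// Each choice is picked with probability weight / total_weight.

/// Sum of all weights.
fn total_weight(choices: &[(u64, &str)]) -> u64 {
    choices.iter().map(|(weight, _)| weight).sum()
}

/// Returns the choice whose cumulative-weight interval contains `roll`.
/// `roll` is expected to be uniformly distributed in 0..total_weight(choices);
/// an out-of-range roll yields None.
fn pick_weighted<'a>(choices: &[(u64, &'a str)], roll: u64) -> Option<&'a str> {
    let mut upper = 0;
    for &(weight, text) in choices {
        // This choice owns the half-open interval [upper - weight, upper).
        upper += weight;
        if roll < upper {
            return Some(text);
        }
    }
    None
}
```

With the weights 32, 16, 8 and 1 the total is 57, so the honks come out roughly 56%, 28%, 14% and 1.75% of the time: rolls 0..32 give " HONK!" and only roll 56 gives the ultra rare honk.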
See more examples in the examples folder.
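The `intensities` section of the reference above layers levels on top of intensity 0. The Rust sketch below models that layering under one reading of its comments: `Extend` merges passes by name (a rule whose pattern already exists is overwritten, new patterns and new passes are appended), and `Replace` discards everything below it. The `Pass` and `Intensity` types and the `extend`/`resolve` functions are hypothetical, not sayit's API, and the library's actual merge rules may differ in detail.

```rust
// Hypothetical model of intensity resolution; not sayit's API.

/// A named pass holding (pattern, replacement) rules in application order.
#[derive(Clone, Debug, PartialEq)]
struct Pass {
    name: String,
    rules: Vec<(String, String)>,
}

impl Pass {
    fn new(name: &str, rules: &[(&str, &str)]) -> Self {
        Pass {
            name: name.to_string(),
            rules: rules
                .iter()
                .map(|&(pattern, replacement)| (pattern.to_string(), replacement.to_string()))
                .collect(),
        }
    }
}

/// One intensity level above 0.
enum Intensity {
    Extend(Vec<Pass>),
    Replace(Vec<Pass>),
}

/// Merges `overlay` into `base`: same-named passes merge their rules
/// (an existing pattern is overwritten), unknown passes are appended.
fn extend(mut base: Vec<Pass>, overlay: Vec<Pass>) -> Vec<Pass> {
    for pass in overlay {
        match base.iter_mut().find(|p| p.name == pass.name) {
            Some(existing) => {
                for (pattern, replacement) in pass.rules {
                    match existing.rules.iter_mut().find(|(p, _)| *p == pattern) {
                        Some(rule) => rule.1 = replacement,
                        None => existing.rules.push((pattern, replacement)),
                    }
                }
            }
            None => base.push(pass),
        }
    }
    base
}

/// Resolves the passes active at `level`; `levels[0]` is intensity 1.
fn resolve(base: &[Pass], levels: &[Intensity], level: usize) -> Vec<Pass> {
    let mut passes = base.to_vec();
    for intensity in levels.iter().take(level) {
        passes = match intensity {
            Intensity::Extend(overlay) => extend(passes, overlay.clone()),
            Intensity::Replace(replacement) => replacement.clone(),
        };
    }
    passes
}
```

Applied to the reference, intensity 1 keeps both base passes, swaps "spyware" for "bloatware" and gives "patterns" a third rule, while intensity 2 resolves to no passes at all.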
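CLI tool
This library comes with a simple command-line tool, which you can install with:
```sh
cargo install sayit --features=cli
```
Interactive session
```sh
sayit --accent examples/scotsman.ron
```
Applying to a file
```sh
cat filename.txt | sayit --accent examples/french.ron > newfile.txt
```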
Dependencies
~2.6–4.5MB
~78K SLoC