ftml

25 stable releases

1.26.1	Aug 3, 2024
1.25.1	Jul 22, 2024
1.23.0	Mar 11, 2024
1.22.2	Sep 29, 2023
0.2.19	Jan 6, 2020

#76 in Parser implementations

650 downloads per month
Used in 3 crates (2 directly)

AGPL-3.0-or-later

1MB
17K SLoC

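Foundation Text Markup Language

(Or, ftml: markup language)

A Rust library that parses Wikidot text ("Wikitext") into an abstract syntax tree (AST). This library aims to replace the aging Text_Wiki from Wikidot. This version aims to provide a parser that is nearly fully compatible with common Wikidot, including common malformed constructs. The approach is to use a lexer generator and consume the resulting tokens in a custom parser that handles unusual cases leniently.

In addition to providing the speed and safety benefits of Rust, this improves maintainability and allows an AST to be exposed to consumers for more advanced analysis and transformation.

The lint #![forbid(unsafe_code)] is set, so this crate contains only safe code. However, dependencies may have unsafe internals.

Available under the terms of the GNU Affero General Public License. See LICENSE.md. This library was originally a part of Wikijump, located at /ftml, and was later moved to its own repository per WJ-1219. Issues for this project remain in the WJ Jira project.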

Compilation

This library targets the latest stable Rust. At the time of writing, that is 1.77.0.

$ cargo build --release

You can use this library as a dependency by adding the following to your Cargo.toml:

ftml = "1"

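The library has two features:

html (enabled by default) -- This includes the HTML renderer in the crate.
mathml (enabled by default) -- This includes latex2mathml, which compiles any LaTeX into MathML for inclusion in the rendered HTML.

You can disable these features by building without default features: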

$ cargo check --no-default-features

If you wish to build the WebAssembly target for ftml, use wasm-pack:

$ wasm-pack build -- --no-default-features

This will optimize the final WASM, which may take some time. If you are developing and only care whether the build passes, use this instead:

$ wasm-pack build --dev

If for some reason you want to invoke cargo check instead, call cargo check --target wasm32-unknown-unknown.

Testing

$ cargo test

If you want to see the test output, append -- --nocapture to the command. You can also inspect the logs by exposing a log-compatible logger.

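Philosophy

See Philosophy.md.

Styling

CSS classes are named in kebab-case only, with prefixes:

Any class with the wj- prefix is generated automatically and is not intended for direct use by users. For example, wj-collapsible-block.
Any class with the wiki- prefix is a "premade" class. These are not necessarily generated automatically, but are intended for direct use by users who want standard styling. For example, wiki-note.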
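
The prefix convention above can be checked mechanically. The following is a minimal sketch, not part of ftml: `ClassKind` and `classify_class` are hypothetical names introduced here only to illustrate how a stylesheet linter might tell generated classes from premade ones.

```rust
/// Origin of a CSS class, judged only by its prefix.
/// Hypothetical type for illustration; ftml does not export it.
#[derive(Debug, PartialEq, Eq)]
enum ClassKind {
    /// `wj-` prefix: generated automatically, not meant for direct use.
    Generated,
    /// `wiki-` prefix: premade class intended for direct use.
    Premade,
    /// Anything else, for example a class defined by a page author.
    Other,
}

/// Classify a class name according to the prefix convention.
fn classify_class(name: &str) -> ClassKind {
    if name.starts_with("wj-") {
        ClassKind::Generated
    } else if name.starts_with("wiki-") {
        ClassKind::Premade
    } else {
        ClassKind::Other
    }
}
```

Under this sketch, a user stylesheet that targets a `Generated` class could be flagged, since such classes are not meant to be relied on directly.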

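Naming

"Foundation Text Markup Language" (ftml) is named for the file extension that represents in-universe SCP Foundation formatting, as mentioned in Kate McTiriss's proposal. The expanded form of the initialism is never stated explicitly, but it is clearly implied by the name's similarity to HTML.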

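Syntax

ftml is intended to be compatible with the subset of Wikidot text that is considered "well-formed". Wikidot's general syntax documentation applies here, but odd constructs or strange features may not. During development they are analyzed and then either explicitly left unimplemented or implemented with more sensible syntax. ftml also supports some new features and blocks that do not exist in Wikidot, such as checkboxes, and fixes some bugs, such as allowing collapsibles to be nested.

As ftml develops into an independent branch of wikitext, the pages here document the syntax separately, with the goal of deprecating Wikidot's documentation entirely.

Blocks.md -- The blocks available in ftml (e.g. [[div]]) and the options they accept.

Some less-used or problematic features are implemented in a different way that is incompatible with Wikidot. For example:

[[include]] is split into [[include-messy]] (the legacy behavior) and [[include-elements]] (self-contained element insertion).
Interwiki links are written by prefixing a triple-bracket link with !. So [[[!wp:Amazon.com | Amazon]]] instead of [wp:Amazon.com Amazon].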
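
To make the interwiki form concrete, here is a minimal sketch that splits such a link into its parts. It is not ftml's parser: `InterwikiLink` and `parse_interwiki` are hypothetical names, and the handling of whitespace and of a missing label is an assumption of this example.

```rust
/// Parts of an interwiki link such as `[[[!wp:Amazon.com | Amazon]]]`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct InterwikiLink<'t> {
    /// Interwiki prefix, e.g. "wp".
    site: &'t str,
    /// Target on the remote site, e.g. "Amazon.com".
    target: &'t str,
    /// Optional link label after the `|`.
    label: Option<&'t str>,
}

/// Parse a triple-bracket link whose body starts with `!`.
/// Returns `None` for anything that does not match that shape.
fn parse_interwiki(input: &str) -> Option<InterwikiLink<'_>> {
    // Strip the `[[[!` opener and the `]]]` closer.
    let body = input.strip_prefix("[[[!")?.strip_suffix("]]]")?;

    // The label, if any, follows the first `|`.
    let (link, label) = match body.split_once('|') {
        Some((link, label)) => (link.trim(), Some(label.trim())),
        None => (body.trim(), None),
    };

    // The interwiki prefix is separated from the target by the first colon.
    let (site, target) = link.split_once(':')?;
    if site.is_empty() || target.is_empty() {
        return None;
    }

    Some(InterwikiLink { site, target, label })
}
```

An ordinary triple-bracket link without the `!` marker yields `None` here, which mirrors the point that the marker is what distinguishes an interwiki link.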

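Usage

There are several main exported functions, which correspond to each major step of wikitext processing.

First is include, which substitutes every [[include]] block with the contents of the page it refers to. It returns the substituted wikitext as a new string, along with the names of all the pages that were used. It requires an object implementing Includer, which handles retrieving pages and generating messages for missing pages.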
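
The inclusion step can be pictured with a toy version built only on the standard library. This sketch does not use ftml's API: `PageSource`, `MapSource`, and `include_pages` are hypothetical stand-ins, the `[[include NAME]]` form is simplified (no arguments, no recursion), and recording missing pages in the returned list is a choice made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the role an includer plays:
/// fetch a page, or produce a message when it is missing.
trait PageSource {
    fn fetch(&self, page: &str) -> Option<String>;
    fn missing(&self, page: &str) -> String;
}

/// In-memory page store used for the example.
struct MapSource(HashMap<String, String>);

impl PageSource for MapSource {
    fn fetch(&self, page: &str) -> Option<String> {
        self.0.get(page).cloned()
    }

    fn missing(&self, page: &str) -> String {
        format!("[missing page: {page}]")
    }
}

/// Replace each `[[include NAME]]` with the page contents and
/// return the new text together with the page names referenced.
fn include_pages<S: PageSource>(input: &str, source: &S) -> (String, Vec<String>) {
    const OPEN: &str = "[[include ";
    let mut output = String::new();
    let mut used = Vec::new();
    let mut rest = input;

    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        // An unterminated block is left in the text untouched.
        let Some(end) = after_open.find("]]") else { break };
        let name = after_open[..end].trim();

        output.push_str(&rest[..start]);
        match source.fetch(name) {
            Some(contents) => output.push_str(&contents),
            None => output.push_str(&source.missing(name)),
        }
        used.push(name.to_string());
        rest = &after_open[end + 2..];
    }

    output.push_str(rest);
    (output, used)
}
```

The sketch mirrors the description above: a new string comes back alongside the list of page names, and the source object decides what a missing page looks like.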

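Second is preprocess, which performs Wikidot's various text substitutions.

Third is tokenize, which takes the input string and returns a wrapper type. If you want the token extractions it produced, it can be converted into a Vec<ExtractedToken<'t>>. It is used as the input to parse.

Then, borrowing a slice of those tokens, parse consumes them and produces a SyntaxTree representing the full structure of the parsed wikitext.

Finally, with the syntax tree you can render it using whichever Render instance you need at the time. Most likely you want HtmlRender. There is also TextRender, for plain text, e.g. for searching article contents or for a "printer-friendly" view.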
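
The idea of rendering one tree through interchangeable renderers can be sketched with a toy tree. Everything here is a stand-in invented for illustration: `Element`, `ToyRender`, `ToyHtmlRender`, `ToyTextRender`, and `escape_html` are not ftml types, and the toy trait omits the page information the real one takes.

```rust
/// Toy syntax tree node, far simpler than a real wikitext tree.
enum Element {
    Text(String),
    Bold(Vec<Element>),
}

/// Simplified stand-in for a render trait: the associated
/// `Output` type lets each renderer choose what it produces.
trait ToyRender {
    type Output;

    fn render(&self, tree: &[Element]) -> Self::Output;
}

struct ToyHtmlRender;
struct ToyTextRender;

/// Escape the characters that are significant in HTML text.
fn escape_html(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

impl ToyRender for ToyHtmlRender {
    type Output = String;

    fn render(&self, tree: &[Element]) -> String {
        tree.iter()
            .map(|element| match element {
                Element::Text(text) => escape_html(text),
                Element::Bold(children) => {
                    format!("<strong>{}</strong>", self.render(children))
                }
            })
            .collect()
    }
}

impl ToyRender for ToyTextRender {
    type Output = String;

    fn render(&self, tree: &[Element]) -> String {
        tree.iter()
            .map(|element| match element {
                // Plain text drops formatting and keeps only the content.
                Element::Text(text) => text.clone(),
                Element::Bold(children) => self.render(children),
            })
            .collect()
    }
}
```

Because both renderers consume the same tree, the parse step only has to run once, and a search index or printer-friendly view can reuse its result.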

fn include<'t, I, E>(
    input: &'t str,
    includer: I,
    settings: &WikitextSettings,
) -> Result<(String, Vec<PageRef<'t>>), E>
where
    I: Includer<'t, Error = E>;

fn preprocess(
    text: &mut String,
);

fn tokenize<'t>(
    text: &'t str,
) -> Tokenization<'t>;

fn parse<'r, 't>(
    tokenization: &'r Tokenization<'t>,
) -> ParseResult<SyntaxTree<'t>>;

trait Render {
    type Output;

    fn render(
        &self,
        info: &PageInfo,
        tree: &SyntaxTree,
    ) -> Self::Output;
}
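
The `'t` lifetime in the signatures above ties the tokens and the tree to the source text they borrow from. The following sketch shows the same pattern with toy types; `ToyToken`, `toy_tokenize`, and `ParsedPage` are hypothetical names, and whitespace splitting stands in for real tokenization.

```rust
/// Toy token that borrows its slice from the source text,
/// in the same way a `'t` lifetime links output to input.
#[derive(Debug, PartialEq, Eq)]
struct ToyToken<'t> {
    slice: &'t str,
}

/// Split on whitespace without copying; every token points into `text`.
fn toy_tokenize<'t>(text: &'t str) -> Vec<ToyToken<'t>> {
    text.split_whitespace()
        .map(|slice| ToyToken { slice })
        .collect()
}

/// A struct that stores borrowed results must carry the same lifetime,
/// so it cannot outlive the string it was produced from.
struct ParsedPage<'t> {
    tokens: Vec<ToyToken<'t>>,
}

impl<'t> ParsedPage<'t> {
    fn new(text: &'t str) -> Self {
        ParsedPage {
            tokens: toy_tokenize(text),
        }
    }

    fn word_count(&self) -> usize {
        self.tokens.len()
    }
}
```

In practice this means the owner of the text has to be kept alive for as long as anything derived from it is in use, or the derived data has to be converted to an owned form first.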

When performing a parse, you first need to run preprocess(), then run parse() on the fully expanded text.

Consider the lifetimes of each of the artifacts being generated, should you want to store the results in a struct.

// `settings` (a `WikitextSettings`) and `page_info` (a `PageInfo`)
// are assumed to have been created already.

// Get an `Includer`.
//
// See trait documentation for what this requires, but
// essentially it is some abstract handle that gets the
// contents of a page to be included.
//
// Two sample includers you could try are `NullIncluder`
// and `DebugIncluder`.
let includer = MyIncluderImpl::new();

// Get our source text
let input = "**some** test <<string?>>";

// Substitute page inclusions.
// `include` returns a `Result`, so the error is propagated with `?`.
let (mut text, included_pages) = ftml::include(input, includer, &settings)?;

// Perform preprocess substitutions
ftml::preprocess(&mut text);

// Generate tokens from input text
let tokens = ftml::tokenize(&text);

// Parse the token list to produce an AST.
//
// Note that this produces a `ParseResult<SyntaxTree>`, which records the
// parsing warnings in addition to the final result.
let result = ftml::parse(&tokens, &page_info, &settings);

// Here we extract the tree separately from the warning list.
//
// Now we have the final AST, as well as all the issues that
// occurred during the parsing process.
let (tree, warnings) = result.into();

// Finally, we render with our renderer. Generally this is `HtmlRender`,
// but you could have a custom implementation here too.
//
// You must provide a `PageInfo` struct, which describes the page being rendered.
// You must also provide a handle to provide various remote sources, such as
// module content, but this is not stabilized yet.
let html_output = HtmlRender.render(&tree, &page_info, &settings);

JSON Serialization

See Serialization.md.

Dependencies

~9–21MB
~344K SLoC

cfg-if entities enum-map getrandom+js wasm32 self_cell 1.0 wasm32 wasm-bindgen+serde-serialize wasm32 web-sys+console wasm32 mathml latex2mathml log maplit once_cell html parcel_css parcel_selectors =0.24.7 pest pest_derive rand+small_rng ref-map regex serde+derive serde-wasm-bindgen 0.6 serde_json serde_repr str-macro 1.0 strum 0.26 strum_macros 0.26 time 0.3+formatting+macros+parsing+serde+serde…readable tinyvec unicase wikidot-normalize build build.rs build built 0.7+chrono+git2 dev proptest dev termcolor