2 stable releases

1.0.1	Oct 18, 2023
1.0.0	Oct 17, 2023


Apache-2.0 WITH LLVM-exception

130KB
2K SLoC

Winliner

The WebAssembly Indirect Call Inliner!

API Docs | Contributing

About
Install
Example Usage
Caveats
Using Winliner as a Library
Acknowledgements

About

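Winliner speculatively inlines indirect calls in WebAssembly, based on information observed during a previous profiling phase. This is a form of profile-guided optimization that we affectionately call winlining.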

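First, Winliner inserts instrumentation at every indirect call site in your Wasm program to observe the actual target callee. Next, you run the instrumented program for a while, building up a profile. Finally, you invoke Winliner again, this time providing it with the recorded profile, and it optimizes your Wasm program based on that profile.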
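To make the profiling step concrete, here is a minimal Rust sketch of what a precise profile could record: for each call site, how many times each funcref table index was observed as the callee. The `CallProfile` and `CallSiteCounts` types and the `record` and `total_calls` methods are hypothetical illustrations, not Winliner's actual `Profile` type or its instrumentation strategies.

```rust
use std::collections::HashMap;

/// Hypothetical per-call-site data: table index of the callee -> times observed.
#[derive(Default, Debug)]
struct CallSiteCounts {
    counts: HashMap<u32, u64>,
}

/// Hypothetical profile: call site id -> observed callees.
#[derive(Default, Debug)]
struct CallProfile {
    call_sites: HashMap<u32, CallSiteCounts>,
}

impl CallProfile {
    /// What instrumentation could do just before each `call_indirect`:
    /// record which table index is about to be called from this site.
    fn record(&mut self, call_site: u32, callee_index: u32) {
        *self
            .call_sites
            .entry(call_site)
            .or_default()
            .counts
            .entry(callee_index)
            .or_insert(0) += 1;
    }

    /// Total number of calls observed at a call site (0 if never executed).
    fn total_calls(&self, call_site: u32) -> u64 {
        self.call_sites
            .get(&call_site)
            .map_or(0, |site| site.counts.values().sum())
    }
}

fn main() {
    let mut profile = CallProfile::default();
    // Simulate running the instrumented program: call site 0 almost always
    // targets table index 42, and occasionally index 7.
    for i in 0..100 {
        let callee = if i % 50 == 0 { 7 } else { 42 };
        profile.record(0, callee);
    }
    println!("{}", profile.total_calls(0)); // 100
    println!("{:?}", profile.call_sites[&0].counts.get(&42)); // Some(98)
}
```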

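For example, if the profile shows that an indirect call always (or nearly always) went to the entry at index 42 in the funcref table, then Winliner will perform the following semantically transparent transformation: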

;; Before:

call_indirect

;; After:

;; If the callee index is 42, execute the inlined body of
;; the associated function.
local.tee $temp
i32.const 42
i32.eq
if
  <inlined body of table[42] here>
else
  local.get $temp
  call_indirect
end
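The guard can be modeled in plain Rust to see why the transformation is semantically transparent. In this sketch (all names hypothetical), a slice of function pointers stands in for the funcref table, `call_indirect` calls through it, and `winlined` speculates on index 1 rather than 42 to keep the table small: when the index matches, it runs a copy of that function's body; otherwise it falls back to the indirect call.

```rust
/// Stand-in for a funcref table entry: takes an i32, returns an i32.
type Func = fn(i32) -> i32;

fn double(x: i32) -> i32 {
    x * 2
}

fn negate(x: i32) -> i32 {
    -x
}

/// Before: a plain indirect call through the table.
fn call_indirect(table: &[Func], index: usize, arg: i32) -> i32 {
    table[index](arg)
}

/// The table index the (hypothetical) profile said was almost always called.
const SPECULATED: usize = 1;

/// After: compare the callee index against the speculated one.
fn winlined(table: &[Func], index: usize, arg: i32) -> i32 {
    if index == SPECULATED {
        // Inlined body of table[1] (`double`), copied in at optimization time.
        arg * 2
    } else {
        // Any other index takes the original, unoptimized indirect call.
        call_indirect(table, index, arg)
    }
}

fn main() {
    let table: [Func; 2] = [negate, double];
    for index in 0..table.len() {
        // Same result as the original call for every index.
        assert_eq!(call_indirect(&table, index, 21), winlined(&table, index, 21));
    }
    println!("{}", winlined(&table, 1, 21)); // 42
}
```

Both functions agree for every index; only the speculated index avoids the indirect call, and in a real engine that inlined copy is what later optimizations can work on.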

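By itself, this speculative inlining is usually not a huge performance win, since CPU indirect branch predictors are quite powerful (although, depending on the Wasm engine, entering a new function may carry some cost that inlining avoids). The main benefit is that it lets the Wasm compiler "see through" the indirect call and run subsequent optimizations (such as GVN and LICM) on the inlined callee's body, which can yield significant performance improvements.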

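This technique is similar to devirtualization, but it requires neither that the compiler can statically determine the callee nor that the callee is always, 100% of the time, one particular function. Unlike devirtualization, winlining can also optimize indirect calls that go to one function 99% of the time and to a different function 1% of the time, because it can always fall back to an unoptimized indirect call.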
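Whether a call site qualifies for winlining is a policy choice. The following minimal sketch assumes one plausible rule: inline the most frequently observed callee only if the call site ran at least `min_total_calls` times and that callee accounts for at least `min_ratio` of its calls. The parameter names echo the `Optimizer` settings in the library example, but `pick_speculative_callee` and its logic are assumptions, not Winliner's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical decision rule. `counts` maps callee table index -> times observed.
/// Returns the table index to speculate on, or `None` to leave the call alone.
fn pick_speculative_callee(
    counts: &HashMap<u32, u64>,
    min_total_calls: u64,
    min_ratio: f64,
) -> Option<u32> {
    let total: u64 = counts.values().sum();
    // Skip call sites that never ran or were too cold to trust.
    if total == 0 || total < min_total_calls {
        return None;
    }
    // Most frequent callee; on ties, prefer the lowest index for determinism.
    let (&callee, &count) = counts
        .iter()
        .max_by(|a, b| a.1.cmp(b.1).then(b.0.cmp(a.0)))?;
    // Only speculate when that callee dominates the call site.
    if count as f64 / total as f64 >= min_ratio {
        Some(callee)
    } else {
        None
    }
}

fn main() {
    // 99 calls to index 42 and 1 call to index 7: speculate on 42; the
    // remaining 1% still takes the fallback `call_indirect`.
    let counts: HashMap<u32, u64> = HashMap::from([(42, 99), (7, 1)]);
    println!("{:?}", pick_speculative_callee(&counts, 100, 0.8)); // Some(42)

    // A 50/50 split is not dominated by one callee, so nothing is inlined.
    let split: HashMap<u32, u64> = HashMap::from([(1, 50), (2, 50)]);
    println!("{:?}", pick_speculative_callee(&split, 100, 0.8)); // None
}
```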

Install

You can install via cargo:

$ cargo install winliner --all-features

Example Usage

First, instrument your Wasm program:

$ winliner instrument my-program.wasm > my-program.instrumented.wasm

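Next, run the instrumented program to build a profile. You can do this either in the Wasm environment of your choice (for example, on the Web), with a little glue code to extract and export the profile, or within Winliner itself, which comes with a Wasmtime-based WASI environment: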

$ winliner profile my-program.instrumented.wasm > profile.json

Finally, tell Winliner to optimize the original program based on the call_indirect behavior observed in the profile:

$ winliner optimize --profile profile.json my-program.wasm > my-program.winlined.wasm

Caveats

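Winliner is not safe in the face of mutations to the funcref table, which are possible via the table.set instruction (among others) introduced as part of the reference-types proposal. You must either disable this proposal or manually uphold the invariant that the funcref table is never mutated. Violating this invariant can lead to behavior that diverges from the original program and to very strange bugs! Any exported funcref tables must also be protected from mutation by the host.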
Winliner only optimizes call_indirect instructions; it cannot optimize call_ref instructions, because WebAssembly function references are not comparable, so it cannot insert the if actual_callee == speculative_callee check.
Winliner assumes that the (widely implemented) multi-value proposal is supported for the code it generates.
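The first caveat can be reproduced with a small Rust model in which a `Vec` of function pointers stands in for the funcref table (all names hypothetical). The guard compares only the table index, so once an entry is overwritten, which is what table.set does in Wasm, the inlined copy of the old function no longer matches what the original program would call.

```rust
/// Stand-in for a funcref table entry.
type Func = fn(i32) -> i32;

fn double(x: i32) -> i32 {
    x * 2
}

fn triple(x: i32) -> i32 {
    x * 3
}

/// Winlined call site built from a profile in which table[0] was `double`.
/// The guard checks only the index, not which function lives there now.
fn winlined(table: &[Func], index: usize, arg: i32) -> i32 {
    if index == 0 {
        arg * 2 // inlined body of `double`, frozen at optimization time
    } else {
        table[index](arg)
    }
}

fn main() {
    let mut table: Vec<Func> = vec![double as Func, triple];
    assert_eq!(winlined(&table, 0, 10), table[0](10)); // both 20

    // The equivalent of a runtime `table.set`: replace entry 0.
    table[0] = triple;
    // The original program would now compute 30, but the winlined code
    // still runs the stale inlined body and returns 20.
    println!("original: {}, winlined: {}", table[0](10), winlined(&table, 0, 10));
}
```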

Using Winliner as a Library

First, add a dependency on Winliner to your Cargo.toml:

[dependencies]
winliner = "1"

Then, use the library like this:

use winliner::{InstrumentationStrategy, Instrumenter, Optimizer, Profile, Result};

fn main() -> Result<()> {
    let original_wasm = std::fs::read("path/to/my.wasm")?;

    // Configure instrumentation.
    let mut instrumenter = Instrumenter::new();
    instrumenter.strategy(InstrumentationStrategy::ThreeGlobals);

    // Instrument our wasm.
    let instrumented_wasm = instrumenter.instrument(&original_wasm)?;

    // Get a profile for our Wasm program from somewhere. Read it from disk,
    // record it now in this process, etc...
    //
    // See the API docs for `Profile` for more details.
    let profile = Profile::default();

    // Configure optimization and thresholds for inlining.
    let mut optimizer = Optimizer::new();
    optimizer
        .min_total_calls(100)
        .min_ratio(0.8)?
        .max_inline_depth(3);

    // Run the optimizer with the given profile!
    let optimized_wasm = optimizer.optimize(&profile, &original_wasm)?;

    std::fs::write("path/to/optimized.wasm", optimized_wasm)?;
    Ok(())
}

Acknowledgements

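The inspiration for this tool, and for the low-overhead but imprecise "three globals" instrumentation strategy, came from conversations with Chris Fallin and Luke Wagner.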

Dependencies

~2–18MB
~211K SLoC