17 major releases

new 0.18.1	Aug 19, 2024
0.17.0	Jul 10, 2024
0.14.1	Mar 28, 2024

#960 in WebAssembly

1,255 downloads per month

Apache-2.0 WITH LLVM-exception

34KB

WASI Blob Store

A proposed WebAssembly System Interface API.

Current Phase

Phase 1

Champions

Jiaxiao Zhou
Hoffman
David Justin
Charlie Daniel
Taylor Thomas

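Phase 4 Advancement Criteria

At least two independent production implementations.
Implementations from at least two cloud providers.
Available on at least Windows, Linux, and macOS.
A test suite that passes on the platforms and implementations listed above.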

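In this proposal, an object store means a common abstraction over blob storage that a WebAssembly component is granted access to. Examples of object storage services include Azure Blob Storage, AWS S3, and Google Cloud Storage. An object store can also be anything whose contents can be represented as unstructured binary data that conforms to the interface, including a file system.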
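To make the container/object model concrete, here is a minimal in-memory sketch. The trait BlobStore with methods put, get, and list, and the MemoryStore type, are hypothetical simplifications for illustration only. The examples later in this document use a fallible, stream-based API instead, and the real interface is defined in the proposal's WIT files.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model of a blob store: named containers,
// each holding objects (unstructured bytes) addressed by name.
// Not the proposal's actual interface.
pub trait BlobStore {
    fn put(&mut self, container: &str, object: &str, data: Vec<u8>);
    fn get(&self, container: &str, object: &str) -> Option<&[u8]>;
    fn list(&self, container: &str) -> Vec<String>;
}

/// In-memory backend: container name -> (object name -> bytes).
#[derive(Default)]
pub struct MemoryStore {
    containers: HashMap<String, HashMap<String, Vec<u8>>>,
}

impl BlobStore for MemoryStore {
    fn put(&mut self, container: &str, object: &str, data: Vec<u8>) {
        // Create the container on first write, then store the object.
        self.containers
            .entry(container.to_string())
            .or_default()
            .insert(object.to_string(), data);
    }

    fn get(&self, container: &str, object: &str) -> Option<&[u8]> {
        self.containers.get(container)?.get(object).map(|v| v.as_slice())
    }

    fn list(&self, container: &str) -> Vec<String> {
        let mut names: Vec<String> = self
            .containers
            .get(container)
            .map(|objs| objs.keys().cloned().collect())
            .unwrap_or_default();
        names.sort(); // HashMap order is unspecified; sort for stable output
        names
    }
}
```

Any backend, whether a cloud service or a directory, would implement the same trait, and component code would depend only on the trait.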

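Goals

The primary goal of this API is to provide a common abstraction over object storage services. WebAssembly components can then be written to work with any implementation without knowing the details of the underlying service.

In addition, a component that uses this API cannot tell an object storage service apart from a file system. It can therefore be written to work with either one, and no storage configuration is needed in component code.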
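One way to picture a component that cannot tell an object store from a file system is a trait with two backends. In the sketch below, ObjectStore, MemStore, DirStore, and make_backup are all hypothetical names. DirStore maps each container to a subdirectory and each object to a file, so it assumes object names are valid file names (for example, no '/').

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical minimal interface; component logic depends only on this trait.
pub trait ObjectStore {
    fn write(&mut self, container: &str, object: &str, data: &[u8]) -> io::Result<()>;
    fn read(&self, container: &str, object: &str) -> io::Result<Vec<u8>>;
}

/// Backend 1: objects held in memory, keyed by (container, object).
#[derive(Default)]
pub struct MemStore(HashMap<(String, String), Vec<u8>>);

impl ObjectStore for MemStore {
    fn write(&mut self, container: &str, object: &str, data: &[u8]) -> io::Result<()> {
        self.0.insert((container.into(), object.into()), data.to_vec());
        Ok(())
    }
    fn read(&self, container: &str, object: &str) -> io::Result<Vec<u8>> {
        self.0
            .get(&(container.to_string(), object.to_string()))
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such object"))
    }
}

/// Backend 2: a directory tree; container = subdirectory, object = file.
pub struct DirStore {
    pub root: PathBuf,
}

impl ObjectStore for DirStore {
    fn write(&mut self, container: &str, object: &str, data: &[u8]) -> io::Result<()> {
        let dir = self.root.join(container);
        fs::create_dir_all(&dir)?;
        fs::write(dir.join(object), data)
    }
    fn read(&self, container: &str, object: &str) -> io::Result<Vec<u8>> {
        fs::read(self.root.join(container).join(object))
    }
}

/// "Component" logic, written once against the trait and unaware of the backend:
/// copies `object` to `object.bak` in the same container.
pub fn make_backup<S: ObjectStore>(store: &mut S, container: &str, object: &str) -> io::Result<()> {
    let data = store.read(container, object)?;
    store.write(container, &format!("{object}.bak"), &data)
}
```

make_backup is compiled once and runs unchanged against either backend. The caller that constructs the store chooses the backend, not the component logic, which mirrors the goal of keeping storage configuration out of component code.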

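Non-goals

The following are explicitly out of scope for this API specification:

Covering every edge case and niche scenario
Configuring access to services
Secret management
Defining or directly using network protocols
Monitoring and observability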

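API walk-through

The following sections give an overview of how to use this API. The example code is written in Rust, but any language that can target wasm components through code generation should work.

Working with Blob Contents

This example shows how to get a reference to a container and to the desired object in that container, and then read the blob contents by calling read_into in a loop.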

// Count the number of lines in an object
// For simplicity, assume the object contains ascii text and lines end in '\n'
fn count_lines(store: &impl BlobStore, id: &ObjectId) -> Result<usize, Error> {
  let mut stream = store.get_container(&id.container_name)?.read_object(&id.object_name)?;
  let mut buf = [0u8; 4096];
  let mut num_lines = 0;
  while let Some(bytes) = stream.read_into(&mut buf)? {
    num_lines += buf[0..bytes as usize].iter().filter(|&c| *c == b'\n').count();
  }
  Ok(num_lines)
}
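The count_lines example above depends on generated bindings that are not shown. To run it outside a wasm host, the sketch below supplies hypothetical in-memory stand-ins: Error, ObjectId, ReadStream, Container, and MemoryStore. Their method names follow the example, but their signatures are assumptions inferred from how the example uses them. In particular, this sketch assumes read_into returns Ok(None) at end of stream.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for generated bindings; not the WIT definitions.
#[derive(Debug)]
pub struct Error(pub String);

pub struct ObjectId {
    pub container_name: String,
    pub object_name: String,
}

/// A readable stream over one object's bytes.
pub struct ReadStream {
    data: Vec<u8>,
    pos: usize,
}

impl ReadStream {
    /// Copies up to buf.len() bytes into buf; returns Ok(None) at end of stream.
    pub fn read_into(&mut self, buf: &mut [u8]) -> Result<Option<u64>, Error> {
        if self.pos >= self.data.len() {
            return Ok(None);
        }
        let n = buf.len().min(self.data.len() - self.pos);
        buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
        self.pos += n;
        Ok(Some(n as u64))
    }
}

pub struct Container {
    pub objects: HashMap<String, Vec<u8>>,
}

impl Container {
    pub fn read_object(&self, name: &str) -> Result<ReadStream, Error> {
        let data = self
            .objects
            .get(name)
            .cloned()
            .ok_or_else(|| Error(format!("object not found: {name}")))?;
        Ok(ReadStream { data, pos: 0 })
    }
}

pub trait BlobStore {
    fn get_container(&self, name: &str) -> Result<&Container, Error>;
}

pub struct MemoryStore {
    pub containers: HashMap<String, Container>,
}

impl BlobStore for MemoryStore {
    fn get_container(&self, name: &str) -> Result<&Container, Error> {
        self.containers
            .get(name)
            .ok_or_else(|| Error(format!("container not found: {name}")))
    }
}

// Same body as the example above.
pub fn count_lines(store: &impl BlobStore, id: &ObjectId) -> Result<usize, Error> {
    let mut stream = store.get_container(&id.container_name)?.read_object(&id.object_name)?;
    let mut buf = [0u8; 4096];
    let mut num_lines = 0;
    while let Some(bytes) = stream.read_into(&mut buf)? {
        num_lines += buf[0..bytes as usize].iter().filter(|&c| *c == b'\n').count();
    }
    Ok(num_lines)
}
```

Because the buffer is reused, an object larger than 4096 bytes is processed across several read_into calls.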

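Writing a Blob Stream

The following example shows how to get a reference to a container and to a writable stream whose contents will be stored in a blob.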

// Download a file from an http url and save it to the blob store.
// When completed, returns metadata for the new object
fn download(url: &str, store: &impl BlobStore, id: &ObjectId) -> Result<ObjectMetadata, Error> {
    let container = store.get_container(&id.container_name)?;
    // retrieve a url via wasi-http fetch() method
    // the http service hasn't been defined yet, but assume its fetch() method returns a readable stream.
    let mut download_stream = http::fetch(url)?;
    let mut buf = [0u8; 4096];
    let mut save_stream = container.write_object(&id.object_name)?;
    while let Some(bytes) = download_stream.read_into(&mut buf)? {
        save_stream.write(&buf[0..bytes as usize])?;
    }
    // ensure stream is flushed and object is created, before we query the metadata
    save_stream.close()?;
    let obj = container.object_info(&id.object_name)?;
    Ok(obj)
}
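The download example relies on an http::fetch() that, as its comment notes, is not defined yet. This sketch replaces it with any std::io::Read source and backs the container with an in-memory map. Container, WriteStream, ObjectMetadata, and save_stream are hypothetical. The rule that an object becomes visible only after close() is also an assumption, modeled on the example's comment about closing the stream before querying metadata.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::io::Read;

// Hypothetical stand-ins, loosely modeled on the download() example.
#[derive(Debug)]
pub struct Error(pub String);

#[derive(Debug, PartialEq)]
pub struct ObjectMetadata {
    pub name: String,
    pub size: u64,
}

#[derive(Default)]
pub struct Container {
    objects: RefCell<HashMap<String, Vec<u8>>>,
}

/// Buffers written bytes; the object is committed only by close().
pub struct WriteStream<'a> {
    container: &'a Container,
    name: String,
    pending: Vec<u8>,
}

impl WriteStream<'_> {
    pub fn write(&mut self, bytes: &[u8]) -> Result<(), Error> {
        self.pending.extend_from_slice(bytes);
        Ok(())
    }
    /// Commits the buffered bytes as the object's contents.
    pub fn close(self) -> Result<(), Error> {
        self.container.objects.borrow_mut().insert(self.name, self.pending);
        Ok(())
    }
}

impl Container {
    pub fn write_object(&self, name: &str) -> Result<WriteStream<'_>, Error> {
        Ok(WriteStream { container: self, name: name.to_string(), pending: Vec::new() })
    }
    pub fn object_info(&self, name: &str) -> Result<ObjectMetadata, Error> {
        let objects = self.objects.borrow();
        let data = objects.get(name).ok_or_else(|| Error(format!("not found: {name}")))?;
        Ok(ObjectMetadata { name: name.to_string(), size: data.len() as u64 })
    }
}

/// Streams `source` into the object `name` in 4 KiB chunks, then returns its metadata.
/// `source` stands in for the undefined http::fetch() stream.
pub fn save_stream(source: &mut impl Read, container: &Container, name: &str) -> Result<ObjectMetadata, Error> {
    let mut buf = [0u8; 4096];
    let mut out = container.write_object(name)?;
    loop {
        let n = source.read(&mut buf).map_err(|e| Error(e.to_string()))?;
        if n == 0 {
            break; // end of input
        }
        out.write(&buf[..n])?;
    }
    out.close()?; // commit before querying metadata
    container.object_info(name)
}
```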

Listing Objects in a Container

The following code shows how to enumerate the objects in a container.

// Suppose the "logs" container has an object for every day on which activity occurred,
// with names that start with a timestamp, like "2022-01-01-12-00-00.log".
// To count the logs from January 2022, call:
//    `count_objects_with_prefix(store, "logs", "2022-01")`
fn count_objects_with_prefix(store: &impl BlobStore, container_name: &str, prefix: &str) -> Result<usize, Error> {
  let container = store.get_container(container_name)?;
  let names = container.list_objects()?;
  // into_iter() works whether list_objects returns a collection or an iterator
  let count = names.into_iter().filter(|n| n.starts_with(prefix)).count();
  Ok(count)
}
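Here is a self-contained version of the prefix count, using a hypothetical Container whose list_objects() yields names as &str. It also shows why the example's zero-padded timestamp format matters: with fixed-width, zero-padded dates, a prefix selects exactly one month.

```rust
/// Hypothetical container holding only object names.
pub struct Container {
    pub names: Vec<String>,
}

impl Container {
    /// Yields each object name in the container.
    pub fn list_objects(&self) -> impl Iterator<Item = &str> + '_ {
        self.names.iter().map(String::as_str)
    }
}

/// Counts objects whose names start with `prefix`.
pub fn count_objects_with_prefix(container: &Container, prefix: &str) -> usize {
    container.list_objects().filter(|n| n.starts_with(prefix)).count()
}
```

With unpadded months, names such as "2022-1-05..." and "2022-10-03..." would both match the prefix "2022-1". The zero-padded form avoids that ambiguity.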

Detailed design discussion

See the wit files.

Handling Large Files

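Handling large files may require API changes that are not included in this proposal. If a component tries to allocate more memory than the host is willing to provide, the host runtime may terminate the component, and processing will fail.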
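Under the current proposal, one way a component can stay within a host's memory limit is to avoid holding a whole blob in memory. Instead, it reads the blob through a fixed-size buffer and folds each chunk into a running result. The sketch below is illustrative only. It computes a 64-bit FNV-1a hash over any std::io::Read source with a 4 KiB buffer, so peak memory does not grow with blob size. FNV-1a is just a stand-in workload, and hash_stream is a hypothetical name.

```rust
use std::io::{self, Read};

const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into a running 64-bit FNV-1a hash.
fn fnv1a_update(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hashes a stream of any length using one fixed 4 KiB buffer.
/// Returns (total bytes read, FNV-1a hash).
pub fn hash_stream(reader: &mut impl Read) -> io::Result<(u64, u64)> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    let mut hash = FNV_OFFSET;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        hash = fnv1a_update(hash, &buf[..n]);
        total += n as u64;
    }
    Ok((total, hash))
}
```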

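Likewise, if a component spends too long processing files, whether on one large blob or on many small blobs in a tight loop, the host may shut it down for consuming too many resources or too much time.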
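The proposal leaves the time-limit problem open. Purely as an illustration of one mitigation a component author might try today, the sketch below processes at most `budget` objects per invocation and returns a name-based resume cursor. process_batch and Progress are hypothetical and not part of the proposal. The function assumes the listing is sorted by name.

```rust
/// Outcome of one bounded pass over a sorted object listing.
#[derive(Debug, PartialEq)]
pub enum Progress {
    /// Every object has been processed.
    Done,
    /// Budget exhausted; resume after this object name next time.
    ResumeAfter(String),
}

/// Processes objects in name order, starting after `resume_after` (if any),
/// and stops after `budget` objects so a single invocation stays short.
pub fn process_batch<F>(
    sorted_names: &[String],
    resume_after: Option<&str>,
    budget: usize,
    mut process: F,
) -> Progress
where
    F: FnMut(&str),
{
    assert!(budget > 0, "budget must be positive");
    // Skip every name <= the cursor (binary search; requires sorted input).
    let start = match resume_after {
        Some(last) => sorted_names.partition_point(|n| n.as_str() <= last),
        None => 0,
    };
    let end = (start + budget).min(sorted_names.len());
    for name in &sorted_names[start..end] {
        process(name.as_str());
    }
    if end == sorted_names.len() {
        Progress::Done
    } else {
        Progress::ResumeAfter(sorted_names[end - 1].clone())
    }
}
```

A name cursor is used instead of a numeric index because an index would shift if objects before it were added or removed between invocations.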

How to handle this, and whether callback methods should go into the blob store API or into a lower-level wasm-io API, is still under discussion.
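To illustrate what a callback style might look like, read_with_callback below is a hypothetical host-side driver that pushes chunks of a blob to a closure supplied by the component. It is not part of any WIT definition, and it takes no position on which API such callbacks would belong to. Returning ControlFlow::Break lets the component stop early instead of being terminated mid-read.

```rust
use std::ops::ControlFlow;

/// Hypothetical driver: delivers `data` to `on_chunk` in pieces of at most
/// `chunk_size` bytes, stopping as soon as the callback returns Break.
/// Returns the number of bytes delivered.
pub fn read_with_callback<F>(data: &[u8], chunk_size: usize, mut on_chunk: F) -> usize
where
    F: FnMut(&[u8]) -> ControlFlow<()>,
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut delivered = 0;
    for chunk in data.chunks(chunk_size) {
        delivered += chunk.len();
        if on_chunk(chunk).is_break() {
            break;
        }
    }
    delivered
}
```

For example, a component that only needs a blob's first line can return Break once it sees '\n', so the rest of the blob is never delivered.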

Stakeholder Interest & Feedback

TODO before entering Phase 3.

Dependencies

~6-8.5MB
~152K SLoC

wrpc-interface-blobstore