0.5.0
#53 in #object-store
365KB
7.5K SLoC
Rust object store
lib.rs:
object_store
This crate provides a uniform API for interacting with object storage services and local files via the ObjectStore trait.
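To illustrate what a uniform, trait-based API buys the caller, here is a minimal std-only sketch. `SimpleStore`, `MemoryStore`, and `object_len` are hypothetical names invented for this example; the crate's real `ObjectStore` trait is async and has a much larger surface, so treat this as a conceptual model only.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified stand-in for an object-store trait.
/// (Not the crate's API: the real `ObjectStore` trait is async.)
trait SimpleStore: Send + Sync {
    fn put(&self, location: &str, bytes: Vec<u8>);
    fn get(&self, location: &str) -> Option<Vec<u8>>;
}

/// Toy in-memory backend keyed by object path.
#[derive(Default)]
struct MemoryStore {
    objects: Mutex<BTreeMap<String, Vec<u8>>>,
}

impl SimpleStore for MemoryStore {
    fn put(&self, location: &str, bytes: Vec<u8>) {
        self.objects
            .lock()
            .unwrap()
            .insert(location.to_string(), bytes);
    }

    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.objects.lock().unwrap().get(location).cloned()
    }
}

/// Caller code depends only on the trait, never on a concrete backend.
fn object_len(store: &dyn SimpleStore, location: &str) -> Option<usize> {
    store.get(location).map(|bytes| bytes.len())
}

fn main() {
    // The backend is chosen once; everything after sees only the trait object.
    let store: Arc<dyn SimpleStore> = Arc::new(MemoryStore::default());
    store.put("data/file01.parquet", vec![0, 1, 2, 3]);
    println!("{:?}", object_len(&*store, "data/file01.parquet"));
}
```

Swapping `MemoryStore` for another backend would leave `object_len` untouched, which is the point of programming against the trait.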
Create an ObjectStore implementation:
- Google Cloud Storage: GoogleCloudStorageBuilder
- Amazon S3: AmazonS3Builder
- Azure Blob Storage: MicrosoftAzureBuilder
- In memory: InMemory
- Local filesystem: LocalFileSystem
Adapters
ObjectStore instances can be composed with various adapters that add extra functionality:
- Rate throttling: ThrottleConfig
- Concurrent request limit: LimitStore
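The adapter idea can be sketched with the standard library alone: a wrapper implements the same trait as the store it wraps and adds behavior before delegating. `ReadStore`, `MapStore`, `DelayedStore`, and `wait_per_get` are hypothetical names for this illustration; the sketch shows the composition pattern, not the crate's actual throttling or limiting code.

```rust
use std::collections::BTreeMap;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical minimal read-only store interface (not the crate's trait).
trait ReadStore {
    fn get(&self, location: &str) -> Option<Vec<u8>>;
}

/// Toy backend: a fixed map of path -> bytes.
struct MapStore(BTreeMap<String, Vec<u8>>);

impl ReadStore for MapStore {
    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.0.get(location).cloned()
    }
}

/// Adapter that sleeps before delegating each `get` to the wrapped store.
struct DelayedStore<S: ReadStore> {
    inner: S,
    wait_per_get: Duration,
}

impl<S: ReadStore> ReadStore for DelayedStore<S> {
    fn get(&self, location: &str) -> Option<Vec<u8>> {
        sleep(self.wait_per_get);
        self.inner.get(location)
    }
}

fn main() {
    let mut objects = BTreeMap::new();
    objects.insert("data/file01.parquet".to_string(), vec![1u8, 2, 3]);

    // Wrap the backend; callers still see a `ReadStore`.
    let store = DelayedStore {
        inner: MapStore(objects),
        wait_per_get: Duration::from_millis(10),
    };
    println!("{:?}", store.get("data/file01.parquet"));
}
```

Because the adapter implements the same trait as the store it wraps, adapters can be stacked, and calling code does not need to know whether it holds a bare store or a wrapped one.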
Listing objects
Use the ObjectStore::list method to iterate over objects in remote storage or files in the local filesystem:
use std::sync::Arc;
use object_store::{path::Path, ObjectStore};
use futures::stream::StreamExt;

// create an ObjectStore
// (get_object_store() stands in for any store constructor; the
// snippet is assumed to run inside an async context)
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());

// Recursively list all files below the 'data' path.
// 1. On AWS S3 this would be the 'data/' prefix
// 2. On a local filesystem, this would be the 'data' directory
let prefix: Path = "data".try_into().unwrap();

// Get an `async` stream of Metadata objects:
let list_stream = object_store
    .list(Some(&prefix))
    .await
    .expect("Error listing files");

// Print a line about each object based on its metadata
// using for_each from `StreamExt` trait.
list_stream
    .for_each(move |meta| {
        async {
            let meta = meta.expect("Error listing");
            println!("Name: {}, size: {}", meta.location, meta.size);
        }
    })
    .await;
Which will print out something like the following:
Name: data/file01.parquet, size: 112832
Name: data/file02.parquet, size: 143119
Name: data/child/file03.parquet, size: 100
...
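The recursive listing above can be modeled as filtering flat object keys by a prefix that is compared one path segment at a time. The following std-only sketch makes that concrete; `is_under_prefix` and `list_under` are hypothetical helpers, and both the segment-wise comparison and the choice to return only keys strictly below the prefix are assumptions of this sketch rather than documented behavior of every backend.

```rust
/// Returns true if `location` lies strictly below `prefix`,
/// comparing whole path segments rather than raw characters.
fn is_under_prefix(location: &str, prefix: &str) -> bool {
    let mut segments = location.split('/');
    // Every prefix segment must equal the corresponding location segment.
    let prefix_matches = prefix.split('/').all(|p| segments.next() == Some(p));
    // At least one further segment must remain below the prefix.
    prefix_matches && segments.next().is_some()
}

/// Lists all keys below `prefix`, at any depth.
fn list_under<'a>(keys: &[&'a str], prefix: &str) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| is_under_prefix(k, prefix))
        .collect()
}

fn main() {
    let keys = [
        "data/file01.parquet",
        "data/child/file03.parquet",
        "database/other.parquet",
    ];
    // Prints the two keys under `data`; `database/other.parquet` is skipped
    // because "database" and "data" are different segments.
    for key in list_under(&keys, "data") {
        println!("Name: {}", key);
    }
}
```

Comparing segments instead of raw string prefixes is what keeps `database/other.parquet` out of a listing for `data`.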
Fetching objects
Use the ObjectStore::get method to fetch the data bytes from remote storage or files in the local filesystem as a stream.
use std::sync::Arc;
use object_store::{path::Path, ObjectStore};
use futures::stream::StreamExt;

// create an ObjectStore
// (get_object_store() stands in for any store constructor; the
// snippet is assumed to run inside an async context)
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());

// Retrieve a specific file
let path: Path = "data/file01.parquet".try_into().unwrap();

// fetch the bytes from object store
let stream = object_store
    .get(&path)
    .await
    .unwrap()
    .into_stream();

// Count the zero bytes in each chunk using `map` from the
// `StreamExt` trait, then sum the per-chunk counts
let num_zeros = stream
    .map(|bytes| {
        let bytes = bytes.unwrap();
        bytes.iter().filter(|b| **b == 0).count()
    })
    .collect::<Vec<usize>>()
    .await
    .into_iter()
    .sum::<usize>();

println!("Num zeros in {} is {}", path, num_zeros);
Which will print out something like the following:
Num zeros in data/file01.parquet is 657
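The per-chunk counting above works because the total does not depend on where the stream splits the object into chunks. A small std-only sketch of the same aggregation follows, with plain `Vec<u8>` chunks standing in for the streamed byte buffers; `count_zeros` is a hypothetical helper, and summing directly avoids the intermediate `Vec<usize>`.

```rust
/// Counts zero bytes across a sequence of chunks, summing the
/// per-chunk counts without collecting them first.
fn count_zeros<I>(chunks: I) -> usize
where
    I: IntoIterator<Item = Vec<u8>>,
{
    chunks
        .into_iter()
        .map(|chunk| chunk.iter().filter(|b| **b == 0).count())
        .sum()
}

fn main() {
    // The same seven bytes, split into chunks in two different ways.
    let split_a = vec![vec![0u8, 1, 0], vec![2, 0], vec![3, 4]];
    let split_b = vec![vec![0u8, 1, 0, 2, 0, 3, 4]];

    // Both splits contain three zero bytes.
    println!("{} {}", count_zeros(split_a), count_zeros(split_b));
}
```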
Dependencies
~6–24MB
~339K SLoC