#object-store #google-cloud #local #amazon-s3 #object-storage #azure #cloud-storage

yanked object_store-fork

A generic object store interface for uniformly interacting with AWS S3, Google Cloud Storage, Azure Blob Storage and local files

0.5.0 Sep 8, 2022

#53 in #object-store

MIT/Apache

365KB
7.5K SLoC

Rust Object Store


lib.rs:

object_store

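This crate provides a uniform API for interacting with object storage services and local files via the ObjectStore trait.

Create an ObjectStore implementation

Adapters

ObjectStore instances can be composed with various adapters that add additional functionality.

List objects

Use the ObjectStore::list method to iterate over objects in remote storage or files in the local filesystem: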
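
The composition idea can be illustrated with a small, standard-library-only sketch. Everything in it is hypothetical: `Store` is a simplified, synchronous stand-in for the crate's `ObjectStore` trait (which is async and returns `Result`), and `MemStore` and `CountingStore` are invented here for illustration, not types from the crate. The point is only that an adapter holds another store behind the same trait and adds behavior around each call.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical, simplified stand-in for the crate's `ObjectStore` trait.
trait Store {
    fn put(&self, location: &str, data: Vec<u8>);
    fn get(&self, location: &str) -> Option<Vec<u8>>;
}

// A toy in-memory backend (hypothetical).
#[derive(Default)]
struct MemStore {
    objects: Mutex<BTreeMap<String, Vec<u8>>>,
}

impl Store for MemStore {
    fn put(&self, location: &str, data: Vec<u8>) {
        self.objects.lock().unwrap().insert(location.to_string(), data);
    }

    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.objects.lock().unwrap().get(location).cloned()
    }
}

// A hypothetical adapter: wraps any store and counts the requests it forwards.
struct CountingStore {
    inner: Arc<dyn Store>,
    requests: AtomicUsize,
}

impl CountingStore {
    fn new(inner: Arc<dyn Store>) -> Self {
        Self { inner, requests: AtomicUsize::new(0) }
    }

    fn requests(&self) -> usize {
        self.requests.load(Ordering::Relaxed)
    }
}

impl Store for CountingStore {
    fn put(&self, location: &str, data: Vec<u8>) {
        self.requests.fetch_add(1, Ordering::Relaxed);
        self.inner.put(location, data);
    }

    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.requests.fetch_add(1, Ordering::Relaxed);
        self.inner.get(location)
    }
}

// Wrap the backend in the adapter, then use it like any other store.
fn demo() -> (Option<Vec<u8>>, usize) {
    let backend: Arc<dyn Store> = Arc::new(MemStore::default());
    let store = CountingStore::new(backend);
    store.put("data/file01.parquet", vec![1, 2, 3]);
    let bytes = store.get("data/file01.parquet");
    (bytes, store.requests())
}
```

Because the adapter implements the same trait as the store it wraps, callers that hold an `Arc<dyn Store>` do not need to know whether they are talking to a backend or to a stack of adapters.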


use std::sync::Arc;
use object_store::{path::Path, ObjectStore};
use futures::stream::StreamExt;

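// Create an ObjectStore. `get_object_store()` is a placeholder for
// constructing whichever concrete implementation you use.
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());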

// Recursively list all files below the 'data' path.
// 1. On AWS S3 this would be the 'data/' prefix
// 2. On a local filesystem, this would be the 'data' directory
let prefix: Path = "data".try_into().unwrap();

// Get an `async` stream of Metadata objects:
let list_stream = object_store
    .list(Some(&prefix))
    .await
    .expect("Error listing files");

// Print a line about each object based on its metadata
// using for_each from `StreamExt` trait.
list_stream
    .for_each(move |meta| {
        async {
            let meta = meta.expect("Error listing");
            println!("Name: {}, size: {}", meta.location, meta.size);
        }
    })
    .await;

Which will print out something like the following:

Name: data/file01.parquet, size: 112832
Name: data/file02.parquet, size: 143119
Name: data/child/file03.parquet, size: 100
...
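
The listing above treats `data` as the `data/` prefix on S3 and as the `data` directory locally, which suggests that prefixes are compared one path segment at a time rather than as raw strings. The sketch below models that behavior with the standard library only. It is an assumption-laden illustration: `prefix_matches` and `list_under` are hypothetical helpers written for this example, not functions from the crate.

```rust
// Hypothetical helper: true when every segment of `prefix` equals the
// corresponding leading segment of `location`.
fn prefix_matches(prefix: &str, location: &str) -> bool {
    let mut location_parts = location.split('/');
    prefix
        .split('/')
        .filter(|part| !part.is_empty()) // tolerate a trailing '/' in the prefix
        .all(|part| location_parts.next() == Some(part))
}

// Hypothetical helper: keep only the locations that fall under `prefix`.
fn list_under<'a>(prefix: &str, locations: &[&'a str]) -> Vec<&'a str> {
    locations
        .iter()
        .copied()
        .filter(|location| prefix_matches(prefix, *location))
        .collect()
}
```

Under this model, `data` matches `data/file01.parquet` and `data/child/file03.parquet`, but not a sibling such as `database/other.parquet`, even though the latter starts with the same characters.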

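Fetch objects

Use the ObjectStore::get method to fetch the data bytes of an object in remote storage, or of a file in the local filesystem, as a stream.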


use std::sync::Arc;
use object_store::{path::Path, ObjectStore};
use futures::stream::StreamExt;

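// Create an ObjectStore. `get_object_store()` is a placeholder for
// constructing whichever concrete implementation you use.
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());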

// Retrieve a specific file
let path: Path = "data/file01.parquet".try_into().unwrap();

// fetch the bytes from object store
let stream = object_store
    .get(&path)
    .await
    .unwrap()
    .into_stream();

// Count the '0's using `map` from `StreamExt` trait
let num_zeros = stream
    .map(|bytes| {
        let bytes = bytes.unwrap();
        bytes.iter().filter(|b| **b == 0).count()
    })
    .collect::<Vec<usize>>()
    .await
    .into_iter()
    .sum::<usize>();

println!("Num zeros in {} is {}", path, num_zeros);

Which will print out something like the following:

Num zeros in data/file01.parquet is 657
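
The example counts zeros chunk by chunk and then sums the per-chunk counts, so the total does not depend on how the store splits the object into chunks. A minimal synchronous sketch of the same aggregation, using a plain iterator of byte chunks as a stand-in for the stream (`count_zeros` is a hypothetical helper, not part of the crate):

```rust
// Hypothetical helper: sum the number of zero bytes across all chunks
// without collecting the per-chunk counts into an intermediate Vec.
fn count_zeros<I>(chunks: I) -> usize
where
    I: IntoIterator<Item = Vec<u8>>,
{
    chunks
        .into_iter()
        .map(|chunk| chunk.iter().filter(|b| **b == 0).count())
        .sum()
}
```

Feeding the same bytes as one chunk or as several yields the same total, which is why the streaming version never needs to hold the whole object in memory.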

Dependencies

~6–24MB
~339K SLoC