#google-cloud #cloud-storage #rsync #gc #cloud #google

gcs-rsync

比 gsutil rsync 更高性能的 gcs rsync 支持

25 个版本

0.4.4 2024 年 5 月 2 日
0.4.1 2024 年 4 月 15 日
0.4.0 2024 年 3 月 18 日
0.3.4 2023 年 10 月 10 日
0.1.10 2021 年 11 月 30 日

#980异步

Download history 754/week @ 2024-05-02 680/week @ 2024-05-09 413/week @ 2024-05-16 495/week @ 2024-05-23 769/week @ 2024-05-30 498/week @ 2024-06-06 429/week @ 2024-06-13 247/week @ 2024-06-20 275/week @ 2024-06-27 103/week @ 2024-07-04 264/week @ 2024-07-11 244/week @ 2024-07-18 671/week @ 2024-07-25 559/week @ 2024-08-01 259/week @ 2024-08-08 231/week @ 2024-08-15

1,822 每月下载量

MIT 许可证

97KB
2K SLoC

gcs-rsync

build codecov License:MIT docs.rs crates.io crates.io (recent) docker

轻量级且高效的 Rust gcs rsync,用于 Google Cloud Storage。

根据以下基准测试,gcs-rsync 的速度比 gsutil rsync 快。

没有 32K 对象的硬限制或特定配置来计算状态。

此 crate 可以用作库或 CLI。可以独立使用管理对象(下载、上传、删除等)的 API。

如何作为 crate 安装

Cargo.toml

[dependencies]
gcs-rsync = "0.4"

如何作为 CLI 工具安装

cargo install --example gcs-rsync gcs-rsync

~/.cargo/bin/gcs-rsync

如何使用 docker 运行

将本地文件夹镜像到 gcs

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToUpload>:/source:ro superbeeeeeee/gcs-rsync -r -m /source gs://<YourBucket>/<YourFolderToUpload>/

将 gcs 镜像到文件夹

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourFolderToUpload>/ /dest

将具有前缀的部分 gcs 镜像到文件夹

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourFolderToUpload>/<YourPrefix> /dest

使用 glob 模式包含或排除文件

CLI gcs-rsync

-i(包含 glob 模式)和 -x(排除 glob 模式)可以多次使用。

一个示例,其中递归包含任何 json 或 toml,但递归排除任何 test.json 或 test.toml

docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m -i **/*.json -i **/*.toml -x **/test.json -x **/test.toml
 gs://<YourBucket>/YourFolderToUpload>/ /dest

with_includeswith_excludes 客户端构建器用于填充包含和排除 glob 模式。

基准测试

关于 gsutil 的重要说明:默认情况下,gsutil ls 命令不会列出所有对象项,而是列出所有前缀。而添加 -r 标志会减慢 gsutil 的性能。与 rsync 实现 ls 性能命令非常不同。

仅新文件(第一次同步)

  • gcs-rsync:2.2s/7MB
  • gsutil:9.93s/47MB

胜者:gcs-rsync

gcs-rsync 同步基准测试

rm -rf ~/Documents/test4 && cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         2.20
user         0.13
sys          0.21
             7606272  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1915  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 394  messages sent
                1255  messages received
                   0  signals received
                  54  voluntary context switches
                5814  involuntary context switches
           636241324  instructions retired
           989595729  cycles elapsed
             3895296  peak memory footprint

gsutil 同步基准测试

rm -rf ~/Documents/gsutil_test4 && mkdir ~/Documents/gsutil_test4 && /usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/
Operation completed over 215 objects/50.3 KiB.
real         9.93
user         8.12
sys          2.35
            47108096  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              196391  page reclaims
                   1  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
               36089  messages sent
               87309  messages received
                   5  signals received
               38401  voluntary context switches
               51924  involuntary context switches
            12986389  instructions retired
            12032672  cycles elapsed
              593920  peak memory footprint

无变化(第二次同步)

  • gcs-rsync:0.78s/8MB
  • gsutil:2.18s/47MB

胜者:gcs-rsync(由于大小和 mtime 检查,类似于 gsutil)

gcs-rsync 同步基准测试

cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real         1.79
user         0.13
sys          0.12
             7864320  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
                1980  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 397  messages sent
                1247  messages received
                   0  signals received
                  42  voluntary context switches
                4948  involuntary context switches
           435013936  instructions retired
           704782682  cycles elapsed
             4141056  peak memory footprint

gsutil 同步基准测试

/usr/bin/time -lp --  gsutil -m -q rsync -r gs://test-bucket/sync_test4/ ~/Documents/gsutil_test4/
real         2.18
user         1.37
sys          0.66
            46899200  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              100108  page reclaims
                1732  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                6311  messages sent
               12752  messages received
                   4  signals received
                6145  voluntary context switches
               14219  involuntary context switches
            13133297  instructions retired
            13313536  cycles elapsed
              602112  peak memory footprint

gsutil rsync 配置

gsutil -m -q rsync -r -d ./your-dir gs://your-bucket
/usr/bin/time -lp --  gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/

关于身份验证

所有与身份验证相关的默认功能都使用 GOOGLE_APPLICATION_CREDENTIALS 环境变量作为默认配置,就像官方 Google 库在其它语言(golang、dotnet)中做的那样。

其他功能(from 和 from_file)提供自定义集成模式。

有关OAuth2的更多信息,请参阅oauth2模块中的相关README。

如何运行测试

单元测试

cargo test --lib

集成测试 + 单元测试

TEST_SERVICE_ACCOUNT=<PathToAServiceAccount> TEST_BUCKET=<BUCKET> TEST_PREFIX=<PREFIX> cargo test --no-fail-fast

示例

上传对象

use std::path::Path;

use gcs_rsync::storage::{credentials, Object, ObjectClient, StorageResult};
use tokio_util::codec::{BytesCodec, FramedRead};

#[tokio::main]
async fn main() -> StorageResult<()> {
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let prefix = args[2].to_owned();
    let file_path = args[3].to_owned();

    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    let file_path = Path::new(&file_path);
    let name = file_path.file_name().unwrap().to_string_lossy();

    let file = tokio::fs::File::open(file_path).await.unwrap();
    let stream = FramedRead::new(file, BytesCodec::new());

    let name = format!("{}/{}", prefix, name);
    let object = Object::new(bucket, name.as_str())?;
    object_client.upload(&object, stream).await.unwrap();
    println!("object {} uploaded", &object);
    Ok(())
}

命令行界面(CLI)

cargo run --release --example upload_object "<YourBucket>" "<YourPrefix>" "<YourFilePath>"

下载对象

use std::path::Path;

use futures::TryStreamExt;
use gcs_rsync::storage::{credentials, Object, ObjectClient, StorageResult};
use tokio::{
    fs::File,
    io::{AsyncWriteExt, BufWriter},
};

#[tokio::main]
async fn main() -> StorageResult<()> {
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let name = args[2].as_str();
    let output_path = args[3].to_owned();

    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    let file_name = Path::new(&name).file_name().unwrap().to_string_lossy();
    let file_path = format!("{}/{}", output_path, file_name);

    let object = Object::new(bucket, name)?;
    let mut stream = object_client.download(&object).await.unwrap();

    let file = File::create(&file_path).await.unwrap();
    let mut buf_writer = BufWriter::new(file);

    while let Some(data) = stream.try_next().await.unwrap() {
        buf_writer.write_all(&data).await.unwrap();
    }

    buf_writer.flush().await.unwrap();
    println!("object {} downloaded to {:?}", &object, file_path);
    Ok(())
}

命令行界面(CLI)

cargo run --release --example download_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"

下载公开对象

use std::path::Path;

use futures::TryStreamExt;
use gcs_rsync::storage::{Object, ObjectClient, StorageResult};
use tokio::{
    fs::File,
    io::{AsyncWriteExt, BufWriter},
};

#[tokio::main]
async fn main() -> StorageResult<()> {
    let bucket = "gcs-rsync-dev-public";
    let name = "hello.txt";

    let object_client = ObjectClient::no_auth();

    let file_name = Path::new(&name).file_name().unwrap().to_string_lossy();
    let file_path = file_name.to_string();

    let object = Object::new(bucket, "hello.txt")?;
    let mut stream = object_client.download(&object).await.unwrap();

    let file = File::create(&file_path).await.unwrap();
    let mut buf_writer = BufWriter::new(file);

    while let Some(data) = stream.try_next().await.unwrap() {
        buf_writer.write_all(&data).await.unwrap();
    }

    buf_writer.flush().await.unwrap();
    println!("object {} downloaded to {:?}", &object, file_path);
    Ok(())
}

命令行界面(CLI)

cargo run --release --example download_public_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"

删除对象

use gcs_rsync::storage::{credentials, Object, ObjectClient, StorageResult};

#[tokio::main]
async fn main() -> StorageResult<()> {
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let name = args[2].as_str();
    let object = Object::new(bucket, name)?;

    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    object_client.delete(&object).await?;
    println!("object {} uploaded", &object);
    Ok(())
}

命令行界面(CLI)

cargo run --release --example delete_object "<YourBucket>" "<YourPrefix>/<YourFileName>"

列出对象

use futures::TryStreamExt;
use gcs_rsync::storage::{credentials, ObjectClient, ObjectsListRequest, StorageResult};

#[tokio::main]
async fn main() -> StorageResult<()> {
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let prefix = args[2].to_owned();

    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    let objects_list_request = ObjectsListRequest {
        prefix: Some(prefix),
        fields: Some("items(name),nextPageToken".to_owned()),
        ..Default::default()
    };

    object_client
        .list(bucket, &objects_list_request)
        .await
        .try_for_each(|x| {
            println!("{}", x.name.unwrap());
            futures::future::ok(())
        })
        .await?;

    Ok(())
}

命令行界面(CLI)

cargo run --release --example list_objects "<YourBucket>" "<YourPrefix>"

使用默认服务账户列出对象

use futures::TryStreamExt;
use gcs_rsync::storage::{credentials, ObjectClient, ObjectsListRequest, StorageResult};

#[tokio::main]
async fn main() -> StorageResult<()> {
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let prefix = args[2].to_owned();

    let auc = Box::new(
        credentials::serviceaccount::default(
            "https://www.googleapis.com/auth/devstorage.full_control",
        )
        .await?,
    );
    let object_client = ObjectClient::new(auc).await?;

    let objects_list_request = ObjectsListRequest {
        prefix: Some(prefix),
        fields: Some("items(name),nextPageToken".to_owned()),
        ..Default::default()
    };

    object_client
        .list(bucket, &objects_list_request)
        .await
        .try_for_each(|x| {
            println!("{}", x.name.unwrap());
            futures::future::ok(())
        })
        .await?;

    Ok(())
}

命令行界面(CLI)

GOOGLE_APPLICATION_CREDENTIALS=<PathToJson> cargo r --release --example list_objects_service_account "<YourBucket>" "<YourPrefix>"

列出大量(>32K)对象

列出包含超过60K个对象的存储桶

time cargo run --release --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>" | wc -l

性能分析

人类在猜测性能方面很糟糕

export CARGO_PROFILE_RELEASE_DEBUG=true
sudo -- cargo flamegraph --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"
cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"

原生二进制构建(静态共享库)

docker rust rust:alpine3.14
apk add --no-cache musl-dev pkgconfig openssl-dev

LDFLAGS="-static -L/usr/local/musl/lib" LD_LIBRARY_PATH=/usr/local/musl/lib:$LD_LIBRARY_PATH CFLAGS="-I/usr/local/musl/include" PKG_CONFIG_PATH=/usr/local/musl/lib/pkgconfig cargo build --release --target=x86_64-unknown-linux-musl --example bucket_to_folder_sync

依赖项

~10–24MB
~367K SLoC