25 versions
Version | Release date |
---|---|
0.4.4 | May 2, 2024 |
0.4.1 | Apr 15, 2024 |
0.4.0 | Mar 18, 2024 |
0.3.4 | Oct 10, 2023 |
0.1.10 | Nov 30, 2021 |
#980 in Asynchronous
1,822 downloads per month
97KB
2K SLoC
gcs-rsync
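Lightweight and efficient Rust gcs rsync for Google Cloud Storage.
According to the benchmarks below, gcs-rsync is faster than gsutil rsync.
There is no hard limit of 32K objects, and no specific configuration is needed to compute state.
This crate can be used as a library or as a CLI. The API for managing objects (download, upload, delete, ...) can also be used on its own.
How to install as a crate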
Cargo.toml
[dependencies]
gcs-rsync = "0.4"
How to install as a CLI tool
cargo install --example gcs-rsync gcs-rsync
~/.cargo/bin/gcs-rsync
How to run using docker
Mirror a local folder to gcs
docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToUpload>:/source:ro superbeeeeeee/gcs-rsync -r -m /source gs://<YourBucket>/<YourFolderToUpload>/
Mirror gcs to a folder
docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourFolderToUpload>/ /dest
Mirror part of gcs, selected by prefix, to a folder
docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m gs://<YourBucket>/<YourFolderToUpload>/<YourPrefix> /dest
Include or exclude files using glob patterns
CLI gcs-rsync
-i (include glob pattern) and -x (exclude glob pattern) can each be given multiple times.
An example where any json or toml file is included recursively, except any test.json or test.toml:
docker run --rm -it -v ${GOOGLE_APPLICATION_CREDENTIALS}:/creds.json:ro -v <YourFolderToDownloadTo>:/dest superbeeeeeee/gcs-rsync -r -m -i "**/*.json" -i "**/*.toml" -x "**/test.json" -x "**/test.toml" gs://<YourBucket>/<YourFolderToUpload>/ /dest
Library
The with_includes and with_excludes client builders are used to set the include and exclude glob patterns.
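To make the include/exclude rule concrete, here is a minimal, std-only sketch of how such a filter could behave. It is not the crate's implementation: the helper names (segment_matches, glob_matches, is_selected) are hypothetical, only `*` and `**` are supported, and the rule "keep a path if it matches any include (or no include is given) and no exclude" is an assumption about the intended semantics.

```rust
/// Matches a single path segment; `*` stands for any run of characters.
fn segment_matches(pattern: &[u8], text: &[u8]) -> bool {
    match pattern.split_first() {
        None => text.is_empty(),
        Some((c, rest)) if *c == b'*' => {
            (0..=text.len()).any(|i| segment_matches(rest, &text[i..]))
        }
        Some((c, rest)) => match text.split_first() {
            Some((t, tail)) => t == c && segment_matches(rest, tail),
            None => false,
        },
    }
}

/// Matches a whole path, segment by segment; `**` stands for zero or more segments.
fn glob_matches(pattern: &[&str], path: &[&str]) -> bool {
    match pattern.split_first() {
        None => path.is_empty(),
        Some((seg, rest)) if *seg == "**" => {
            (0..=path.len()).any(|i| glob_matches(rest, &path[i..]))
        }
        Some((seg, rest)) => match path.split_first() {
            Some((p, tail)) => {
                segment_matches(seg.as_bytes(), p.as_bytes()) && glob_matches(rest, tail)
            }
            None => false,
        },
    }
}

/// Assumed rule: kept when some include matches (or there are no includes)
/// and no exclude matches.
fn is_selected(path: &str, includes: &[&str], excludes: &[&str]) -> bool {
    let matches = |pattern: &str| {
        let pattern: Vec<&str> = pattern.split('/').collect();
        let segments: Vec<&str> = path.split('/').collect();
        glob_matches(&pattern, &segments)
    };
    let included = includes.is_empty() || includes.iter().any(|&p| matches(p));
    let excluded = excludes.iter().any(|&p| matches(p));
    included && !excluded
}

fn main() {
    // Same patterns as the docker example above.
    let includes = ["**/*.json", "**/*.toml"];
    let excludes = ["**/test.json", "**/test.toml"];
    for path in ["conf/app.json", "conf/test.json", "Cargo.toml", "README.md"] {
        println!("{path}: {}", is_selected(path, &includes, &excludes));
    }
}
```

With these patterns, conf/app.json and Cargo.toml are kept, while conf/test.json (excluded) and README.md (not included) are skipped.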
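Benchmark
Important note about gsutil: by default, the gsutil ls command does not list every object item but lists all prefixes instead, and adding the -r flag slows gsutil down. The performance of the ls command is very different from that of the rsync implementation.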
New files only (first sync)
- gcs-rsync: 2.2s/7MB
- gsutil: 9.93s/47MB
Winner: gcs-rsync
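As a quick sanity check of the first-sync summary, the ratios can be computed from the real time and maximum resident set size values reported by /usr/bin/time in the raw output that follows. The ratio helper is only illustrative.

```rust
/// How many times larger the baseline value is than the candidate value.
fn ratio(baseline: f64, candidate: f64) -> f64 {
    baseline / candidate
}

fn main() {
    // "real" seconds: gsutil 9.93 vs gcs-rsync 2.20
    let speedup = ratio(9.93, 2.20);
    // "maximum resident set size" in bytes: gsutil 47108096 vs gcs-rsync 7606272
    let memory = ratio(47_108_096.0, 7_606_272.0);
    println!("wall-clock time: {:.1}x faster", speedup);
    println!("peak resident memory: {:.1}x smaller", memory);
}
```

For this single run that works out to roughly 4.5x less wall-clock time and roughly 6.2x less peak resident memory.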
gcs-rsync sync bench
rm -rf ~/Documents/test4 && cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real 2.20
user 0.13
sys 0.21
7606272 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1915 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
394 messages sent
1255 messages received
0 signals received
54 voluntary context switches
5814 involuntary context switches
636241324 instructions retired
989595729 cycles elapsed
3895296 peak memory footprint
gsutil sync bench
rm -rf ~/Documents/gsutil_test4 && mkdir ~/Documents/gsutil_test4 && /usr/bin/time -lp -- gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/
Operation completed over 215 objects/50.3 KiB.
real 9.93
user 8.12
sys 2.35
47108096 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
196391 page reclaims
1 page faults
0 swaps
0 block input operations
0 block output operations
36089 messages sent
87309 messages received
5 signals received
38401 voluntary context switches
51924 involuntary context switches
12986389 instructions retired
12032672 cycles elapsed
593920 peak memory footprint
No changes (second sync)
- gcs-rsync: 0.78s/8MB
- gsutil: 2.18s/47MB
Winner: gcs-rsync (thanks to size and mtime checks, similar to gsutil)
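The size and mtime check mentioned above can be pictured with a small std-only sketch. This is an assumption about the general technique, not the crate's code: Entry, Action and decide are hypothetical names, and the rule shown (copy when the destination is missing or differs in size or mtime, otherwise skip) is the simplest possible variant.

```rust
use std::time::{Duration, SystemTime};

/// Metadata compared for one object or file (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry {
    size: u64,
    mtime: SystemTime,
}

#[derive(Debug, PartialEq)]
enum Action {
    Copy,
    Skip,
}

/// Decides from metadata alone, without reading any content.
fn decide(source: &Entry, dest: Option<&Entry>) -> Action {
    match dest {
        // Nothing at the destination yet: always copy.
        None => Action::Copy,
        // Size or mtime differs: assume the content changed.
        Some(d) if d.size != source.size || d.mtime != source.mtime => Action::Copy,
        // Same size and mtime: assume unchanged and skip the transfer.
        Some(_) => Action::Skip,
    }
}

fn main() {
    let mtime = SystemTime::UNIX_EPOCH + Duration::from_secs(1_700_000_000);
    let remote = Entry { size: 240, mtime };
    println!("missing locally: {:?}", decide(&remote, None));
    println!("identical metadata: {:?}", decide(&remote, Some(&remote)));
}
```

When nothing has changed, every entry takes the Skip branch, so a second sync is mostly the cost of listing both sides.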
gcs-rsync sync bench
cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/bucket_to_folder_sync
real 1.79
user 0.13
sys 0.12
7864320 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
1980 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
397 messages sent
1247 messages received
0 signals received
42 voluntary context switches
4948 involuntary context switches
435013936 instructions retired
704782682 cycles elapsed
4141056 peak memory footprint
gsutil sync bench
/usr/bin/time -lp -- gsutil -m -q rsync -r gs://test-bucket/sync_test4/ ~/Documents/gsutil_test4/
real 2.18
user 1.37
sys 0.66
46899200 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
100108 page reclaims
1732 page faults
0 swaps
0 block input operations
0 block output operations
6311 messages sent
12752 messages received
4 signals received
6145 voluntary context switches
14219 involuntary context switches
13133297 instructions retired
13313536 cycles elapsed
602112 peak memory footprint
gsutil rsync config
gsutil -m -q rsync -r -d ./your-dir gs://your-bucket
/usr/bin/time -lp -- gsutil -m -q rsync -r gs://dev-bucket/sync_test4/ ~/Documents/gsutil_test4/
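About authentication
All default functions related to authentication use the GOOGLE_APPLICATION_CREDENTIALS environment variable as their default configuration, as the official Google libraries do in other languages (golang, dotnet).
The other functions (from and from_file) provide a custom integration mode.
For more information about OAuth2, see the related README in the oauth2 module.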
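The split between a default mode and a custom mode can be sketched as a small path-resolution helper. This is only an illustration under assumptions: credentials_path is a hypothetical function that is not part of the crate, and giving an explicit path precedence over the environment variable is a design choice made for this sketch.

```rust
use std::env;
use std::path::PathBuf;

/// Picks the credentials file: an explicit path wins (custom mode),
/// otherwise a non-empty environment value is used (default mode).
fn credentials_path(explicit: Option<&str>, env_value: Option<String>) -> Option<PathBuf> {
    explicit
        .map(PathBuf::from)
        .or_else(|| env_value.filter(|v| !v.is_empty()).map(PathBuf::from))
}

fn main() {
    let from_env = env::var("GOOGLE_APPLICATION_CREDENTIALS").ok();
    match credentials_path(None, from_env) {
        Some(path) => println!("using credentials at {}", path.display()),
        None => println!("no credentials configured"),
    }
}
```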
How to run tests
Unit tests
cargo test --lib
Integration tests + unit tests
TEST_SERVICE_ACCOUNT=<PathToAServiceAccount> TEST_BUCKET=<BUCKET> TEST_PREFIX=<PREFIX> cargo test --no-fail-fast
Examples
Upload an object
Library
use std::path::Path;

use gcs_rsync::storage::{credentials, Object, ObjectClient, StorageResult};
use tokio_util::codec::{BytesCodec, FramedRead};

#[tokio::main]
async fn main() -> StorageResult<()> {
    // Arguments: <bucket> <prefix> <file_path>
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let prefix = args[2].to_owned();
    let file_path = args[3].to_owned();

    // Authenticate with the default authorized user credentials.
    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    // Stream the local file instead of loading it fully into memory.
    let file_path = Path::new(&file_path);
    let name = file_path.file_name().unwrap().to_string_lossy();
    let file = tokio::fs::File::open(file_path).await.unwrap();
    let stream = FramedRead::new(file, BytesCodec::new());

    // The object name is "<prefix>/<file name>".
    let name = format!("{}/{}", prefix, name);
    let object = Object::new(bucket, name.as_str())?;
    object_client.upload(&object, stream).await.unwrap();
    println!("object {} uploaded", &object);
    Ok(())
}
CLI
cargo run --release --example upload_object "<YourBucket>" "<YourPrefix>" "<YourFilePath>"
Download an object
Library
use std::path::Path;

use futures::TryStreamExt;
use gcs_rsync::storage::{credentials, Object, ObjectClient, StorageResult};
use tokio::{
    fs::File,
    io::{AsyncWriteExt, BufWriter},
};

#[tokio::main]
async fn main() -> StorageResult<()> {
    // Arguments: <bucket> <object_name> <existing_output_directory>
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let name = args[2].as_str();
    let output_path = args[3].to_owned();

    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    // Write to "<output directory>/<last segment of the object name>".
    let file_name = Path::new(&name).file_name().unwrap().to_string_lossy();
    let file_path = format!("{}/{}", output_path, file_name);

    let object = Object::new(bucket, name)?;
    let mut stream = object_client.download(&object).await.unwrap();

    // Copy the download stream to disk chunk by chunk.
    let file = File::create(&file_path).await.unwrap();
    let mut buf_writer = BufWriter::new(file);
    while let Some(data) = stream.try_next().await.unwrap() {
        buf_writer.write_all(&data).await.unwrap();
    }
    buf_writer.flush().await.unwrap();
    println!("object {} downloaded to {:?}", &object, file_path);
    Ok(())
}
CLI
cargo run --release --example download_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"
Download a public object
Library
use std::path::Path;

use futures::TryStreamExt;
use gcs_rsync::storage::{Object, ObjectClient, StorageResult};
use tokio::{
    fs::File,
    io::{AsyncWriteExt, BufWriter},
};

#[tokio::main]
async fn main() -> StorageResult<()> {
    let bucket = "gcs-rsync-dev-public";
    let name = "hello.txt";

    // Public objects need no credentials.
    let object_client = ObjectClient::no_auth();

    // Write to the current directory, using the object's file name.
    let file_name = Path::new(name).file_name().unwrap().to_string_lossy();
    let file_path = file_name.to_string();

    let object = Object::new(bucket, name)?;
    let mut stream = object_client.download(&object).await.unwrap();

    // Copy the download stream to disk chunk by chunk.
    let file = File::create(&file_path).await.unwrap();
    let mut buf_writer = BufWriter::new(file);
    while let Some(data) = stream.try_next().await.unwrap() {
        buf_writer.write_all(&data).await.unwrap();
    }
    buf_writer.flush().await.unwrap();
    println!("object {} downloaded to {:?}", &object, file_path);
    Ok(())
}
CLI
cargo run --release --example download_public_object "<YourBucket>" "<YourObjectName>" "<YourAbsoluteExistingDirectory>"
Delete an object
Library
use gcs_rsync::storage::{credentials, Object, ObjectClient, StorageResult};

#[tokio::main]
async fn main() -> StorageResult<()> {
    // Arguments: <bucket> <object_name>
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let name = args[2].as_str();
    let object = Object::new(bucket, name)?;

    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    object_client.delete(&object).await?;
    println!("object {} deleted", &object);
    Ok(())
}
CLI
cargo run --release --example delete_object "<YourBucket>" "<YourPrefix>/<YourFileName>"
List objects
Library
use futures::TryStreamExt;
use gcs_rsync::storage::{credentials, ObjectClient, ObjectsListRequest, StorageResult};

#[tokio::main]
async fn main() -> StorageResult<()> {
    // Arguments: <bucket> <prefix>
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let prefix = args[2].to_owned();

    let auc = Box::new(credentials::authorizeduser::default().await?);
    let object_client = ObjectClient::new(auc).await?;

    // Request only the object names and the paging token to keep responses small.
    let objects_list_request = ObjectsListRequest {
        prefix: Some(prefix),
        fields: Some("items(name),nextPageToken".to_owned()),
        ..Default::default()
    };

    // Print each object name as it arrives on the stream.
    object_client
        .list(bucket, &objects_list_request)
        .await
        .try_for_each(|x| {
            println!("{}", x.name.unwrap());
            futures::future::ok(())
        })
        .await?;
    Ok(())
}
CLI
cargo run --release --example list_objects "<YourBucket>" "<YourPrefix>"
List objects with the default service account
Library
use futures::TryStreamExt;
use gcs_rsync::storage::{credentials, ObjectClient, ObjectsListRequest, StorageResult};

#[tokio::main]
async fn main() -> StorageResult<()> {
    // Arguments: <bucket> <prefix>
    let args = std::env::args().collect::<Vec<_>>();
    let bucket = args[1].as_str();
    let prefix = args[2].to_owned();

    // Service account credentials, with the OAuth2 scope to request.
    let auc = Box::new(
        credentials::serviceaccount::default(
            "https://www.googleapis.com/auth/devstorage.full_control",
        )
        .await?,
    );
    let object_client = ObjectClient::new(auc).await?;

    // Request only the object names and the paging token to keep responses small.
    let objects_list_request = ObjectsListRequest {
        prefix: Some(prefix),
        fields: Some("items(name),nextPageToken".to_owned()),
        ..Default::default()
    };

    // Print each object name as it arrives on the stream.
    object_client
        .list(bucket, &objects_list_request)
        .await
        .try_for_each(|x| {
            println!("{}", x.name.unwrap());
            futures::future::ok(())
        })
        .await?;
    Ok(())
}
CLI
GOOGLE_APPLICATION_CREDENTIALS=<PathToJson> cargo r --release --example list_objects_service_account "<YourBucket>" "<YourPrefix>"
List lots of (>32K) objects
List a bucket having more than 60K objects
time cargo run --release --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>" | wc -l
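The list examples above request the nextPageToken field, which is what makes it possible to walk a listing of any size page by page. The following std-only sketch shows that general pagination loop against a fake in-memory backend. Page, list_all and the closure are hypothetical and only illustrate the pattern; how the crate pages internally is not shown here.

```rust
/// One page of a listing: some names plus an optional token for the next page.
#[derive(Debug, Default)]
struct Page {
    items: Vec<String>,
    next_page_token: Option<String>,
}

/// Fetches pages until the backend stops returning a next-page token.
fn list_all<F>(mut fetch_page: F) -> Vec<String>
where
    F: FnMut(Option<&str>) -> Page,
{
    let mut names = Vec::new();
    let mut token: Option<String> = None;
    loop {
        let page = fetch_page(token.as_deref());
        names.extend(page.items);
        match page.next_page_token {
            Some(next) => token = Some(next),
            None => break,
        }
    }
    names
}

fn main() {
    // Fake backend: 70,000 object names served 1,000 at a time.
    // The token is simply the index of the next object to return.
    let total = 70_000usize;
    let page_size = 1_000usize;
    let names = list_all(|token| {
        let start: usize = token.map_or(0, |t| t.parse().unwrap());
        let end = (start + page_size).min(total);
        Page {
            items: (start..end).map(|i| format!("object-{}", i)).collect(),
            next_page_token: if end < total { Some(end.to_string()) } else { None },
        }
    });
    println!("{} objects listed", names.len());
}
```

A real client would typically yield names as each page arrives rather than collect them all, so memory stays flat however large the prefix is.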
Profiling
export CARGO_PROFILE_RELEASE_DEBUG=true
sudo -- cargo flamegraph --example list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"
cargo build --release --examples && /usr/bin/time -lp -- ./target/release/examples/list_objects "<YourBucket>" "<YourPrefixHavingMoreThan60K>"
Native binary build (statically linked)
docker rust rust:alpine3.14
apk add --no-cache musl-dev pkgconfig openssl-dev
LDFLAGS="-static -L/usr/local/musl/lib" LD_LIBRARY_PATH=/usr/local/musl/lib:$LD_LIBRARY_PATH CFLAGS="-I/usr/local/musl/include" PKG_CONFIG_PATH=/usr/local/musl/lib/pkgconfig cargo build --release --target=x86_64-unknown-linux-musl --example bucket_to_folder_sync
Dependencies
~10–24MB
~367K SLoC