hudi-core: server-side Rust // Lib.rs

3 releases

0.1.0	Jul 15, 2024
0.1.0-rc.2	Jul 12, 2024
0.1.0-rc.1	Jul 11, 2024
0.1.0-alpha2	Jul 10, 2024 (yanked)

#957 in HTTP server

209 downloads per month
Used in 2 crates

Apache-2.0

95KB
2K SLoC

Native Rust library for Apache Hudi, with bindings to Python

The hudi-rs project aims to broaden the use of Apache Hudi across a diverse range of users and projects.

Source	Installation Command
PyPI	`pip install hudi`
Crates.io	`cargo add hudi`

Usage Examples

Python

Read a Hudi table into a PyArrow table.

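import pyarrow as pa
import pyarrow.compute as pc
from hudi import HudiTable

# Open the Hudi table at its base path and read a snapshot
# as a list of PyArrow record batches.
hudi_table = HudiTable("/tmp/trips_table")
records = hudi_table.read_snapshot()

# Combine the batches into one table, keep three columns,
# and filter to trips with a fare above 20.0.
arrow_table = pa.Table.from_batches(records)
result = arrow_table.select(["rider", "ts", "fare"]).filter(
    pc.field("fare") > 20.0)
print(result)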

Rust

Add the crate `hudi` with the `datafusion` feature to your application to query a Hudi table.

[dependencies]
hudi = { version = "0", features = ["datafusion"] }
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
datafusion = "39.0.0"

use std::sync::Arc;

use datafusion::error::Result;
use datafusion::prelude::{DataFrame, SessionContext};
use hudi::HudiDataSource;

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    // Load the Hudi table at its base path as a DataFusion table provider.
    let hudi = HudiDataSource::new("/tmp/trips_table").await?;
    // Register it under a name so SQL queries can reference it.
    ctx.register_table("trips_table", Arc::new(hudi))?;
    let df: DataFrame = ctx.sql("SELECT * from trips_table where fare > 20.0").await?;
    // Execute the query and print the resulting rows.
    df.show().await?;
    Ok(())
}

Work with cloud storage

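Ensure that cloud storage credentials are set properly as environment variables, e.g., `AWS_*`, `AZURE_*`, or `GOOGLE_*`. The relevant storage environment variables will then be picked up automatically, and the target table's base URI, with a scheme such as `s3://`, `az://`, or `gs://`, will be handled accordingly.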
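The paragraph above describes two steps: choosing a storage backend from the base URI's scheme, and picking up the matching credential variables from the environment. The Rust sketch below illustrates both ideas under stated assumptions. `StorageBackend`, `backend_for`, and `storage_env` are hypothetical names, not part of the `hudi` crate's API, and the bucket name and variable values are made up. The dependency list includes `object_store` with its `aws`, `azure`, and `gcp` features, so the crate most likely delegates this work there; treat the code only as an illustration of the idea, not as how hudi-core implements it.

```rust
use std::collections::HashMap;

/// Hypothetical backend classification; not part of the hudi crate's API.
#[derive(Debug, PartialEq, Eq)]
enum StorageBackend {
    Local,
    S3,
    Azure,
    Gcs,
    Unsupported(String),
}

/// Classify a table base URI by its scheme. Plain paths such as
/// "/tmp/trips_table" (no "://") are treated as local files.
fn backend_for(base_uri: &str) -> StorageBackend {
    match base_uri.split_once("://") {
        None => StorageBackend::Local,
        Some((scheme, _)) => match scheme {
            "file" => StorageBackend::Local,
            "s3" => StorageBackend::S3,
            "az" => StorageBackend::Azure,
            "gs" => StorageBackend::Gcs,
            other => StorageBackend::Unsupported(other.to_string()),
        },
    }
}

/// Keep only the variables whose names start with the credential prefix
/// assumed for the given backend (AWS_, AZURE_ or GOOGLE_).
fn storage_env<I>(backend: &StorageBackend, vars: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    let prefix = match backend {
        StorageBackend::S3 => "AWS_",
        StorageBackend::Azure => "AZURE_",
        StorageBackend::Gcs => "GOOGLE_",
        // Local and unsupported backends need no cloud credentials here.
        _ => return HashMap::new(),
    };
    vars.into_iter()
        .filter(|(key, _)| key.starts_with(prefix))
        .collect()
}

fn main() {
    // Hypothetical bucket and variable values, for illustration only.
    let base_uri = "s3://my-bucket/trips_table";
    let env = vec![
        ("AWS_REGION".to_string(), "us-east-1".to_string()),
        ("HOME".to_string(), "/home/user".to_string()),
    ];
    // A real program would pass std::env::vars() instead of `env`.
    let backend = backend_for(base_uri);
    let opts = storage_env(&backend, env);
    println!("{:?} -> {:?}", backend, opts);
}
```

Taking the variables as an iterator, rather than reading the process environment inside the function, keeps the filtering deterministic and easy to test.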

Contributing

Check out the contributing guide for all the details about making contributions to the project.

Dependencies

~0–15MB
~210K SLoC

anyhow, arrow 52.0+pyarrow, arrow-arith 52.0, arrow-array 52.0, arrow-buffer 52.0, arrow-cast 52.0, arrow-ipc 52.0, arrow-json, arrow-ord 52.0, arrow-row 52.0, arrow-schema 52.0+serde, arrow-select 52.0, async-recursion, bytes, dashmap 6.0, futures, object_store 0.10.1+aws+azure+gcp, parquet 52.0+async+object_store, serde+derive, serde_json, strum 0.26.3+derive, strum_macros 0.26.4, tokio+rt-multi-thread, url