15 releases

0.6.0-beta.1 Jan 25, 2024
0.5.2 Dec 22, 2023
0.5.1 Jun 16, 2022
0.4.0 Mar 24, 2022

#282 in Text processing

43 downloads per month
Used in 2 crates

MIT/Apache

46KB
983

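Cindex, a CSV indexer

Cindex is an easy-to-use CSV indexer that supports simple SQL-like queries.

Cindex is not intended for heavyweight database indexing; it is meant for simple in-memory queries. If you work with a large number of CSV files, use a different database interaction layer.

A binary will be added soon.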

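Changes

Features

Index CSV tables with SQL-like queries

Sometimes you may want to index a table in raw form and get raw values out of it. Simply pipe the CSV table to another program, optionally with CSV headers.

Ergonomic CSV reading

Writing CSV values by hand is not trivial, but in some cases you have to. cindex allows missing values, and can even allow missing columns with a specific FLAG syntax. Missing commas are not allowed.

Usage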
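To make the idea of "allowing missing columns" concrete, here is a minimal std-only sketch of selecting columns by name where a column absent from the header is filled with empty values. It is an illustration, not cindex's implementation: the function name `select_with_supplement` and its signature are hypothetical.

```rust
/// Select the `wanted` columns from a table given as a header plus rows.
/// A column that does not exist in the header yields empty strings.
/// Hypothetical helper for illustration; not part of the cindex API.
fn select_with_supplement(
    header: &[&str],
    rows: &[Vec<&str>],
    wanted: &[&str],
) -> Vec<Vec<String>> {
    // Resolve each wanted column name to its position in the header, if any.
    let positions: Vec<Option<usize>> = wanted
        .iter()
        .map(|name| header.iter().position(|h| h == name))
        .collect();

    rows.iter()
        .map(|row| {
            positions
                .iter()
                .map(|pos| {
                    // Unknown column (or, defensively, a row shorter than
                    // the header): fall back to an empty string.
                    pos.and_then(|i| row.get(i))
                        .map(|v| v.to_string())
                        .unwrap_or_default()
                })
                .collect()
        })
        .collect()
}

fn main() {
    let header = ["id", "name"];
    // The second row has a missing (empty) value for "name".
    let rows = vec![vec!["1", "alice"], vec!["2", ""]];
    let out = select_with_supplement(&header, &rows, &["id", "address", "name"]);
    for row in &out {
        println!("{}", row.join(","));
    }
}
```

Here "address" is not in the header, so every row gets an empty field in that position, while the existing empty value in the second row is passed through unchanged.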

[dependencies]
cindex = "*" # Use the latest version if possible

# Enable the rayon feature if you want parallel iteration of rows
# cindex = { version = "*", features = ["rayon"] }

use std::fs::File;
use std::io::BufReader;
use cindex::{Indexer, CsvType, Predicate, Query, OutOption, Operator};

let mut indexer = Indexer::new();

// Add table from file
indexer
    .add_table(
        "table1",
        BufReader::new(File::open("test.csv").expect("Failed to open a file")),
    )
    .expect("Failed to add table");

// Add table from stdin
let stdin = std::io::stdin();
indexer
    .add_table("table2", stdin.lock())
    .expect("Failed to add table");

// Indexing

// Create query object and print queried output to terminal
use std::str::FromStr;
let query = Query::from_str("SELECT a,b,c FROM table1 WHERE a = 10")
    .expect("Failed to create a query from str");
indexer
    .index(query, OutOption::Term)
    .expect("Failed to index a table");

// Use raw query and yield output to a file
indexer
    .index_raw(
        "SELECT * FROM table2 WHERE id = 10",
        OutOption::File(std::fs::File::create("out.csv").expect("Failed to create a file")),
    )
    .expect("Failed to index a table");

// Use builder pattern to construct query and index a table
let query = Query::build()
    .table("table1")
    .columns(vec!["id", "address"])
    .predicate(Predicate::new("id", Operator::Equal).args(vec!["10"]))
    .predicate(
        Predicate::build()
            .column("address")
            .operator(Operator::NotEqual)
            .raw_args("111-2222"),
    );

let mut acc = String::new();
indexer
    .index(query, OutOption::Value(&mut acc))
    .expect("Failed to index a table");

// Always use unix newline for formatting
indexer.always_use_unix_newline(true);

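Query syntax

Cindex's query syntax is similar to SQL, with some subtle differences.

In a WHERE clause, the comparison operator must come after the column name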

/* Select everything from the given table */
SELECT * FROM table1

/* Select everything from given table and order by column with descending
order*/
SELECT * FROM table1 ORDER BY col1 DESC

/* Same as the previous command, but map the header to a different array */
SELECT * FROM table1 ORDER BY col1 DESC HMAP 'new h','new h2','new h3'

/* You can use OFFSET and LIMIT syntax to control how many lines to print */
/* Keep in mind that this doesn't early return from indexing, but it works as
   final_records[offset..offset+limit] */
/* e.g. next line gets records[1..3] */
SELECT * FROM table1 OFFSET 1 LIMIT 2

/* Select given columns from table where column's value is equal to given
condition and also other column's value matches regex expression */
SELECT col1,col2 FROM table1 WHERE col1 = 10 AND col2 LIKE ^start

/* There is a flag syntax which changes query behaviour*/
SELECT * FROM table_name FLAG PHD SUP

/* Each flag does the following
  - PHD (PRINT-HEADER) : Print a header in the result output
  - SUP (SUPPLEMENT)   : Allow selecting a non-existent column, filled with empty values
  - TP  (TRANSPOSE)    : Transpose (invert) the CSV values, as in linear algebra
 */
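The OFFSET and LIMIT comment above describes the result as `final_records[offset..offset+limit]`, applied after indexing has finished. A minimal std-only sketch of that slicing follows. The helper name `apply_offset_limit` is hypothetical, and clamping out-of-range values to the record count is an assumption of this sketch, not documented cindex behaviour.

```rust
/// Take `records[offset..offset + limit]` from an already-filtered result.
/// Out-of-range values are clamped to the number of records (an assumption
/// of this sketch). Hypothetical helper; not part of the cindex API.
fn apply_offset_limit<T>(records: &[T], offset: usize, limit: Option<usize>) -> &[T] {
    let start = offset.min(records.len());
    let end = match limit {
        // saturating_add avoids overflow for very large limits.
        Some(n) => start.saturating_add(n).min(records.len()),
        None => records.len(),
    };
    &records[start..end]
}

fn main() {
    let records = ["r0", "r1", "r2", "r3"];
    // Equivalent to `OFFSET 1 LIMIT 2`, i.e. records[1..3].
    let page = apply_offset_limit(&records, 1, Some(2));
    println!("{:?}", page);
}
```

Because the slice is taken from the final record list, every row is still visited during indexing; OFFSET and LIMIT only trim what is printed.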

Supported WHERE operations are

 >= 
 >
 <=
 <
 =
 !=
 IN
 BETWEEN
 LIKE ( with regular expression )
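
As a rough illustration of how such operators can be evaluated against a single CSV value, here is a std-only sketch. It is not cindex's implementation: the `Op` enum and the `matches` function are hypothetical names. Comparing numerically when both sides parse as numbers, and treating BETWEEN as inclusive, are assumptions of the sketch. LIKE is left out because it would need a regex engine.

```rust
use std::cmp::Ordering;

/// Operators from the list above, except LIKE.
/// Hypothetical type for illustration; not part of the cindex API.
#[allow(dead_code)]
enum Op<'a> {
    Ge(&'a str),
    Gt(&'a str),
    Le(&'a str),
    Lt(&'a str),
    Eq(&'a str),
    Ne(&'a str),
    In(&'a [&'a str]),
    Between(&'a str, &'a str),
}

/// Compare numerically when both sides parse as numbers, otherwise as text.
/// This fallback rule is an assumption of the sketch.
fn compare(a: &str, b: &str) -> Ordering {
    match (a.parse::<f64>(), b.parse::<f64>()) {
        (Ok(x), Ok(y)) => x.partial_cmp(&y).unwrap_or_else(|| a.cmp(b)),
        _ => a.cmp(b),
    }
}

/// Return true if `value` satisfies the operator.
fn matches(value: &str, op: &Op<'_>) -> bool {
    match op {
        Op::Ge(rhs) => compare(value, rhs) != Ordering::Less,
        Op::Gt(rhs) => compare(value, rhs) == Ordering::Greater,
        Op::Le(rhs) => compare(value, rhs) != Ordering::Greater,
        Op::Lt(rhs) => compare(value, rhs) == Ordering::Less,
        Op::Eq(rhs) => compare(value, rhs) == Ordering::Equal,
        Op::Ne(rhs) => compare(value, rhs) != Ordering::Equal,
        Op::In(set) => set.iter().any(|s| compare(value, s) == Ordering::Equal),
        // Both bounds are treated as inclusive here.
        Op::Between(lo, hi) => {
            compare(value, lo) != Ordering::Less && compare(value, hi) != Ordering::Greater
        }
    }
}

fn main() {
    let ops = [Op::Ge("10"), Op::In(&["7", "9"]), Op::Between("1", "5")];
    for op in &ops {
        println!("{}", matches("9", op));
    }
}
```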

TODO

  • Support multiple WHERE clauses
  • Join tables

Dependencies

~6MB
~62K SLoC