3 versions
0.4.7 | Jan 30, 2023 |
---|---|
0.4.6 | Jan 28, 2023 |
0.4.5 | Jan 27, 2023 |
#2624 in Command-line utilities
22 downloads per month
2.5MB
5.5K SLoC
Contains (ZIP file, 1.5MB) tests/data/hotel_reservation.xlsx, (ZIP file, 9KB) tests/data/empty.xlsx
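A CSV and Excel toolkit written in Rust
rsv is a command-line tool for processing small and large CSV, TXT, and EXCEL files (especially those larger than 10G). rsv has the following features:
- written in Rust
- fast, parallel data processing (based on Rayon)
- real-time progress bar
- simple to use
- support for command pipelines
Usage
Download rsv.exe from the release tab, and add its directory to the system path.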
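Available commands
- head - Show the first n lines of a CSV, TXT, or EXCEL file.
- header - Show the file headers.
- count - Count the lines of a file 🏃.
- estimate - Quickly estimate the number of lines.
- clean - Clean a file by removing an escape character (e.g. ") or other strings 🏃.
- frequency - Show a frequency table for column(s) 🏃 ⭐.
- split - Split a file into separate files, either sequentially or based on column values 🏃 ⭐.
- select - Select rows and columns by filter 🏃.
- flatten - Print records in a flattened format to view them one by one.
- slice - Print a slice of rows from a file.
- search - Search with a regular expression 🏃 ⭐.
- sort - In-memory data sorting, supporting at most two columns ⭐.
- sample - Data sampling based on a priority queue.
- stats - Statistics for column(s), including min, max, mean, unique, and null 🏃 ⭐.
- excel2csv - Convert EXCEL to CSV.
- to - Save command output to disk, as one of TXT, CSV, TSV, XLSX, or XLS.
- table - Format data as an aligned table.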
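Tip 1
- 🏃 means the command supports a real-time progress bar.
- ⭐ means the command supports parallel data processing.
Tip 2
All commands except "clean" and "excel2csv" can be chained.
Tip 3
You can check the usage of each command with rsv command --help or rsv command -h, for example, rsv frequency --help.
Basic usage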
- rsv head
rsv head data.csv # print as the file is
rsv head --tabled data.csv # tabled
rsv head -t data.csv # tabled too
rsv head -s \t data.csv # CSV file with a tab separator
rsv head data.xlsx # EXCEL file
rsv head --help # help info on all flags
- rsv header
rsv headers data.csv # separator "," (default)
rsv headers -s \t data.csv # separator tab
rsv headers data.xlsx # EXCEL file
rsv headers --help # help info on all flags
- rsv count
rsv count data.csv # plain-text file
rsv count data.xlsx # EXCEL file
rsv count --no-header data.csv
rsv count --help # help info on all flags
- rsv estimate
rsv estimate data.csv
rsv estimate data.xlsx
rsv estimate --help # help info on all flags
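The README does not say how estimate arrives at its number. One common way to get a quick line count, assumed here purely for illustration, is to read a leading sample of the file, measure the average bytes per line, and extrapolate to the full file size. The function name `estimate_rows` and the `sample_bytes` parameter are hypothetical and are not part of rsv.

```python
import os


def estimate_rows(path, sample_bytes=1 << 20):
    """Estimate the number of lines in a text file from a leading sample."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        chunk = f.read(sample_bytes)
    if not chunk:
        return 0
    newlines = chunk.count(b"\n")
    if len(chunk) == size:
        # The whole file fit in the sample, so the count is exact
        # (add one for a final line without a trailing newline).
        return newlines + (0 if chunk.endswith(b"\n") else 1)
    if newlines == 0:
        # No line break in the sample: nothing to extrapolate from.
        return 1
    avg_line_bytes = len(chunk) / newlines
    return round(size / avg_line_bytes)
```

The estimate is only as good as the assumption that early lines are representative of the rest of the file.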
- rsv clean
rsv clean data.csv # default to clean escape char "
rsv clean -e \"content-to-delete\" data.csv # escape is a str, clean str to empty
rsv clean -o new-file.csv data.csv # save to new-file.csv, the default is data-cleaned.csv
rsv clean --help # help info on all flags
- rsv frequency
rsv frequency -c 0 data.csv # default to the first column, descending order
rsv frequency -c 0 data.xlsx # EXCEL file
rsv frequency -c 0,1,2,5 data.csv # columns 0, 1, 2, and 5
rsv frequency -c 0-2,5 data.csv # same as above
rsv frequency -c 0-2 --export data.csv # export result to data-frequency.csv
rsv frequency -n 10 data.csv # keep top 10 frequent items
rsv frequency -a 10 data.csv # in ascending order
rsv frequency --help # help info on all flags
column selection syntax:
-c 0,1,2,5 --> cols [0,1,2,5]
-c 0-2,5 --> same as cols [0,1,2,5]
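The column selection syntax above can be sketched as a small parser. This is an illustration of the documented mapping only, not rsv's own code; `parse_columns` is a hypothetical name.

```python
def parse_columns(spec):
    """Expand a spec such as "0-2,5" into a list of zero-based column indexes."""
    cols = []
    for part in spec.split(","):
        if "-" in part:
            # A range such as "0-2" is inclusive on both ends.
            start, end = part.split("-", 1)
            cols.extend(range(int(start), int(end) + 1))
        else:
            cols.append(int(part))
    return cols


print(parse_columns("0-2,5"))  # [0, 1, 2, 5]
```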
- rsv split
rsv split data.csv # default to first column and field separator of ,
rsv split data.xlsx # EXCEL file
rsv split -s \t data.csv # tab separator
rsv split -c 1 data.csv # split based on second column
rsv split -c 0 -s \t data.csv # first column, \t separator
rsv split --size 1000 data.xlsx # Sequential split, 1000 records in a file.
rsv split --help # help info on all flags
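The two split modes (by column value, and sequentially by size) can be illustrated in memory as follows. This is a minimal sketch, not rsv's implementation: it assumes the first row is a header that is repeated in every part, and it returns lists of rows instead of writing files. The names `split_by_column` and `split_by_size` are hypothetical.

```python
import csv
import io
from collections import defaultdict


def split_by_column(text, col=0, sep=","):
    """Group data rows by the value in column `col`; each group keeps the header."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    groups = defaultdict(list)
    for row in reader:
        groups[row[col]].append(row)
    return {key: [header] + rows for key, rows in groups.items()}


def split_by_size(text, size, sep=","):
    """Cut data rows into consecutive parts of at most `size` records each."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    rows = list(reader)
    return [[header] + rows[i:i + size] for i in range(0, len(rows), size)]
```

For a file too large for memory, the same grouping would need one open output file per distinct value instead of an in-memory dictionary.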
- rsv select
rsv select -f 0=a,b,c data.csv # first column has values of a, b, or c
rsv select -f 0=a,b,c data.xlsx # EXCEL file, sheet can be specified with the --sheet flag
rsv select -f "0N>10&1=c" data.csv # first column > 10 numerically, AND the second column equals c
rsv select -f 0!= --export data.csv # export result, in which the first column is non-empty
rsv select --help # help info on other options
Filter syntax, support =, !=, >, >=, <, <= and &:
-f 0=a,b,c --> first column is a, b, or c
-f 0N=1,2 --> first column numerically equals to 1 or 2
-f 0!= --> first column is not empty
-f "0>=2022-01-21" --> first column equal to or bigger than 2022-01-21, lexicographically
-f "0N>10" --> first column > 10 numerically
-f "0N>10&2=pattern" --> first column > 10 numerically, AND the third column equals to <pattern>
NOTE: 1. only & (AND) operation is supported, | (OR) operation is not supported.
2. The filter option can be omitted to select all rows.
column selection syntax:
-c 0,1,2,5 --> cols [0,1,2,5]
-c 0-2,5 --> same as cols [0,1,2,5]
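To make the filter syntax concrete, here is a sketch of how such expressions could be parsed and applied to a row. It follows the rules listed above (terms joined by &, an optional N for numeric comparison, comma-separated values for = and !=), but the function names and the treatment of non-numeric cells under N are assumptions, not rsv's behavior.

```python
import re

# column index, optional N (numeric), operator, right-hand side
_TERM = re.compile(r"^(\d+)(N?)(!=|>=|<=|=|>|<)(.*)$")


def parse_filter(spec):
    """Parse e.g. "0N>10&2=pattern" into (col, numeric, op, values) terms."""
    terms = []
    for part in spec.split("&"):
        m = _TERM.match(part)
        if m is None:
            raise ValueError(f"bad filter term: {part!r}")
        col, numeric, op, rhs = m.groups()
        terms.append((int(col), numeric == "N", op, rhs.split(",")))
    return terms


def row_matches(row, terms):
    """Return True only if the row satisfies every term (AND semantics)."""
    for col, numeric, op, values in terms:
        cell = row[col]
        if numeric:
            try:
                cell = float(cell)
                values = [float(v) for v in values]
            except ValueError:
                return False  # assumption: non-numeric cells never match
        first = values[0]
        ok = {
            "=": cell in values,
            "!=": cell not in values,
            ">": cell > first,
            ">=": cell >= first,
            "<": cell < first,
            "<=": cell <= first,
        }[op]
        if not ok:
            return False
    return True
```

With this parsing, `0!=` yields a single empty value, so "not empty" falls out of the ordinary != comparison without a special case.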
- rsv flatten
rsv flatten data.csv # default to show first 5 records
rsv flatten -n 50 data.csv # show 50 records
rsv flatten data.xls # EXCEL file
rsv flatten --delimiter \"--\" data.csv # change line delimiter to anything
rsv flatten --help # help info on all flags
- rsv slice
rsv slice -s 100 -e 150 data.csv # set start and end index
rsv slice -s 100 -l 50 data.csv # set start index and the length
rsv slice -s 100 -l 50 data.xlsx # EXCEL FILE
rsv slice -s 100 -l 50 --export data.csv # export to data-slice.csv
rsv slice -e 10 --export data.csv # set end index and export data
rsv slice -i 9 data.csv # the 10th line sliced only
rsv slice --help # help info on all flags
- rsv search
rsv search PATTERN data.csv # search PATTERN
rsv search "^\d{4}-\d{2}-\d{2}$" data.csv # search dates
rsv search --export PATTERN data.csv # export result
rsv search PATTERN data.xlsx # search EXCEL file
rsv search --help # help info on all flags
- rsv sample
rsv sample data.csv # default to sample 10 records
rsv sample --no-header data.csv # no-header
rsv sample -n 20 data.csv # pull more
rsv sample -n 20 data.xlsx # EXCEL file
rsv sample --seed 100 data.xlsx # set a seed
rsv sample --time-limit 2 data.xlsx # set time limit to 2 seconds for large file
rsv sample -n 20 --export data.xlsx # data export
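The command list describes sample as "based on a priority queue". The README does not spell out the algorithm, so the following is only one plausible reading: a single-pass reservoir sample that gives each row a random key and keeps the n rows with the largest keys in a min-heap. The name `sample_rows` and its parameters are hypothetical.

```python
import heapq
import random


def sample_rows(rows, n=10, seed=None):
    """Pick up to n rows uniformly at random in one pass over `rows`."""
    rng = random.Random(seed)
    heap = []  # min-heap of (key, index, row) holding the n largest keys so far
    for index, row in enumerate(rows):
        key = rng.random()
        if len(heap) < n:
            heapq.heappush(heap, (key, index, row))
        elif key > heap[0][0]:
            # The new key beats the smallest kept key: swap it in.
            heapq.heapreplace(heap, (key, index, row))
    # Return the sampled rows in their original file order.
    return [row for _, _, row in sorted(heap, key=lambda item: item[1])]
```

Because only n rows are held at any time, memory use stays constant regardless of file size, which fits the large-file use case the tool targets.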
- rsv sort
rsv sort -c 0 data.csv # default to sort by first column in ascending
rsv sort -c 0D data.csv # descending sort
rsv sort -c 0DN data.csv # sort as numeric values
rsv sort -c 0DN,2N data.csv # sort two columns
rsv sort -E data.csv # export result
rsv sort data.xlsx # sort EXCEL file
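The sort keys used above (a column index followed by optional D for descending and N for numeric) can be sketched as follows. This is an illustration of the key syntax under the assumption that the flags may appear in either order (both `0DN` and `1ND` occur in this README); it is not rsv's implementation, it does not enforce the two-column limit, and the function names are hypothetical.

```python
import re


def parse_sort_spec(spec):
    """Parse e.g. "0DN,2N" into (column, descending, numeric) tuples."""
    keys = []
    for part in spec.split(","):
        m = re.fullmatch(r"(\d+)([DN]*)", part)
        if m is None:
            raise ValueError(f"bad sort key: {part!r}")
        col, flags = m.groups()
        keys.append((int(col), "D" in flags, "N" in flags))
    return keys


def sort_rows(rows, spec):
    """Sort rows in memory by each key in turn, with the first key as primary."""
    result = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # primary key last yields a correct multi-column ordering.
    for col, descending, numeric in reversed(parse_sort_spec(spec)):
        convert = float if numeric else str
        result.sort(key=lambda r: convert(r[col]), reverse=descending)
    return result
```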
- rsv stats
rsv stats data.csv # all columns, statistics include: min, max, mean, unique, null
rsv stats data.xlsx # EXCEL FILE
rsv stats -c 0,1 data.csv # first two columns
rsv stats -c 0,1 --export data.csv # export to data-stats.csv
rsv stats --help # help info on all flags
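The five statistics named above (min, max, mean, unique, null) can be computed for one column as in this sketch. It assumes that an empty string counts as null and that min, max, and mean are taken over the cells that parse as numbers; rsv's exact rules may differ. `column_stats` is a hypothetical name.

```python
def column_stats(values):
    """Compute min, max, mean, unique count, and null count for one column."""
    non_empty = [v for v in values if v != ""]
    numbers = []
    for v in non_empty:
        try:
            numbers.append(float(v))
        except ValueError:
            pass  # non-numeric cells are ignored for min/max/mean
    return {
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
        "mean": sum(numbers) / len(numbers) if numbers else None,
        "unique": len(set(non_empty)),
        "null": len(values) - len(non_empty),
    }
```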
- rsv excel2csv
rsv excel2csv data.xlsx # apply to xlsx file, default to first sheet (or sheet1)
rsv excel2csv data.xls # apply also to xls file
rsv excel2csv --sheet 1 data.xls # second sheet, e.g., sheet 2
rsv excel2csv -S 1 data.xls # same as above
- rsv table
rsv head data.csv | rsv table # convert result to an aligned table
rsv slice -s 10 -e 15 data.csv | rsv table # convert result to an aligned table
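What "aligned table" means can be shown with a short sketch: pad every cell to the width of the widest cell in its column. The layout details (two-space gap, left alignment, no borders) are assumptions for illustration and may not match rsv's output; `format_table` is a hypothetical name.

```python
def format_table(rows, gap=2):
    """Left-align each column to the width of its widest cell."""
    # zip(*rows) walks the table column by column; rows are assumed to be
    # rectangular (every row has the same number of cells).
    widths = [max(len(cell) for cell in col) for col in zip(*rows)]
    lines = []
    for row in rows:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append((" " * gap).join(cells).rstrip())
    return "\n".join(lines)


print(format_table([["id", "name"], ["1", "alice"], ["20", "bob"]]))
```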
Command pipelines
- two commands chained
rsv search "^\d{4}-\d{2}-\d{2}$" data.csv | rsv table # search date and print in an aligned table
rsv select -f 0=a,b data.csv | rsv frequency -c 0 # filter rows and get its frequency table
rsv select -f "0!=&2N>10" data.csv | rsv head -n 5 # filter rows, and show head 5 records
rsv select -f "2N=10,20" -c 0-4 data.csv | rsv stats # filter rows, select columns and make statistics
rsv select -f "2N=10,20" -c 0-4 data.csv | rsv sort -c 2 # filter rows, select columns and sort data
- more commands chained
rsv search pattern1 data.csv | rsv sort -c 1ND | rsv table # search, sort and print
rsv select -f 1=a,b data.csv | rsv search pattern | rsv stats # select, search, and make statistics
rsv select -f "0N>=10&0N<20" data.csv | rsv search pattern | rsv table # select, search, and print in a table
Data export
- Method 1: via the --export or -E flag, which only supports exporting to a CSV file
rsv slice -s 1000 -e 2000 --export data.csv # the data export flag
rsv slice -s 1000 -e 2000 -E data.csv # same as above
rsv search --export pattern data.xlsx # export search data
rsv select -f "0N>=10" --export data.xlsx # export select data
- Method 2: via the "rsv to" subcommand, which supports CSV, TXT, TSV, and EXCEL
rsv slice -s 1000 -e 2000 data.csv | rsv to out.csv # export to CSV
rsv slice -s 1000 -e 2000 data.csv | rsv to out.xlsx # export to EXCEL
rsv search pattern data.xlsx | rsv to out.tsv # export to TSV
rsv select -f "0N>=10" data.xlsx | rsv to out.txt # export to TXT
Bug reports and suggestions
Next
New features will be added in the future.
Dependencies
~17-27MB
~448K SLoC