#csv #excel #data #command-line #data-processing #txt #command-line-tool

app csv-txt-excel-parallel-toolkit

A fast, parallel command-line tool for processing small and large (>10G) CSV, TXT, and EXCEL files, with a unified API

3 releases

0.4.7 Jan 30, 2023
0.4.6 Jan 28, 2023
0.4.5 Jan 27, 2023

#2624 in Command line utilities

22 downloads per month

MIT OR Unlicense

2.5MB
5.5K SLoC

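Contains (ZIP file, 1.5MB) tests/data/hotel_reservation.xlsx, (ZIP file, 9KB) tests/data/empty.xlsx

A CSV and EXCEL toolkit written in Rust

rsv is a command-line tool for processing small and large CSV, TXT, and EXCEL files (especially >10G). rsv has the following features:

  • written in Rust
  • fast and parallel data processing (based on Rayon)
  • real-time progress bar
  • simple usage
  • support for command pipelines

Usage

Download rsv.exe from the releases tab, and add its directory to the system path.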

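Available commands

  • head - Show the first n lines of a CSV, TXT, or EXCEL file.
  • headers - Show the file headers.
  • count - Count the number of lines in a file 🏃.
  • estimate - Quickly estimate the number of lines.
  • clean - Clean a file of the escape char (e.g. ") or other strings 🏃.
  • frequency - Show a frequency table for column(s) 🏃 ⭐.
  • split - Split a file into separate files, either sequentially or based on column values 🏃 ⭐.
  • select - Select rows and columns by filter 🏃.
  • flatten - Print records in a flattened format to view them one by one.
  • slice - Print a slice of rows from a file.
  • search - Search with regular expressions 🏃 ⭐.
  • sort - In-memory data sorting, supporting at most two columns ⭐.
  • sample - Data sampling based on a priority queue.
  • stats - Statistics for column(s), including min, max, mean, unique, and null 🏃 ⭐.
  • excel2csv - Convert EXCEL to CSV.
  • to - Save command output data to disk as one of TXT, CSV, TSV, XLSX, or XLS.
  • table - Format data as an aligned table.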

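Tip 1

  • 🏃 means the command supports a real-time progress bar.
  • ⭐ means the command supports parallel data processing.

Tip 2

All commands except "clean" and "excel2csv" can be chained.

Tip 3

You can check the usage of each command with rsv command --help or rsv command -h, for example, rsv frequency --help.

Basic usage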

  • rsv head
rsv head data.csv                   # print as the file is
rsv head --tabled data.csv          # tabled
rsv head -t data.csv                # tabled too
rsv head -s \t data.csv             # CSV file with a tab separator
rsv head data.xlsx                  # EXCEL file
rsv head --help                     # help info on all flags
  • rsv headers
rsv headers data.csv                # separator "," (default)
rsv headers -s \t data.csv          # separator tab
rsv headers data.xlsx               # EXCEL file
rsv headers --help                  # help info on all flags
  • rsv count
rsv count data.csv                  # plain-text file
rsv count data.xlsx                 # EXCEL file
rsv count --no-header data.csv
rsv count --help                    # help info on all flags
  • rsv estimate
rsv estimate data.csv
rsv estimate data.xlsx
rsv estimate --help                 # help info on all flags
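
The text does not say how rsv produces its estimate. One common way to estimate a line count without reading the whole file is to count line breaks in a sample taken from the start of the file and extrapolate by file size. The sketch below illustrates that idea only; it is an assumption, not rsv's implementation, and `estimate_line_count` and `sample_bytes` are hypothetical names.

```python
import os


def estimate_line_count(path: str, sample_bytes: int = 1 << 20) -> int:
    """Estimate the number of lines from a sample at the start of the file."""
    total_size = os.path.getsize(path)
    if total_size == 0:
        return 0
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)
    newlines = sample.count(b"\n")
    if len(sample) >= total_size:
        # The whole file fit into the sample, so the count is exact.
        # Add one for a final line that lacks a trailing newline.
        return newlines + (0 if sample.endswith(b"\n") else 1)
    if newlines == 0:
        # No line break seen in the sample; there is nothing to extrapolate from.
        return 1
    avg_line_len = len(sample) / newlines
    return round(total_size / avg_line_len)
```

The estimate is only as good as the assumption that lines in the sample are representative of the rest of the file.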
  • rsv clean
rsv clean data.csv                               # default to clean escape char "
rsv clean -e \"content-to-delete\" data.csv      # escape is a str, clean str to empty
rsv clean -o new-file.csv data.csv               # save to new-file.csv, the default is data-cleaned.csv
rsv clean --help                                 # help info on all flags
  • rsv frequency
rsv frequency -c 0 data.csv              # default to the first column, descending order
rsv frequency -c 0 data.xlsx             # EXCEL file
rsv frequency -c 0,1,2,5 data.csv        # columns 0, 1, 2, and 5
rsv frequency -c 0-2,5 data.csv          # same as above
rsv frequency -c 0-2 --export data.csv   # export result to data-frequency.csv
rsv frequency -n 10 data.csv             # keep top 10 frequent items
rsv frequency -a 10 data.csv             # in ascending order
rsv frequency --help                     # help info on all flags

column selection syntax:
-c 0,1,2,5   -->    cols [0,1,2,5]
-c 0-2,5     -->    same as cols [0,1,2,5]
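
The column selection syntax above can be read as a comma-separated list of zero-based indices and inclusive ranges. A minimal sketch of how such a spec expands, assuming exactly the forms shown (`parse_columns` is a hypothetical helper, not part of rsv):

```python
def parse_columns(spec: str) -> list[int]:
    """Expand a spec such as "0-2,5" into [0, 1, 2, 5]."""
    cols: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            cols.extend(range(start, end + 1))  # ranges are inclusive
        else:
            cols.append(int(part))
    return cols


print(parse_columns("0,1,2,5"))  # [0, 1, 2, 5]
print(parse_columns("0-2,5"))    # [0, 1, 2, 5]
```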
  • rsv split
rsv split data.csv                # default to first column and field separator of ,
rsv split data.xlsx               # EXCEL file
rsv split -s \t data.csv          # tab separator
rsv split -c 1 data.csv           # split based on second column
rsv split -c 0 -s \t data.csv     # first column, \t separator
rsv split --size 1000 data.xlsx   # Sequential split, 1000 records in a file.
rsv split --help                  # help info on all flags
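
The two split modes described above, by column value and sequentially by size, can be sketched on in-memory rows as follows. This is an illustration under assumptions: the function names are hypothetical, rows are lists of strings without the header, and how rsv names and writes the output files is not covered.

```python
from collections import defaultdict


def split_by_column(rows, col=0):
    """Group rows by the value found in column `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return dict(groups)


def split_by_size(rows, size):
    """Cut rows into consecutive chunks of at most `size` records."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]
```

Each group or chunk would then be written to its own file.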
  • rsv select
rsv select -f 0=a,b,c data.csv          # first column has values of a, b, or c
rsv select -f 0=a,b,c data.xlsx         # EXCEL file, sheet can be specified with the --sheet flag
rsv select -f "0N>10&1=c" data.csv      # first column > 10 numerically, AND the second column equals c
rsv select -f 0!= --export data.csv     # export result, in which the first column is non-empty
rsv select --help                       # help info on other options

Filter syntax; the supported operators are =, !=, >, >=, <, <=, and &:
-f 0=a,b,c           -->  first column is a, b, or c
-f 0N=1,2            -->  first column numerically equals 1 or 2
-f 0!=               -->  first column is not empty
-f "0>=2022-01-21"   -->  first column is greater than or equal to 2022-01-21, lexicographically
-f "0N>10"           -->  first column > 10 numerically
-f "0N>10&2=pattern" -->  first column > 10 numerically, AND the third column equals <pattern>

NOTE: 1. Only the & (AND) operation is supported; the | (OR) operation is not.
      2. The filter option can be omitted to select all rows.

column selection syntax:
-c 0,1,2,5   -->    cols [0,1,2,5]
-c 0-2,5     -->    same as cols [0,1,2,5]
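
To make the filter syntax concrete, here is a small sketch that parses a filter string and tests a row against it. It covers only the forms listed above; `parse_filter` and `row_matches` are hypothetical names, and the handling of edge cases (a non-numeric cell under an N comparison fails the filter, ordering operators use only the first value) is an assumption rather than documented rsv behavior.

```python
import operator
import re

# Ordering operators; "=" and "!=" are handled as membership tests below.
_ORDER_OPS = {">": operator.gt, ">=": operator.ge,
              "<": operator.lt, "<=": operator.le}
# column index, optional N (numeric), operator, comma-separated values
_TERM = re.compile(r"^(\d+)(N?)(!=|>=|<=|=|>|<)(.*)$")


def parse_filter(spec):
    """Split "0N>10&2=pattern" into (column, numeric, operator, values) terms."""
    terms = []
    for raw in spec.split("&"):
        m = _TERM.match(raw.strip())
        if m is None:
            raise ValueError(f"bad filter term: {raw!r}")
        col, numeric, op, values = m.groups()
        terms.append((int(col), numeric == "N", op, values.split(",")))
    return terms


def row_matches(row, terms):
    """Return True when the row satisfies every term (AND semantics)."""
    for col, numeric, op, values in terms:
        try:
            left = float(row[col]) if numeric else row[col]
            rights = [float(v) if numeric else v for v in values]
        except ValueError:
            return False  # assumption: non-numeric data fails a numeric term
        if op == "=":
            ok = left in rights
        elif op == "!=":
            ok = left not in rights
        else:
            ok = _ORDER_OPS[op](left, rights[0])
        if not ok:
            return False
    return True
```

With this reading, `0!=` parses to a comparison against the empty string, which is why it selects rows whose first column is non-empty.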
  • rsv flatten
rsv flatten data.csv                       # default to show first 5 records
rsv flatten -n 50 data.csv                 # show 50 records
rsv flatten data.xls                       # EXCEL file
rsv flatten --delimiter \"--\" data.csv    # change line delimiter to anything
rsv flatten --help                         # help info on all flags
  • rsv slice
rsv slice -s 100 -e 150 data.csv           # set start and end index
rsv slice -s 100 -l 50 data.csv            # set start index and the length
rsv slice -s 100 -l 50 data.xlsx           # EXCEL FILE
rsv slice -s 100 -l 50 --export data.csv   # export to data-slice.csv
rsv slice -e 10 --export data.csv          # set end index and export data
rsv slice -i 9 data.csv                    # the 10th line sliced only
rsv slice --help                           # help info on all flags
  • rsv search
rsv search PATTERN data.csv                # search PATTERN
rsv search "^\d{4}-\d{2}-\d{2}$" data.csv  # search dates
rsv search --export PATTERN data.csv       # export result
rsv search PATTERN data.xlsx               # search EXCEL file
rsv search --help                          # help info on all flags
  • rsv sample
rsv sample data.csv                 # default to sample 10 records
rsv sample --no-header data.csv     # no-header
rsv sample -n 20 data.csv           # pull more
rsv sample -n 20 data.xlsx          # EXCEL file
rsv sample --seed 100 data.xlsx     # set a seed
rsv sample --time-limit 2 data.xlsx # set time limit to 2 seconds for large file
rsv sample -n 20 --export data.xlsx # data export
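
Sampling with a priority queue can be done in a single pass: give every record a random key and keep the n records with the largest keys in a min-heap. The sketch below shows that general technique; it is not taken from rsv's source, `sample_records` is a hypothetical name, and returning the sample in input order is a choice made here.

```python
import heapq
import random


def sample_records(records, n=10, seed=None):
    """Pick up to n records uniformly at random in one pass over the input."""
    rng = random.Random(seed)
    heap = []  # min-heap of (random_key, position, record)
    for pos, rec in enumerate(records):
        key = rng.random()
        if len(heap) < n:
            heapq.heappush(heap, (key, pos, rec))
        elif key > heap[0][0]:
            # The new key beats the smallest kept key: swap it in.
            heapq.heapreplace(heap, (key, pos, rec))
    # Return the sample in its original input order.
    return [rec for _, pos, rec in sorted(heap, key=lambda t: t[1])]
```

Because only n records are held at a time, memory use does not grow with the size of the file.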
  • rsv sort
rsv sort -c 0 data.csv        # default to sort by first column in ascending
rsv sort -c 0D data.csv       # descending sort
rsv sort -c 0DN data.csv      # sort as numeric values
rsv sort -c 0DN,2N data.csv   # sort two columns
rsv sort -E data.csv          # export result
rsv sort data.xlsx            # sort EXCEL file
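
The sort key syntax combines a column index with optional D (descending) and N (numeric) markers. A minimal sketch of one way to interpret it, assuming the markers may appear in either order and that rows are passed without the header (`parse_sort_spec` and `sort_rows` are hypothetical names):

```python
import re


def parse_sort_spec(spec):
    """Turn "0DN,2N" into [(0, True, True), (2, False, True)]."""
    keys = []
    for part in spec.split(","):
        m = re.fullmatch(r"(\d+)([DN]*)", part.strip().upper())
        if m is None:
            raise ValueError(f"bad sort key: {part!r}")
        idx, flags = m.groups()
        keys.append((int(idx), "D" in flags, "N" in flags))
    return keys


def sort_rows(rows, spec):
    """Sort rows by one or more keys given as (index, descending, numeric)."""
    out = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields the combined multi-column ordering.
    for idx, desc, numeric in reversed(parse_sort_spec(spec)):
        out.sort(key=lambda r: float(r[idx]) if numeric else r[idx],
                 reverse=desc)
    return out
```

Without the N marker values compare as strings, so "10" sorts before "2".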
  • rsv stats
rsv stats data.csv                       # all columns, statistics include: min, max, mean, unique, null
rsv stats data.xlsx                      # EXCEL FILE
rsv stats -c 0,1 data.csv                # first two columns
rsv stats -c 0,1 --export data.csv       # export to data-stats.csv
rsv stats --help                         # help info on all flags
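
The five statistics named above can be computed for one column as in the sketch below. The definitions are assumptions made for illustration: empty or whitespace-only cells count as null, unique counts distinct non-empty values, and min, max, and mean use only cells that parse as numbers. `column_stats` is a hypothetical name.

```python
def column_stats(values):
    """Compute min, max, mean, unique, and null counts for one column."""
    non_empty = [v for v in values if v.strip() != ""]
    nums = []
    for v in non_empty:
        try:
            nums.append(float(v))
        except ValueError:
            pass  # non-numeric cells are skipped for min/max/mean
    return {
        "min": min(nums) if nums else None,
        "max": max(nums) if nums else None,
        "mean": sum(nums) / len(nums) if nums else None,
        "unique": len(set(non_empty)),
        "null": len(values) - len(non_empty),
    }
```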
  • rsv excel2csv
rsv excel2csv data.xlsx                 # apply to xlsx file, default to first sheet (or sheet1)
rsv excel2csv data.xls                  # apply also to xls file
rsv excel2csv --sheet 1 data.xls        # second sheet, e.g., sheet 2
rsv excel2csv -S 1 data.xls             # same as above
  • rsv table
rsv head data.csv | rsv table                   # convert result to an aligned table
rsv slice -s 10 -e 15 data.csv | rsv table      # convert result to an aligned table
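
Formatting delimited text as an aligned table comes down to padding every cell to the widest value in its column. A minimal sketch of that idea follows; the " | " separator and the `format_table` name are choices made here, and rsv's actual output layout may differ.

```python
import csv
import io


def format_table(text, sep=","):
    """Pad each column of delimited text to a common width."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [r + [""] * (ncols - len(r)) for r in rows]
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]
    lines = [
        " | ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
        for r in rows
    ]
    return "\n".join(lines)


print(format_table("id,name\n1,alice\n22,bob\n"))
```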

Command pipeline

  • two commands chained
rsv search "^\d{4}-\d{2}-\d{2}$" data.csv | rsv table     # search date and print in an aligned table
rsv select -f 0=a,b data.csv | rsv frequency -c 0         # filter rows and get its frequency table
rsv select -f "0!=&2N>10" data.csv | rsv head -n 5        # filter rows, and show head 5 records
rsv select -f "2N=10,20" -c 0-4 data.csv | rsv stats      # filter rows, select columns and make statistics
rsv select -f "2N=10,20" -c 0-4 data.csv | rsv sort -c 2  # filter rows, select columns and sort data
  • more commands chained
rsv search pattern1 data.csv | rsv sort -c 1ND | rsv table             # search, sort and print
rsv select -f 1=a,b data.csv | rsv search pattern | rsv stats          # select, search, and make statistics
rsv select -f "0N>=10&0N<20" data.csv | rsv search pattern | rsv table # select, search, and print in a table

Data export

  • Method 1: via the --export or -E flag; only export to a CSV file is supported
rsv slice -s 1000 -e 2000 --export data.csv           # the data export flag
rsv slice -s 1000 -e 2000 -E data.csv                 # same as above
rsv search --export pattern data.xlsx                 # export search data
rsv select -f "0N>=10" --export data.xlsx             # export select data
  • Method 2: via the "rsv to" subcommand; CSV, TXT, TSV, and EXCEL are supported
rsv slice -s 1000 -e 2000 data.csv | rsv to out.csv          # export to CSV
rsv slice -s 1000 -e 2000 data.csv | rsv to out.xlsx         # export to EXCEL
rsv search pattern data.xlsx | rsv to out.tsv                # export to TSV
rsv select -f "0N>=10" data.xlsx | rsv to out.txt            # export to TXT

Bug reports and suggestions

QQ chat room: 219352261

Next

New features will be added in the future.

Dependencies

~17-27MB
~448K SLoC