2 unstable releases
Version | Release date
---|---
0.2.0 | Dec 18, 2023
0.1.0 | Dec 18, 2023
#384 in Text processing
8KB
59 lines
grader
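This CLI tool efficiently performs a binary sort of large text files, splitting lines into two bins according to user-defined criteria. It streams each line to a child process (such as grep) and sorts the lines according to whether the child echoes them back. Echoed lines are placed in 'bin1', which should ideally be configured for the most expected case; lines that are not echoed are sorted into 'bin2'.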
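To make the bin assignment concrete, here is a minimal sketch in Rust of the matching step alone. It assumes the child echoes lines unchanged and in their original order, so every echoed line can be found by scanning forward through the input. The function name `classify` is hypothetical; this is not grader's actual source.

```rust
/// Hypothetical sketch of the bin assignment (not grader's source).
/// `input` holds the lines sent to the child, in order; `echoed` holds what
/// the child printed back, assumed to be an in-order subsequence of `input`.
fn classify(input: &[&str], echoed: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut bin1 = Vec::new();
    let mut bin2 = Vec::new();
    let mut pending = input.iter();
    for echo in echoed {
        // Every input line before the echoed one was dropped by the child.
        for line in pending.by_ref() {
            if line == echo {
                bin1.push(line.to_string());
                break;
            }
            bin2.push(line.to_string());
        }
    }
    // Lines after the last echo were never echoed, so they belong in bin2.
    bin2.extend(pending.map(|l| l.to_string()));
    (bin1, bin2)
}

fn main() {
    let input = ["GET / 200", "POST /login 500", "GET /about 200"];
    // What a filter like `grep -v 500` would print back.
    let echoed = ["GET / 200", "GET /about 200"];
    let (bin1, bin2) = classify(&input, &echoed);
    println!("bin1 = {:?}", bin1);
    println!("bin2 = {:?}", bin2);
}
```

Note that a line's fate in 'bin2' is only decided once a later echo (or the end of the echoes) skips past it.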
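The sorter waits for an echoed line before concluding that any lines skipped before it belong to 'bin2'. Lines that are not echoed therefore stay buffered until a later line is echoed, so it is important to configure 'bin1' for the more common case to avoid heavy buffering. The tool is especially useful for tasks such as processing log files, or any large dataset where a binary split helps organization and analysis.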
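The buffering argument can be checked with a small model. The sketch below assumes (our simplification, not grader's documented behavior) that a non-echoed line is held until the next echoed line arrives, so the peak buffer is the longest run of consecutive non-echoed lines. `peak_pending` is a hypothetical name, and the status codes are those of the ten requests in the http.log example in this README.

```rust
/// Idealized model (an assumption, not grader's source): a line the child
/// does not echo stays buffered until a later line is echoed.
fn peak_pending(echoed: &[bool]) -> usize {
    let mut pending: usize = 0;
    let mut peak: usize = 0;
    for &was_echoed in echoed {
        if was_echoed {
            // An echo settles every line buffered before it.
            pending = 0;
        } else {
            pending += 1;
            peak = peak.max(pending);
        }
    }
    peak
}

fn main() {
    // Status codes of the ten requests in the http.log example.
    let codes = [200, 200, 500, 200, 200, 404, 200, 200, 500, 200];
    // bin1 = successful requests (the common case), as with `grep -v`.
    let ok_in_bin1: Vec<bool> = codes.iter().map(|&c| c == 200).collect();
    // bin1 = errors (the rare case), as with a plain `grep`.
    let err_in_bin1: Vec<bool> = codes.iter().map(|&c| c != 200).collect();
    println!("peak buffered, bin1 = OK lines:    {}", peak_pending(&ok_in_bin1));
    println!("peak buffered, bin1 = error lines: {}", peak_pending(&err_in_bin1));
}
```

Under this model the peak is 1 line when OK lines go to 'bin1' and 2 lines when errors do. On real logs, where errors are far rarer, the runs of non-echoed lines, and thus the buffer, would grow accordingly.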
Installation
cargo install grader
Usage
Binary sorter for text files. Lines are sorted into two bins based on child process response
Usage: grader <BIN1> <BIN2> <COMMAND> [ARGS]...
Arguments:
<BIN1> Path for output bin 1 (for echoed lines)
<BIN2> Path for output bin 2 (for non-echoed lines)
<COMMAND> Command to execute for processing lines
[ARGS]... Arguments for the command
Example
$ cat http.log
192.168.1.1 - - [16/Dec/2023:10:31:45 -0500] "GET /index.html HTTP/1.1" 200 4523
192.168.1.2 - - [16/Dec/2023:10:32:10 -0500] "GET /about.html HTTP/1.1" 200 3498
192.168.1.3 - - [16/Dec/2023:10:33:30 -0500] "POST /login HTTP/1.1" 500 1287 **(Error)**
192.168.1.4 - - [16/Dec/2023:10:34:22 -0500] "GET /contact.html HTTP/1.1" 200 2310
192.168.1.5 - - [16/Dec/2023:10:35:14 -0500] "GET /products.html HTTP/1.1" 200 4981
192.168.1.6 - - [16/Dec/2023:10:36:03 -0500] "GET / HTTP/1.1" 404 1748 **(Error)**
192.168.1.7 - - [16/Dec/2023:10:37:45 -0500] "GET /blog.html HTTP/1.1" 200 3250
192.168.1.8 - - [16/Dec/2023:10:38:52 -0500] "GET /news.html HTTP/1.1" 200 2891
192.168.1.9 - - [16/Dec/2023:10:39:17 -0500] "POST /api/data HTTP/1.1" 500 902 **(Error)**
192.168.1.10 - - [16/Dec/2023:10:40:05 -0500] "GET /terms.html HTTP/1.1" 200 4076
$ cat http.log | grader ok.log err.log -- grep -v -E "HTTP/1.1\" (500|404)"
$ cat ok.log
192.168.1.1 - - [16/Dec/2023:10:31:45 -0500] "GET /index.html HTTP/1.1" 200 4523
192.168.1.2 - - [16/Dec/2023:10:32:10 -0500] "GET /about.html HTTP/1.1" 200 3498
192.168.1.4 - - [16/Dec/2023:10:34:22 -0500] "GET /contact.html HTTP/1.1" 200 2310
192.168.1.5 - - [16/Dec/2023:10:35:14 -0500] "GET /products.html HTTP/1.1" 200 4981
192.168.1.7 - - [16/Dec/2023:10:37:45 -0500] "GET /blog.html HTTP/1.1" 200 3250
192.168.1.8 - - [16/Dec/2023:10:38:52 -0500] "GET /news.html HTTP/1.1" 200 2891
192.168.1.10 - - [16/Dec/2023:10:40:05 -0500] "GET /terms.html HTTP/1.1" 200 4076
$ cat err.log
192.168.1.3 - - [16/Dec/2023:10:33:30 -0500] "POST /login HTTP/1.1" 500 1287 **(Error)**
192.168.1.6 - - [16/Dec/2023:10:36:03 -0500] "GET / HTTP/1.1" 404 1748 **(Error)**
192.168.1.9 - - [16/Dec/2023:10:39:17 -0500] "POST /api/data HTTP/1.1" 500 902 **(Error)**
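For readers curious how the streaming side might work, here is a hedged end-to-end sketch using only Rust's standard library (`std::process::Command`, `std::sync::mpsc`). It is not grader's implementation: `run_sorter` is a hypothetical name, it returns the two bins in memory instead of writing to BIN1/BIN2, and it assumes the child echoes lines unchanged and in order. A writer thread queues each line before sending it to the child, so the reader always knows about a line by the time the child can echo it.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Command, Stdio};
use std::sync::mpsc;
use std::thread;

/// Hypothetical sketch: pipe `lines` through `command args...` and split
/// them into (bin1 = echoed, bin2 = not echoed).
fn run_sorter(
    lines: Vec<String>,
    command: &str,
    args: &[&str],
) -> std::io::Result<(Vec<String>, Vec<String>)> {
    let mut child = Command::new(command)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    let mut stdin = child.stdin.take().expect("stdin is piped");
    let stdout = child.stdout.take().expect("stdout is piped");

    let (tx, rx) = mpsc::channel::<String>();
    let writer = thread::spawn(move || {
        let mut child_open = true;
        for line in lines {
            // Queue the line before the child can possibly echo it.
            let _ = tx.send(line.clone());
            if child_open && writeln!(stdin, "{}", line).is_err() {
                child_open = false; // child exited early; keep queueing for bin2
            }
        }
        // `stdin` and `tx` drop here: the child sees EOF and the queue closes.
    });

    let mut bin1 = Vec::new();
    let mut bin2 = Vec::new();
    for echoed in BufReader::new(stdout).lines() {
        let echoed = echoed?;
        // Queued lines ahead of the echoed one were filtered out by the child.
        while let Ok(pending) = rx.recv() {
            if pending == echoed {
                bin1.push(pending);
                break;
            }
            bin2.push(pending);
        }
    }
    // The child produced no further echoes: whatever is still queued is bin2.
    writer.join().expect("writer thread panicked");
    bin2.extend(rx.iter());
    let _status = child.wait()?;
    Ok((bin1, bin2))
}

fn main() -> std::io::Result<()> {
    // Three lines adapted from http.log above.
    let log = vec![
        r#"192.168.1.1 - - [16/Dec/2023:10:31:45 -0500] "GET /index.html HTTP/1.1" 200 4523"#.to_string(),
        r#"192.168.1.3 - - [16/Dec/2023:10:33:30 -0500] "POST /login HTTP/1.1" 500 1287"#.to_string(),
        r#"192.168.1.4 - - [16/Dec/2023:10:34:22 -0500] "GET /contact.html HTTP/1.1" 200 2310"#.to_string(),
    ];
    let (ok, err) = run_sorter(log, "grep", &["-v", "-E", r#"HTTP/1.1" (500|404)"#])?;
    println!("ok.log:\n{}", ok.join("\n"));
    println!("err.log:\n{}", err.join("\n"));
    Ok(())
}
```

Using a channel plus a separate writer thread keeps the child's stdin and stdout drained concurrently, which avoids the classic pipe deadlock where both sides block on full buffers. If the child rewrites lines instead of echoing them verbatim, this sketch's matching no longer holds.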
Dependencies
~1.2–1.8MB
~34K SLoC