#csv #column #data #command-line-tool #show #pattern #quick

bin+lib analyst

A command-line tool for quickly browsing CSV data

2 unstable releases

new 0.1.0 Aug 15, 2024
0.0.0 Dec 1, 2021

#1288 in Command-line utilities


76 downloads per month

MIT/Apache

21KB
385

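Analyst

Analyst is a command-line tool for quickly browsing CSV data. It reads CSV in streaming mode and analyzes it on the fly. It makes it easy to inspect the missing values in a CSV file, find frequent patterns in the data, count how often each value occurs in a column, and find a column's maximum and minimum values.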

Commands

  • show: display rows (10 by default, 100 at most)
    • analyst show file.csv --start {start} --end {end}
  • missing-values: display missing values
    • analyst missing-values file.csv
  • frequent-patterns: display frequent patterns
    • analyst frequent-patterns file.csv --min-support {ratio}
  • column-stats: display column statistics
    • analyst column-stats file.csv --column {column}
  • extrema: display a column's extreme values
    • analyst extrema file.csv --column {column}

Example

Below is a sample CSV file.

ID,Name,Age,Grade,Subject,Score,Attendance
1,Alice Smith,18,12,Math,95,98%
2,Bob Johnson,17,11,Physics,88,95%
3,Charlie Brown,16,10,Chemistry,78,92%
4,Diana Lee,,12,Biology,92,97%
5,Eva Martinez,18,12,Math,91,99%
6,Frank Wilson,17,11,,85,93%
7,Grace Taylor,16,10,Physics,89,96%
8,Henry Davis,18,12,Chemistry,,90%
9,Ivy Chen,17,11,Biology,94,98%
10,Jack Thompson,16,10,Math,82,
  1. analyst show test_data.csv
+----+---------------+-----+-------+-----------+-------+------------+
| ID | Name          | Age | Grade | Subject   | Score | Attendance |
+----+---------------+-----+-------+-----------+-------+------------+
| 1  | Alice Smith   | 18  | 12    | Math      | 95    | 98%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 2  | Bob Johnson   | 17  | 11    | Physics   | 88    | 95%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 3  | Charlie Brown | 16  | 10    | Chemistry | 78    | 92%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 4  | Diana Lee     |     | 12    | Biology   | 92    | 97%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 5  | Eva Martinez  | 18  | 12    | Math      | 91    | 99%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 6  | Frank Wilson  | 17  | 11    |           | 85    | 93%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 7  | Grace Taylor  | 16  | 10    | Physics   | 89    | 96%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 8  | Henry Davis   | 18  | 12    | Chemistry |       | 90%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 9  | Ivy Chen      | 17  | 11    | Biology   | 94    | 98%        |
+----+---------------+-----+-------+-----------+-------+------------+
| 10 | Jack Thompson | 16  | 10    | Math      | 82    |            |
+----+---------------+-----+-------+-----------+-------+------------+
  2. analyst missing-values test_data.csv
Total rows analyzed: 10
Missing value analysis:
Age: 1 missing values (10.00%)
Name: 0 missing values (0.00%)
Subject: 1 missing values (10.00%)
Score: 1 missing values (10.00%)
Attendance: 1 missing values (10.00%)
ID: 0 missing values (0.00%)
Grade: 0 missing values (0.00%)
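
The missing-value report above can be approximated in a few lines of Python. This is only a sketch of the idea (count empty cells per column while streaming rows one at a time), not the crate's implementation; the function name `missing_values` is hypothetical, and treating whitespace-only cells as missing is an assumption.

```python
import csv
import io

# The sample file from this page, inlined so the sketch is self-contained.
SAMPLE = """\
ID,Name,Age,Grade,Subject,Score,Attendance
1,Alice Smith,18,12,Math,95,98%
2,Bob Johnson,17,11,Physics,88,95%
3,Charlie Brown,16,10,Chemistry,78,92%
4,Diana Lee,,12,Biology,92,97%
5,Eva Martinez,18,12,Math,91,99%
6,Frank Wilson,17,11,,85,93%
7,Grace Taylor,16,10,Physics,89,96%
8,Henry Davis,18,12,Chemistry,,90%
9,Ivy Chen,17,11,Biology,94,98%
10,Jack Thompson,16,10,Math,82,
"""


def missing_values(lines):
    """Hypothetical helper: count empty cells per column, one row at a time."""
    reader = csv.DictReader(lines)
    missing = dict.fromkeys(reader.fieldnames, 0)
    total = 0
    for row in reader:  # rows are consumed lazily, never held all at once
        total += 1
        for name in reader.fieldnames:
            # Assumption: empty or whitespace-only cells count as missing.
            if not (row[name] or "").strip():
                missing[name] += 1
    return total, missing


total, missing = missing_values(io.StringIO(SAMPLE))
print(f"Total rows analyzed: {total}")
for name, count in missing.items():
    share = count / total if total else 0.0
    print(f"{name}: {count} missing values ({share:.2%})")
```

Because only running counters are kept, memory use depends on the number of columns rather than the number of rows, which is the main appeal of a streaming approach.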
  3. analyst column-stats test_data.csv --column Age
Total rows analyzed: 10
Column statistics:

Column: Age
  18: 3 occurrences (30.00%)
  17: 3 occurrences (30.00%)
  16: 3 occurrences (30.00%)
  : 1 occurrences (10.00%)
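
A per-column frequency count like the one above can be sketched with `collections.Counter`. The helper name `column_stats` is hypothetical and the crate may count differently; here an empty cell is counted under the empty string, which appears to match the blank entry in the output above.

```python
import csv
import io
from collections import Counter

# The sample file from this page, inlined so the sketch is self-contained.
SAMPLE = """\
ID,Name,Age,Grade,Subject,Score,Attendance
1,Alice Smith,18,12,Math,95,98%
2,Bob Johnson,17,11,Physics,88,95%
3,Charlie Brown,16,10,Chemistry,78,92%
4,Diana Lee,,12,Biology,92,97%
5,Eva Martinez,18,12,Math,91,99%
6,Frank Wilson,17,11,,85,93%
7,Grace Taylor,16,10,Physics,89,96%
8,Henry Davis,18,12,Chemistry,,90%
9,Ivy Chen,17,11,Biology,94,98%
10,Jack Thompson,16,10,Math,82,
"""


def column_stats(lines, column):
    """Hypothetical helper: count how often each value occurs in one column."""
    reader = csv.DictReader(lines)
    counts = Counter()
    total = 0
    for row in reader:
        total += 1
        # Assumption: a missing cell is tallied under the empty string.
        counts[row[column] or ""] += 1
    return total, counts


total, counts = column_stats(io.StringIO(SAMPLE), "Age")
print(f"Total rows analyzed: {total}")
for value, count in counts.most_common():
    print(f"  {value}: {count} occurrences ({count / total:.2%})")
```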
  4. analyst extrema test_data.csv --column Score
Extrema for column 'Score':
  Minimum value: 78
  Maximum value: 95
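
Finding a column's extrema in one streaming pass might look like the sketch below. It assumes the column is numeric and that empty or non-numeric cells are skipped; how the crate itself compares values is not stated on this page, and the name `extrema` is used here only for illustration.

```python
import csv
import io

# The sample file from this page, inlined so the sketch is self-contained.
SAMPLE = """\
ID,Name,Age,Grade,Subject,Score,Attendance
1,Alice Smith,18,12,Math,95,98%
2,Bob Johnson,17,11,Physics,88,95%
3,Charlie Brown,16,10,Chemistry,78,92%
4,Diana Lee,,12,Biology,92,97%
5,Eva Martinez,18,12,Math,91,99%
6,Frank Wilson,17,11,,85,93%
7,Grace Taylor,16,10,Physics,89,96%
8,Henry Davis,18,12,Chemistry,,90%
9,Ivy Chen,17,11,Biology,94,98%
10,Jack Thompson,16,10,Math,82,
"""


def extrema(lines, column):
    """Hypothetical helper: return (min, max) of a numeric column, or (None, None)."""
    reader = csv.DictReader(lines)
    lowest = highest = None
    for row in reader:
        cell = (row[column] or "").strip()
        try:
            value = float(cell)
        except ValueError:
            continue  # Assumption: empty and non-numeric cells are ignored.
        lowest = value if lowest is None else min(lowest, value)
        highest = value if highest is None else max(highest, value)
    return lowest, highest


lowest, highest = extrema(io.StringIO(SAMPLE), "Score")
print(f"Minimum value: {lowest:g}")
print(f"Maximum value: {highest:g}")
```

Only the two running values are kept between rows, so the pass needs constant memory regardless of file size.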
  5. analyst frequent-patterns test_data.csv --min-support 0.3
Frequent patterns (min support: 30.00%):

1-item frequent patterns:
  Age:16,Grade:10 (support: 30.00%)
  Age:17,Grade:11 (support: 30.00%)
  Age:18,Grade:12 (support: 30.00%)
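
The page does not say which mining algorithm the crate uses. As a rough illustration of what "support" means here, the sketch below brute-forces every pair of `column:value` items that co-occur in a row and keeps the pairs whose share of rows reaches the threshold. The name `frequent_pairs` and the pair-only restriction are assumptions made for this example.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# The sample file from this page, inlined so the sketch is self-contained.
SAMPLE = """\
ID,Name,Age,Grade,Subject,Score,Attendance
1,Alice Smith,18,12,Math,95,98%
2,Bob Johnson,17,11,Physics,88,95%
3,Charlie Brown,16,10,Chemistry,78,92%
4,Diana Lee,,12,Biology,92,97%
5,Eva Martinez,18,12,Math,91,99%
6,Frank Wilson,17,11,,85,93%
7,Grace Taylor,16,10,Physics,89,96%
8,Henry Davis,18,12,Chemistry,,90%
9,Ivy Chen,17,11,Biology,94,98%
10,Jack Thompson,16,10,Math,82,
"""


def frequent_pairs(lines, min_support):
    """Hypothetical helper: map each frequent item pair to its support ratio."""
    reader = csv.DictReader(lines)
    pair_counts = Counter()
    total = 0
    for row in reader:
        total += 1
        # One item per non-empty cell, in header order so pairs are stable.
        items = [
            f"{name}:{value}"
            for name, value in row.items()
            if name is not None and value
        ]
        for pair in combinations(items, 2):
            pair_counts[pair] += 1
    # Support = fraction of all rows in which both items appear together.
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


patterns = frequent_pairs(io.StringIO(SAMPLE), 0.3)
for pair, support in patterns.items():
    print(f"  {','.join(pair)} (support: {support:.2%})")
```

Enumerating all pairs per row is quadratic in the number of columns, which is fine for a small file like this one; a real miner would typically prune candidates (for example, Apriori-style) before counting.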

Dependencies

~4–13MB
~124K SLoC