3 stable releases

1.1.0	March 17, 2019
1.0.1	March 15, 2019

#478 in Visualization

CC0 license

44KB
759 lines of code

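tbuck - time-series bucketing

tbuck is a simple CLI tool that groups lines of text into time buckets of a chosen granularity and prints the number of lines that fall into each bucket. I wrote it after finding myself debugging a problem at work and trying to work out how often a particular event occurred, where the event was identified by a line in an application log file. The event did not correspond to any metric emitted by our monitoring system, but I wanted to see a graph of how often it happened. The need came up several times during the investigation, and I wrote a separate script for each file format. Eventually I realized that all of the scripts were doing essentially the same thing, so I wrote tbuck.

Usage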

tbuck 1.1.0
Drake Tetreault <ekardnt@ekardnt.com>
A command line tool for bucketing time-series text data

USAGE:
    tbuck [FLAGS] [OPTIONS] <DATE_TIME_FORMAT> [INPUT_FILE]...

FLAGS:
    -d, --descending
            By default stream mode expects entries to be in monotonically ascending order by date (earlier dates
            followed by later dates), which is the usual order of log files. If this flag is present then stream mode
            will instead expect entries in monotonically decreasing order by date (later dates followed by earlier
            dates). In normal mode, this flag will cause the buckets to be printed in descending order instead of the
            default ascending order.
    -h, --help
            Prints help information

    -n, --no-fill
            By default buckets which had no entries present will be displayed with a count of 0. If this flag is present
            then instead the bucket will not be printed at all.
    -s, --stream
            Enable stream mode. Entries will be expected to arrive in monotonically increasing (or --descending) order,
            and bucket information will be printed live as soon as the bucket is known to be finished. By default the
            presence of any entry violating the monotonic order will cause an error, but this can be made --tolerant.
    -t, --tolerant
            By default when a non-monotonic entry is encountered in stream mode the program will terminate with an
            error. If this flag is present then non-monotonic entries will instead be silently discarded.
    -V, --version
            Prints version information


OPTIONS:
    -g, --granularity <GRANULARITY>
            Bucket time granularity in seconds ('5s'), minutes ('1m'), or hours ('2h') [default: 1m]

    -m, --match-index <MATCH_INDEX>
            0-based index of match to use if multiple matches are found [default: 0]


ARGS:
    <DATE_TIME_FORMAT>
            Date/time parsing format. Full date and time information must be present. The following specifiers are
            supported, taken from Rust's chrono crate:
            Specifier   Example     Description
            %Y          2001        The full proleptic Gregorian year, zero-padded to 4 digits.
            %m          07          Month number (01--12), zero-padded to 2 digits.
            %b          Jul         Abbreviated month name. Always 3 letters.
            %B          July        Full month name. Also accepts corresponding abbreviation in parsing.
            %d          08          Day number (01--31), zero-padded to 2 digits.
            %F          2001-07-08  Year-month-day format (ISO 8601). Same to %Y-%m-%d.
            %H          00          Hour number (00--23), zero-padded to 2 digits.
            %I          12          Hour number in 12-hour clocks (01--12), zero-padded to 2 digits.
            %M          34          Minute number (00--59), zero-padded to 2 digits.
            %S          60          Second number (00--60), zero-padded to 2 digits.
            %T          00:34:60    Hour-minute-second format. Same to %H:%M:%S.
            %P          am          am or pm in 12-hour clocks.
            %p          AM          AM or PM in 12-hour clocks.
            %s          994518299   UNIX timestamp, the number of seconds since 1970-01-01 00:00 UTC.
    <INPUT_FILE>...
            Input files; or standard input if none provided
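
The stream mode behavior described under --stream and --tolerant can be sketched as a Python generator. This is an illustration of the idea only, not tbuck's implementation: stream_buckets and floor_to are hypothetical names, an entry is assumed to be non-monotonic when it is earlier than the entry before it, empty buckets are emitted with a count of 0, and only ascending order is handled.

```python
from datetime import datetime, timedelta, timezone


def floor_to(ts, granularity):
    """Floor a timezone-aware timestamp to the start of its bucket."""
    step = int(granularity.total_seconds())
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)


def stream_buckets(timestamps, granularity, tolerant=False):
    """Yield (bucket_start, count) as soon as a bucket is known to be finished."""
    current_start = None
    count = 0
    previous = None
    for ts in timestamps:
        # Assumption: "non-monotonic" means earlier than the previous entry.
        if previous is not None and ts < previous:
            if tolerant:
                continue  # silently discard the out-of-order entry
            raise ValueError(f"non-monotonic entry: {ts.isoformat()}")
        previous = ts
        start = floor_to(ts, granularity)
        if current_start is None:
            current_start = start
        # A later bucket has begun, so every earlier bucket is finished.
        while start > current_start:
            yield current_start, count
            current_start += granularity
            count = 0
        count += 1
    # End of input: the last open bucket is finished too.
    if current_start is not None:
        yield current_start, count


base = datetime(2019, 3, 14, 12, 1, 0, tzinfo=timezone.utc)
stamps = [base + timedelta(seconds=s) for s in (0, 10, 70, 200)]
for bucket, count in stream_buckets(stamps, timedelta(minutes=1)):
    print(f"{bucket:%Y-%m-%d %H:%M:%S} UTC,{count}")
```

Because a bucket is only emitted once an entry from a later bucket arrives (or the input ends), the sketch never needs to hold more than one open bucket in memory.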

Examples

Suppose you are working with the following log file.

$ cat demo.txt
2019-03-14 12:01:00 Event A
2019-03-14 12:01:10 Event B
2019-03-14 12:01:20 Event A
2019-03-14 12:01:30 Event B
2019-03-14 12:01:40 Event A
2019-03-14 12:01:50 Event B
2019-03-14 12:02:00 Event A
2019-03-14 12:02:10 Event B
2019-03-14 12:02:20 Event A
2019-03-14 12:02:30 Event B
2019-03-14 12:02:40 Event A
2019-03-14 12:02:50 Event B
2019-03-14 12:03:00 Event A
2019-03-14 12:03:10 Event B
2019-03-14 12:03:20 Event A
2019-03-14 12:03:30 Event B
2019-03-14 12:03:40 Event A
2019-03-14 12:03:50 Event B

You want to see how many log lines fall into each 1-minute bucket.

$ tbuck --granularity 1m '%F %T' demo.txt
2019-03-14 12:01:00 UTC,6
2019-03-14 12:02:00 UTC,6
2019-03-14 12:03:00 UTC,6
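
The bucketing in this example can be sketched in a few lines of Python. This illustrates the idea and is not tbuck's actual implementation: bucket_counts and format_rows are hypothetical names, the timestamp is assumed to be the first two whitespace-separated fields of each line, and all times are treated as UTC. The format is spelled out as %Y-%m-%d %H:%M:%S because Python's strptime may not accept the %F and %T shorthands.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def bucket_counts(lines, fmt, granularity):
    """Return {bucket_start: count}, treating every timestamp as UTC."""
    step = int(granularity.total_seconds())
    counts = Counter()
    for line in lines:
        # Assumption: the timestamp is the first two whitespace-separated fields.
        stamp = " ".join(line.split()[:2])
        ts = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
        epoch = int(ts.timestamp())
        # Floor to the start of the bucket that contains this timestamp.
        start = datetime.fromtimestamp(epoch - epoch % step, tz=timezone.utc)
        counts[start] += 1
    return counts


def format_rows(counts):
    """Render buckets in ascending order as 'start UTC,count' rows."""
    return [f"{start:%Y-%m-%d %H:%M:%S} UTC,{counts[start]}" for start in sorted(counts)]


# Rebuild the 18 lines of demo.txt: one entry every 10 seconds, alternating A and B.
demo = [
    f"2019-03-14 12:{minute:02d}:{second:02d} Event {'AB'[(second // 10) % 2]}"
    for minute in (1, 2, 3)
    for second in range(0, 60, 10)
]
rows = format_rows(bucket_counts(demo, "%Y-%m-%d %H:%M:%S", timedelta(minutes=1)))
print("\n".join(rows))
```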

You want to see how many log lines fall into each 30-second bucket. Note that from here on the examples use -g, the short form of the --granularity option.

$ tbuck -g 30s '%F %T' demo.txt
2019-03-14 12:01:00 UTC,3
2019-03-14 12:01:30 UTC,3
2019-03-14 12:02:00 UTC,3
2019-03-14 12:02:30 UTC,3
2019-03-14 12:03:00 UTC,3
2019-03-14 12:03:30 UTC,3
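
Granularity values such as 30s and 1m are a number followed by a unit letter. A minimal sketch of parsing them in Python, assuming only the three documented units (s, m, h) and positive whole numbers; parse_granularity is a hypothetical helper, and tbuck's own parser may accept or reject other inputs.

```python
import re
from datetime import timedelta

# Map the documented unit letters to timedelta keyword arguments.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_granularity(text):
    """Turn '5s', '1m', or '2h' into a timedelta; reject anything else."""
    match = re.fullmatch(r"([1-9][0-9]*)([smh])", text)
    if match is None:
        raise ValueError(f"invalid granularity: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


for value in ("30s", "1m", "2h"):
    print(value, "->", parse_granularity(value))
```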

You want to see how many Event A log lines fall into each 15-second bucket. rg is ripgrep.

$ rg "Event A" demo.txt | tbuck -g 15s '%F %T'
2019-03-14 12:01:00 UTC,1
2019-03-14 12:01:15 UTC,1
2019-03-14 12:01:30 UTC,1
2019-03-14 12:01:45 UTC,0
2019-03-14 12:02:00 UTC,1
2019-03-14 12:02:15 UTC,1
2019-03-14 12:02:30 UTC,1
2019-03-14 12:02:45 UTC,0
2019-03-14 12:03:00 UTC,1
2019-03-14 12:03:15 UTC,1
2019-03-14 12:03:30 UTC,1

You notice that the previous command printed a 0 for buckets that no entries fell into, and for whatever reason you do not want that.

$ rg "Event A" demo.txt | tbuck -g 15s --no-fill '%F %T'
2019-03-14 12:01:00 UTC,1
2019-03-14 12:01:15 UTC,1
2019-03-14 12:01:30 UTC,1
2019-03-14 12:02:00 UTC,1
2019-03-14 12:02:15 UTC,1
2019-03-14 12:02:30 UTC,1
2019-03-14 12:03:00 UTC,1
2019-03-14 12:03:15 UTC,1
2019-03-14 12:03:30 UTC,1
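
The difference between the default output and --no-fill comes down to whether empty buckets between the first and last non-empty bucket are listed. A rough Python sketch of that choice, where bucket_rows is a hypothetical helper and the input is assumed to be a mapping from bucket start time to count:

```python
from datetime import datetime, timedelta, timezone


def bucket_rows(counts, granularity, fill=True):
    """Return (bucket_start, count) pairs in ascending order.

    With fill=True every bucket between the first and last non-empty one
    is listed, using 0 for empty buckets; with fill=False they are skipped.
    """
    if not counts:
        return []
    if not fill:
        return sorted(counts.items())
    rows = []
    current, last = min(counts), max(counts)
    while current <= last:
        rows.append((current, counts.get(current, 0)))
        current += granularity
    return rows


# Event A counts for 12:01:00 through 12:02:00; the 12:01:45 bucket is empty.
start = datetime(2019, 3, 14, 12, 1, 0, tzinfo=timezone.utc)
step = timedelta(seconds=15)
counts = {start + step * i: 1 for i in (0, 1, 2, 4)}

for bucket, count in bucket_rows(counts, step, fill=True):
    print(f"{bucket:%Y-%m-%d %H:%M:%S} UTC,{count}")
```

Calling bucket_rows(counts, step, fill=False) instead returns only the four non-empty buckets.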

Dependencies

~6MB
~90K SLoC