#filter #regex #sorting #most #recently #set #bash-history

bin+lib cleanup-history

Filter my .bash_history through regexes, deduplicate it, and sort it by most recent use

3 unstable releases

0.2.0 September 1, 2022
0.1.1 January 29, 2022
0.1.0 July 25, 2020


MIT license

17KB
380

cleanup-history-rs

Filter my .bash_history through regexes, deduplicate it, and sort it by most recent use.

Based on https://github.com/naggie/dotfiles/blob/master/scripts/cleanup-history
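The filter, deduplicate, and sort-by-recency steps described above can be sketched in a few lines of Python. This is an illustration only, not the crate's Rust implementation: the function name `cleanup`, the example ignore pattern, and the choice to keep the most recent copy of each duplicate at its latest position are all assumptions.

```python
import re


def cleanup(commands, ignore_patterns):
    """Filter and deduplicate commands given oldest first.

    Hypothetical helper: keeps only the most recent occurrence of each
    command and drops any command matching one of the ignore patterns.
    """
    ignore = [re.compile(p) for p in ignore_patterns]
    seen = set()
    kept = []
    # Walk from newest to oldest so the first copy we see is the latest one.
    for cmd in reversed(commands):
        cmd = cmd.strip()
        if not cmd or cmd in seen:
            continue
        if any(p.search(cmd) for p in ignore):
            continue
        seen.add(cmd)
        kept.append(cmd)
    # Restore oldest-first order, as in a history file.
    kept.reverse()
    return kept


history = ["ls", "cd /tmp", "ls", "git status", "cd /tmp"]
print(cleanup(history, [r"^git "]))  # ['ls', 'cd /tmp']
```

Walking the list newest-first means a set is enough to decide which copy survives, so the whole pass stays linear in the number of commands.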

.bash_history notes

Format

#1593575811
echo each command has a timestamp immediately before it
#2
#1593575811
echo 'after multiple timestamp lines, `history` will show the timestamp 1593575811'
#1593575811
#3
echo after multiple timestamp lines, this will show the timestamp 3
#1
echo 'when you run `history` this will show up with a timestamp long ago but still at the end of the list'
#1593575811
echo this will have the same timestamp as others above, duplicates don\'t matter
#1593575812
#1593575813
#1593575814
#1593575815
#1593575816
#1593575817
#1593575818
#1593575819
#1593575820
#1593575821
echo 'once you `history -w` all these extra timestamps will get removed'
#1593576854
for ((i=0;i<5;i++)); do echo $i; done
#1593576854
echo ^^ that was written on multiple lines
#1593576874
echo 'foo
bar'
#1593576874
echo ^^ that was also written on multiple lines, cmdhist=on, lithist=off
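A reader for this format can be sketched as follows, in Python rather than the crate's Rust. It assumes what the sample above suggests: a line starting with `#` and a digit is a timestamp, the last timestamp before a command wins, and every line up to the next timestamp belongs to the same (possibly multi-line) command. The name `parse_history` is hypothetical.

```python
import re

# Assumption: a timestamp line is '#' followed by digits at the line start.
TIMESTAMP = re.compile(r"#(\d+)")


def parse_history(text):
    """Return a list of (timestamp, command) tuples from history file text."""
    entries = []
    timestamp = None
    command_lines = []
    for line in text.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            # A timestamp closes the command collected so far, if any.
            if command_lines:
                entries.append((timestamp, "\n".join(command_lines)))
                command_lines = []
            # Consecutive timestamp lines overwrite each other: last one wins.
            timestamp = int(match.group(1))
        else:
            command_lines.append(line)
    if command_lines:
        entries.append((timestamp, "\n".join(command_lines)))
    return entries


sample = "#2\n#1593575811\necho a\n#1593576874\necho 'foo\nbar'\n"
for entry in parse_history(sample):
    print(entry)
```

Commands that appear before any timestamp line are returned with a timestamp of `None`; how bash itself treats such lines is not covered by this sketch.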

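Caveats

If a line starts with #\d+, it will be interpreted as a timestamp. This holds even when the line is part of a multi-line command: the in-memory history is correct, but once the file is written and read back, the command is split at that line.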

$ export HISTFILE=./foo
$ history -c
$ echo 'this
#1234
that'
$ history -w
$ cat foo
#1594044806
echo 'this
#1234
that'
#1594044814
history -w
$ history -c
$ history -r
$ history
    1  2020-07-06 08:16.14 | history -r
    2  2020-07-06 08:15.15 | echo 'this
    3  1969-12-31 17:20.34 | that'
    4  2020-07-06 08:15.25 | history -w
    5  2020-07-06 08:16.16 | history
$ history -c
$ echo 'foo
#1234 bar
baz'
$ history -w
$ history # correct in memory
    1  2020-07-06 08:24.30 | echo 'foo
#1234 bar
baz'
    2  2020-07-06 08:24.38 | history -w
    3  2020-07-06 08:24.41 | history
$ cat foo
#1594045470
echo 'foo
#1234 bar
baz'
#1594045478
history -w
$ history -c # clear in-memory history
$ history -r # reread from file
$ history # now incorrectly interprets `#1234 bar` as a timestamp
    1  2020-07-06 08:19.49 | history -r
    2  2020-07-06 08:19.09 | echo 'foo
    3  1969-12-31 17:20.34 | baz'
    4  2020-07-06 08:19.31 | history -w
    5  2020-07-06 08:19.51 | history
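One way to spot commands that will be split like this is to scan each command for lines that look like timestamps before the file is written. The Python sketch below assumes the rule is "`#` followed by a digit at the start of a line"; the helper name `timestamp_lookalikes` is hypothetical. It also shows why the misread entries get a date around the epoch: 1234 is read as seconds since 1970-01-01 UTC, which the transcript displays in a local time zone behind UTC.

```python
import re
from datetime import datetime, timezone


def timestamp_lookalikes(command):
    """Return the lines of a command that start with '#' plus a digit."""
    return [line for line in command.split("\n") if re.match(r"#\d", line)]


risky = timestamp_lookalikes("echo 'foo\n#1234 bar\nbaz'")
print(risky)  # ['#1234 bar']

# 1234 seconds after the epoch, hence the 1969/1970 dates in the listing.
print(datetime.fromtimestamp(1234, tz=timezone.utc))  # 1970-01-01 00:20:34+00:00
```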

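Benchmarks

The deduplicated line counts differ somewhat because the regexes differ slightly ¯\_(ツ)_/¯. I think the results are close enough to be informative.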

$ wc -l bash_history.bak
86636 bash_history.bak
$ hyperfine --warmup=5 --prepare='cp bash_history.bak bash_history_python' \
    --export-markdown=bash-history-python.txt \
    --time-unit=millisecond \
    'python3 cleanup-history.py bash_history_python'
$ wc -l bash_history_python
73149 bash_history_python
$ hyperfine --warmup=5 --prepare='cp bash_history.bak bash_history_rust' \
    --export-markdown=bash-history-rust.txt \
    --time-unit=millisecond \
    'cleanup-history-rs/target/release/cleanup-history bash_history_rust'
$ wc -l bash_history_rust
64638 bash_history_rust
| Command | Mean [ms] | Min [ms] | Max [ms] |
|:---|---:|---:|---:|
| `python3 cleanup-history.py bash_history_python` | 2069.9 ± 112.4 | 1935.1 | 2356.4 |
| `cleanup-history-rs/target/release/cleanup-history bash_history_rust` | 653.5 ± 22.1 | 631.2 | 698.9 |
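To put the figures above side by side, the short Python calculation below derives the speedup from the mean times and the number of lines each tool removed. It uses only the numbers reported in this section; the variable names are illustrative.

```python
# Figures copied from the benchmark output in this section.
input_lines = 86636
python_lines, rust_lines = 73149, 64638
python_ms, rust_ms = 2069.9, 653.5

speedup = python_ms / rust_ms
print(f"speedup: {speedup:.2f}x")  # speedup: 3.17x
print(f"python removed {input_lines - python_lines} lines")  # 13487
print(f"rust removed {input_lines - rust_lines} lines")  # 21998
```

Because the two tools remove different numbers of lines, the timing comparison is approximate rather than like-for-like.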

Dependencies

~4–13MB
~159K SLoC