2 unstable releases
0.2.0 | May 11, 2023 |
---|---|
0.1.1 | Dec 31, 2022 |
#672 in Text processing
30KB
362 lines
hns
— Human Numeric Sort v0.2.0 (⏫︎2023-05-11)
- © 2022–2023 Fredrick R. Brennan and hns Authors
- Apache 2.0 licensed, see LICENSE.
- man page
Packages
- hns_0.2.0_amd64.deb (for Debian, Ubuntu, Pop! OS, etc.)
- hns-0.2.0-1.x86_64.rpm (for Fedora, CentOS n, Red Hat Linux, etc.)
- humnumsort.PKGBUILD (for Arch Linux)
Or, if you have a Rust toolchain, you can install it via Rust's package manager, cargo:
$ cargo install hns
Commercial
Has this ever happened to you?
$ seq 1 30 | awk '{printf "data_%s.txt\n", $1}' | sort -h > important_filenames.txt
$ sort -h < important_filenames.txt
data_10.txt
data_11.txt
data_12.txt
data_13.txt
data_14.txt
data_15.txt
data_16.txt
data_17.txt
data_18.txt
data_19.txt
data_1.txt
data_20.txt
data_21.txt
data_22.txt
data_23.txt
data_24.txt
data_25.txt
data_26.txt
data_27.txt
data_28.txt
data_29.txt
data_2.txt
data_30.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt
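Oh, no! You forgot that the -h flag of GNU coreutils sort doesn't actually do what it says on the tin, and can't be fixed for various historical reasons (don't fall asleep on me now, stay with me)!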
-h, --human-numeric-sort
compare human readable numbers (e.g., 2K 1G)
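Why does that flag let you down here? GNU sort's numeric modes expect the number at the start of the key, so a line like data_10.txt offers no number to compare and the lines end up ordered as plain text. The sketch below is a simplified model of that behavior, not GNU sort's actual code: the function name `leading_number` is made up for illustration, and it ignores SI suffixes and decimals.

```rust
/// Simplified model of a "number at the start of the key" parser:
/// optional leading blanks, an optional '-', then ASCII digits.
/// Returns None when the line does not begin with a number.
/// (Hypothetical helper for illustration; suffixes like K/G are ignored.)
fn leading_number(line: &str) -> Option<i64> {
    let s = line.trim_start();
    let (negative, rest) = match s.strip_prefix('-') {
        Some(r) => (true, r),
        None => (false, s),
    };
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    if digits.is_empty() {
        return None;
    }
    // Values too large for i64 are treated as "no number" in this sketch.
    let n: i64 = digits.parse().ok()?;
    Some(if negative { -n } else { n })
}

fn main() {
    for line in ["10 apples", "2K", "data_10.txt", "data_2.txt"] {
        println!("{line:?} -> {:?}", leading_number(line));
    }
}
```

Both data_10.txt and data_2.txt yield `None` here, so a comparison based only on a leading number cannot tell them apart.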
If only there were a better way!
Hi, FREDDY MAYS here with another FUR-tastic invention. All your numbers, sorted just for you!
$ mv important_filenames.txt tests/data/README_example.txt
$ hns < tests/data/README_example.shuf.txt
data_1.txt
data_2.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt
data_10.txt
data_11.txt
data_12.txt
data_13.txt
data_14.txt
data_15.txt
data_16.txt
data_17.txt
data_18.txt
data_19.txt
data_20.txt
data_21.txt
data_22.txt
data_23.txt
data_24.txt
data_25.txt
data_26.txt
data_27.txt
data_28.txt
data_29.txt
data_30.txt
Wow!
But if you run git pull within the next Unix epoch, you'll also get my super-duper negative-number-aware version!
$ seq -10 10 | awk '{printf "data_%s.txt\n", $1}' | sort -h > tests/data/README_example2.shuf.txt
Before, your numbers were sad and monotone...
data_0.txt
data_-10.txt
data_10.txt
data_-1.txt
data_1.txt
data_-2.txt
data_2.txt
data_-3.txt
data_3.txt
data_-4.txt
data_4.txt
data_-5.txt
data_5.txt
data_-6.txt
data_6.txt
data_-7.txt
data_7.txt
data_-8.txt
data_8.txt
data_-9.txt
data_9.txt
But now they can be thoroughly sorted! (Wow!)
$ hns < tests/data/README_example2.shuf.txt
data_-10.txt
data_-9.txt
data_-8.txt
data_-7.txt
data_-6.txt
data_-5.txt
data_-4.txt
data_-3.txt
data_-2.txt
data_-1.txt
data_0.txt
data_1.txt
data_2.txt
data_3.txt
data_4.txt
data_5.txt
data_6.txt
data_7.txt
data_8.txt
data_9.txt
data_10.txt
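One way to get the ordering shown above is to split each line into text chunks and signed integer chunks, then compare chunk by chunk. The following is a minimal sketch of that idea, not hns's actual implementation: the `Chunk` type and the `chunks` function are hypothetical names, and it assumes that a '-' directly before a digit is always a minus sign.

```rust
/// A piece of a line: either a signed integer or a run of other text.
/// Derived ordering: numbers compare numerically, text compares as strings,
/// and (arbitrarily, for this sketch) any number sorts before any text.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Chunk {
    Num(i64),
    Text(String),
}

/// Split a line into alternating text and number chunks (hypothetical helper).
fn chunks(line: &str) -> Vec<Chunk> {
    let chars: Vec<char> = line.chars().collect();
    let mut out = Vec::new();
    let mut text = String::new();
    let mut i = 0;
    while i < chars.len() {
        // Assumption: '-' immediately followed by a digit is a minus sign.
        let signed = chars[i] == '-' && chars.get(i + 1).map_or(false, |c| c.is_ascii_digit());
        if chars[i].is_ascii_digit() || signed {
            let start = i;
            i += 1; // consume the sign or the first digit
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let token: String = chars[start..i].iter().collect();
            match token.parse::<i64>() {
                Ok(n) => {
                    if !text.is_empty() {
                        out.push(Chunk::Text(std::mem::take(&mut text)));
                    }
                    out.push(Chunk::Num(n));
                }
                // Too large for i64: keep the digits as ordinary text.
                Err(_) => text.push_str(&token),
            }
        } else {
            text.push(chars[i]);
            i += 1;
        }
    }
    if !text.is_empty() {
        out.push(Chunk::Text(text));
    }
    out
}

fn main() {
    // Start from descending order so the sort has real work to do.
    let mut lines: Vec<String> = (-10..=10).rev().map(|n| format!("data_{n}.txt")).collect();
    lines.sort_by_cached_key(|l| chunks(l));
    for l in &lines {
        println!("{l}");
    }
}
```

`sort_by_cached_key` computes each line's chunk list once rather than on every comparison, which matters when the key is this expensive to build.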
It's so simple to use! No command-line options! Just standard input and standard output, one size fits all! (Really, Freddy?) (Yes!)
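For the curious, a filter with no options, only standard input and standard output, can be sketched in a few lines of Rust. This is an illustrative skeleton rather than hns's source: `sort_stream` is a made-up name, and the plain `sort()` call stands in for a real human-numeric comparison.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// Read every line from `input`, sort them, and write them to `output`.
/// (Hypothetical helper; `lines()` returns an error on invalid UTF-8.)
fn sort_stream<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    let mut lines = input.lines().collect::<io::Result<Vec<String>>>()?;
    // Placeholder ordering: plain string order, not a human numeric sort.
    lines.sort();
    for line in &lines {
        writeln!(output, "{line}")?;
    }
    output.flush()
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    // Lock both streams once and buffer the output to avoid per-line syscalls.
    sort_stream(stdin.lock(), BufWriter::new(stdout.lock()))
}
```

Keeping the logic generic over `BufRead` and `Write` means the same function can be exercised with in-memory buffers in a test.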
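It doesn't just work on short files, oh no no no! It's written in Rust, so you know it can handle even the biggest data spills, such as an entire randomly shuffled Class A network!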
$ time RUST_LOG=INFO target/release/hns < /tmp/0.0.0.0/8.shuf.txt > /dev/null
[2022-09-20T14:58:00Z INFO hns] Reading done; got 16777216 lines in 348513µs
[2022-09-20T14:58:28Z INFO hns] Sorting done; sorted 16777216 lines in 27122184µs
[2022-09-20T14:58:32Z INFO hns] Writing done; wrote in 4370570µs
real 0m31.855s
user 0m30.306s
sys 0m1.548s
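Nearly 16.8 million lines (16,777,216), each with four points of comparison, sorted in under thirty seconds (the sort step took about 27.1 s; the whole run, including reading and writing, took about 31.9 s)! That's the Freddy Mays guarantee.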
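To make "four points of comparison" concrete: assuming each input line is a dotted-quad IPv4 address (which the file name and line count suggest, but the README does not state), every line carries four numbers that must compare numerically. The sketch below swaps in a different, purpose-specific technique from the standard library, `std::net::Ipv4Addr`, which orders addresses octet by octet. It is not how hns works, and `sort_addresses` is a hypothetical name.

```rust
use std::net::Ipv4Addr;
use std::time::Instant;

/// Sort lines that are assumed to hold one dotted-quad address each.
/// Lines that fail to parse get the key `None`, which orders first.
fn sort_addresses(lines: &mut [String]) {
    lines.sort_by_cached_key(|l| l.trim().parse::<Ipv4Addr>().ok());
}

fn main() {
    // A tiny stand-in for the shuffled benchmark input.
    let mut lines: Vec<String> = ["10.0.0.2", "9.255.0.1", "10.0.0.10", "2.0.0.0"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    // Time only the sort phase, mirroring the per-phase timings in the log.
    let start = Instant::now();
    sort_addresses(&mut lines);
    eprintln!("sorted {} lines in {}µs", lines.len(), start.elapsed().as_micros());

    for l in &lines {
        println!("{l}");
    }
}
```

A general-purpose tool cannot assume its input is IP addresses, so it has to discover the numbers in each line instead; this version only shows what the four comparisons per line amount to.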
Benchmark data
You may want to clone the benchmark data as:
$ git clone https://github.com/ctrlcctrlv/humnumsort-test-data/ tests/data/expensive
TODO
- Zero padding via another binary?
License
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:
http://www.apache.org/licenses/LICENSE-2.0
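Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.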
Dependencies
~8–21MB
~270K SLoC