8个版本
0.3.2 | 2022年12月27日 |
---|---|
0.3.1 | 2022年12月15日 |
0.3.0 | 2022年4月11日 |
0.2.2 | 2021年6月8日 |
0.1.7 | 2018年8月1日 |
#315 在 内存管理 中
10,099 每月下载量
在 2 个Crates中使用 (通过 hyperscan)
150KB
1K SLoC
rust-hyperscan
Hyperscan 是一个高性能的正则表达式匹配库。
用法
要使用,请将以下行添加到 Cargo.toml 下的 [dependencies]
hyperscan = "0.3"
示例
use hyperscan::prelude::*;
fn main() {
let pattern = pattern! {"test"; CASELESS | SOM_LEFTMOST};
let db: BlockDatabase = pattern.build().unwrap();
let scratch = db.alloc_scratch().unwrap();
let mut matches = vec![];
db.scan("some test data", &scratch, |id, from, to, flags| {
println!("found pattern #{} @ [{}, {})", id, from, to);
matches.push(from..to);
Matching::Continue
}).unwrap();
assert_eq!(matches, vec![5..9]);
}
功能
Hyperscan v5 API
从 Hyperscan v5.0 开始,引入了几个新的API和标志。
rust-hyperscan
默认使用API的最新版本,提供了新的功能,如 Literal
。
如果您想使用 Hyperscan v4.x,可以在编译时禁用 v5
功能。
[dependencies.hyperscan]
version = "0.3"
default-features = false
features = ["full"]
Chimera API
为了提高正则表达式兼容性,Hyperscan v5.0 开始提供PCRE兼容的 Chimera 库。
要启用 Chimera
支持,您需要手动下载PCRE 8.41或更高版本,解压到 Hyperscan 5.x 的源目录,编译并安装它。
$ cd hyperscan-5.4.0
$ wget https://ftp.pcre.org/pub/pcre/pcre-8.45.tar.gz
$ mkdir pcre
$ tar xvf pcre-8.45.tar.gz --strip-components=1 --directory pcre
$ mkdir build && cd build
$ cmake .. -DCMAKE_INSTALL_PREFIX=`pwd` -DBUILD_STATIC_LIBS=on -G Ninja
$ ninja
$ ninja install
然后使用环境变量 PKG_CONFIG_PATH
指向 hyperscan 安装目录并启用 chimera
功能。
$ PKG_CONFIG_PATH=<CMAKE_INSTALL_PREFIX>/lib/pkgconfig cargo build
应启用 chimera
功能。
[dependencies]
hyperscan = { version = "0.3", features = ["chimera"] }
注意:Chimera
库不支持动态库链接模式,当启用 chimera
时,自动启用 static
功能。
静态链接模式
从版本 0.2 开始,rust-hyperscan
默认使用动态库链接模式。如果您需要链接静态库,可以使用 static
功能。
[dependencies]
hyperscan = { version = "0.3", features = ["static"] }
Hyperscan 运行时
HyperScan提供了一个独立的运行时库,可以单独使用。如果不需要在运行时编译正则表达式,可以使用runtime
模式来减小可执行文件的大小,并消除C++依赖。
[dependencies.hyperscan]
version = "0.3"
default-features = false
features = ["runtime"]
基准测试
为了提供性能比较,这里提供了Hyperscan
、Chimera
和regex
性能测试工具。
它们将在不同长度的数据上使用相同的测试集。
级别 | 模式 |
---|---|
Easy0 | ABCDEFGHIJKLMNOPQRSTUVWXYZ$ |
Easy0i | (?i)ABCDEFGHIJklmnopqrstuvwxyz$ |
Easy1 | A[AB]B[BC]C[CD]D[DE]E[EF]F[FG]G[GH]H[HI]I[IJ]J$ |
中等 | [XYZ]ABCDEFGHIJKLMNOPQRSTUVWXYZ$ |
困难 | [ -~]*ABCDEFGHIJKLMNOPQRSTUVWXYZ$ |
困难1 | ABCD|CDEF|EFGH|GHIJ|IJKL|KLMN|MNOP|OPQR|QRST|STUV|UVWX|WXYZ |
您可以使用cargo criterion命令在您的环境中运行基准测试。
$ cargo criterion --features chimera
Finished bench [optimized] target(s) in 0.24s
hyperscan/Hard/16 time: [7.8000 ns 7.8566 ns 7.9212 ns]
thrpt: [1.8812 GiB/s 1.8967 GiB/s 1.9104 GiB/s]
hyperscan/Hard/32 time: [53.495 ns 53.668 ns 53.850 ns]
thrpt: [566.71 MiB/s 568.64 MiB/s 570.48 MiB/s]
hyperscan/Hard/1024 time: [91.912 ns 95.150 ns 101.64 ns]
thrpt: [9.3826 GiB/s 10.023 GiB/s 10.376 GiB/s]
hyperscan/Hard/32768 time: [1.1840 us 1.1917 us 1.2015 us]
thrpt: [25.400 GiB/s 25.608 GiB/s 25.776 GiB/s]
hyperscan/Hard/1048576 time: [35.849 us 35.981 us 36.133 us]
thrpt: [27.027 GiB/s 27.141 GiB/s 27.241 GiB/s]
hyperscan/Hard/33554432 time: [2.1729 ms 2.1893 ms 2.2075 ms]
thrpt: [14.156 GiB/s 14.274 GiB/s 14.381 GiB/s]
hyperscan/Medium/16 time: [7.8242 ns 7.9619 ns 8.1801 ns]
thrpt: [1.8216 GiB/s 1.8716 GiB/s 1.9045 GiB/s]
hyperscan/Medium/32 time: [71.114 ns 71.583 ns 72.090 ns]
thrpt: [423.33 MiB/s 426.32 MiB/s 429.13 MiB/s]
hyperscan/Medium/1024 time: [110.53 ns 111.09 ns 111.75 ns]
thrpt: [8.5343 GiB/s 8.5845 GiB/s 8.6280 GiB/s]
hyperscan/Medium/32768 time: [1.2145 us 1.2289 us 1.2502 us]
thrpt: [24.410 GiB/s 24.834 GiB/s 25.128 GiB/s]
hyperscan/Medium/1048576
time: [36.109 us 37.559 us 40.516 us]
thrpt: [24.103 GiB/s 26.001 GiB/s 27.045 GiB/s]
hyperscan/Medium/33554432
time: [2.1737 ms 2.1863 ms 2.1998 ms]
thrpt: [14.206 GiB/s 14.293 GiB/s 14.376 GiB/s]
hyperscan/Easy0/16 time: [7.6203 ns 7.7540 ns 7.9810 ns]
thrpt: [1.8671 GiB/s 1.9217 GiB/s 1.9555 GiB/s]
hyperscan/Easy0/32 time: [59.472 ns 60.330 ns 61.681 ns]
thrpt: [494.76 MiB/s 505.85 MiB/s 513.14 MiB/s]
hyperscan/Easy0/1024 time: [100.62 ns 102.37 ns 105.50 ns]
thrpt: [9.0395 GiB/s 9.3161 GiB/s 9.4779 GiB/s]
hyperscan/Easy0/32768 time: [1.1906 us 1.1943 us 1.1987 us]
thrpt: [25.459 GiB/s 25.552 GiB/s 25.632 GiB/s]
hyperscan/Easy0/1048576 time: [36.014 us 36.183 us 36.378 us]
thrpt: [26.845 GiB/s 26.990 GiB/s 27.116 GiB/s]
hyperscan/Easy0/33554432
time: [2.1796 ms 2.2412 ms 2.3535 ms]
thrpt: [13.278 GiB/s 13.943 GiB/s 14.338 GiB/s]
hyperscan/Easy1/16 time: [7.5647 ns 7.5955 ns 7.6288 ns]
thrpt: [1.9533 GiB/s 1.9618 GiB/s 1.9698 GiB/s]
hyperscan/Easy1/32 time: [78.666 ns 79.158 ns 79.806 ns]
thrpt: [382.40 MiB/s 385.53 MiB/s 387.94 MiB/s]
hyperscan/Easy1/1024 time: [139.51 ns 139.88 ns 140.28 ns]
thrpt: [6.7982 GiB/s 6.8179 GiB/s 6.8359 GiB/s]
hyperscan/Easy1/32768 time: [1.7645 us 1.8316 us 1.9747 us]
thrpt: [15.454 GiB/s 16.661 GiB/s 17.296 GiB/s]
hyperscan/Easy1/1048576 time: [54.465 us 54.690 us 54.957 us]
thrpt: [17.770 GiB/s 17.856 GiB/s 17.930 GiB/s]
hyperscan/Easy1/33554432
time: [2.5835 ms 2.5940 ms 2.6056 ms]
thrpt: [11.994 GiB/s 12.047 GiB/s 12.096 GiB/s]
hyperscan/Easy0i/16 time: [7.5739 ns 7.5992 ns 7.6260 ns]
thrpt: [1.9540 GiB/s 1.9609 GiB/s 1.9674 GiB/s]
hyperscan/Easy0i/32 time: [63.005 ns 64.754 ns 67.552 ns]
thrpt: [451.77 MiB/s 471.28 MiB/s 484.37 MiB/s]
hyperscan/Easy0i/1024 time: [114.28 ns 116.08 ns 118.43 ns]
thrpt: [8.0526 GiB/s 8.2157 GiB/s 8.3450 GiB/s]
hyperscan/Easy0i/32768 time: [1.3169 us 1.3209 us 1.3256 us]
thrpt: [23.022 GiB/s 23.103 GiB/s 23.174 GiB/s]
hyperscan/Easy0i/1048576
time: [40.118 us 40.363 us 40.628 us]
thrpt: [24.037 GiB/s 24.194 GiB/s 24.342 GiB/s]
hyperscan/Easy0i/33554432
time: [2.2489 ms 2.2769 ms 2.3150 ms]
thrpt: [13.499 GiB/s 13.725 GiB/s 13.896 GiB/s]
hyperscan/Hard1/16 time: [38.670 ns 40.512 ns 43.454 ns]
thrpt: [351.15 MiB/s 376.65 MiB/s 394.59 MiB/s]
hyperscan/Hard1/32 time: [40.461 ns 40.692 ns 40.890 ns]
thrpt: [746.32 MiB/s 749.97 MiB/s 754.24 MiB/s]
hyperscan/Hard1/1024 time: [68.407 ns 68.681 ns 69.002 ns]
thrpt: [13.821 GiB/s 13.886 GiB/s 13.941 GiB/s]
hyperscan/Hard1/32768 time: [1.0370 us 1.0411 us 1.0455 us]
thrpt: [29.190 GiB/s 29.314 GiB/s 29.429 GiB/s]
hyperscan/Hard1/1048576 time: [34.754 us 35.279 us 36.310 us]
thrpt: [26.895 GiB/s 27.681 GiB/s 28.099 GiB/s]
hyperscan/Hard1/33554432
time: [2.3153 ms 2.3878 ms 2.4730 ms]
thrpt: [12.637 GiB/s 13.087 GiB/s 13.497 GiB/s]
chimera/Hard/16 time: [21.959 ns 22.034 ns 22.121 ns]
thrpt: [689.79 MiB/s 692.50 MiB/s 694.87 MiB/s]
chimera/Hard/32 time: [45.730 ns 45.872 ns 46.046 ns]
thrpt: [662.76 MiB/s 665.28 MiB/s 667.34 MiB/s]
chimera/Hard/1024 time: [116.41 ns 121.81 ns 132.81 ns]
thrpt: [7.1810 GiB/s 7.8292 GiB/s 8.1923 GiB/s]
chimera/Hard/32768 time: [1.2021 us 1.2064 us 1.2124 us]
thrpt: [25.171 GiB/s 25.296 GiB/s 25.387 GiB/s]
chimera/Hard/1048576 time: [35.805 us 35.916 us 36.050 us]
thrpt: [27.089 GiB/s 27.190 GiB/s 27.274 GiB/s]
chimera/Hard/33554432 time: [2.2012 ms 2.2206 ms 2.2414 ms]
thrpt: [13.942 GiB/s 14.073 GiB/s 14.197 GiB/s]
chimera/Medium/16 time: [21.899 ns 22.288 ns 23.108 ns]
thrpt: [660.33 MiB/s 684.63 MiB/s 696.77 MiB/s]
chimera/Medium/32 time: [49.151 ns 49.310 ns 49.487 ns]
thrpt: [616.68 MiB/s 618.89 MiB/s 620.89 MiB/s]
chimera/Medium/1024 time: [128.00 ns 128.48 ns 129.00 ns]
thrpt: [7.3926 GiB/s 7.4229 GiB/s 7.4505 GiB/s]
chimera/Medium/32768 time: [1.2143 us 1.2178 us 1.2219 us]
thrpt: [24.976 GiB/s 25.059 GiB/s 25.132 GiB/s]
chimera/Medium/1048576 time: [35.888 us 36.224 us 36.839 us]
thrpt: [26.509 GiB/s 26.959 GiB/s 27.212 GiB/s]
chimera/Medium/33554432 time: [2.2379 ms 2.2767 ms 2.3200 ms]
thrpt: [13.470 GiB/s 13.726 GiB/s 13.964 GiB/s]
chimera/Easy0/16 time: [22.006 ns 22.074 ns 22.145 ns]
thrpt: [689.05 MiB/s 691.25 MiB/s 693.40 MiB/s]
chimera/Easy0/32 time: [45.707 ns 46.510 ns 47.881 ns]
thrpt: [637.37 MiB/s 656.15 MiB/s 667.68 MiB/s]
chimera/Easy0/1024 time: [116.18 ns 118.38 ns 121.96 ns]
thrpt: [7.8196 GiB/s 8.0559 GiB/s 8.2085 GiB/s]
chimera/Easy0/32768 time: [1.2089 us 1.2149 us 1.2222 us]
thrpt: [24.970 GiB/s 25.120 GiB/s 25.245 GiB/s]
chimera/Easy0/1048576 time: [35.747 us 35.839 us 35.949 us]
thrpt: [27.165 GiB/s 27.248 GiB/s 27.318 GiB/s]
chimera/Easy0/33554432 time: [2.1717 ms 2.1862 ms 2.2021 ms]
thrpt: [14.191 GiB/s 14.294 GiB/s 14.389 GiB/s]
chimera/Easy1/16 time: [21.898 ns 21.957 ns 22.019 ns]
thrpt: [692.99 MiB/s 694.95 MiB/s 696.80 MiB/s]
chimera/Easy1/32 time: [54.226 ns 58.548 ns 63.641 ns]
thrpt: [479.53 MiB/s 521.24 MiB/s 562.78 MiB/s]
chimera/Easy1/1024 time: [156.46 ns 156.94 ns 157.50 ns]
thrpt: [6.0551 GiB/s 6.0769 GiB/s 6.0954 GiB/s]
chimera/Easy1/32768 time: [1.9545 us 2.4792 us 3.5112 us]
thrpt: [8.6914 GiB/s 12.310 GiB/s 15.614 GiB/s]
chimera/Easy1/1048576 time: [61.846 us 80.517 us 107.45 us]
thrpt: [9.0885 GiB/s 12.129 GiB/s 15.790 GiB/s]
chimera/Easy1/33554432 time: [3.5006 ms 3.8279 ms 4.2312 ms]
thrpt: [7.3857 GiB/s 8.1637 GiB/s 8.9271 GiB/s]
chimera/Easy0i/16 time: [32.880 ns 37.164 ns 42.523 ns]
thrpt: [358.83 MiB/s 410.58 MiB/s 464.07 MiB/s]
chimera/Easy0i/32 time: [56.519 ns 60.221 ns 64.878 ns]
thrpt: [470.38 MiB/s 506.76 MiB/s 539.95 MiB/s]
chimera/Easy0i/1024 time: [131.94 ns 132.87 ns 133.96 ns]
thrpt: [7.1191 GiB/s 7.1775 GiB/s 7.2283 GiB/s]
chimera/Easy0i/32768 time: [1.6345 us 1.7481 us 1.9095 us]
thrpt: [15.982 GiB/s 17.457 GiB/s 18.671 GiB/s]
chimera/Easy0i/1048576 time: [45.820 us 47.466 us 49.573 us]
thrpt: [19.699 GiB/s 20.574 GiB/s 21.313 GiB/s]
chimera/Easy0i/33554432 time: [2.6227 ms 2.6561 ms 2.6926 ms]
thrpt: [11.606 GiB/s 11.765 GiB/s 11.915 GiB/s]
chimera/Hard1/16 time: [62.354 ns 63.248 ns 64.400 ns]
thrpt: [236.94 MiB/s 241.25 MiB/s 244.71 MiB/s]
chimera/Hard1/32 time: [62.204 ns 64.787 ns 68.518 ns]
thrpt: [445.39 MiB/s 471.05 MiB/s 490.60 MiB/s]
chimera/Hard1/1024 time: [100.28 ns 107.50 ns 119.79 ns]
thrpt: [7.9612 GiB/s 8.8711 GiB/s 9.5099 GiB/s]
chimera/Hard1/32768 time: [1.1941 us 1.2423 us 1.3333 us]
thrpt: [22.889 GiB/s 24.565 GiB/s 25.557 GiB/s]
chimera/Hard1/1048576 time: [40.332 us 41.404 us 42.798 us]
thrpt: [22.818 GiB/s 23.586 GiB/s 24.213 GiB/s]
chimera/Hard1/33554432 time: [3.0922 ms 3.4004 ms 3.7717 ms]
thrpt: [8.2854 GiB/s 9.1901 GiB/s 10.106 GiB/s]
regex/Hard/16 time: [65.931 ns 67.258 ns 68.895 ns]
thrpt: [221.48 MiB/s 226.87 MiB/s 231.44 MiB/s]
regex/Hard/32 time: [113.10 ns 121.71 ns 132.71 ns]
thrpt: [229.95 MiB/s 250.73 MiB/s 269.82 MiB/s]
regex/Hard/1024 time: [2.1262 us 2.1694 us 2.2317 us]
thrpt: [437.58 MiB/s 450.16 MiB/s 459.30 MiB/s]
regex/Hard/32768 time: [68.672 us 74.051 us 81.856 us]
thrpt: [381.77 MiB/s 422.01 MiB/s 455.06 MiB/s]
regex/Hard/1048576 time: [2.1402 ms 2.1658 ms 2.1936 ms]
thrpt: [455.86 MiB/s 461.73 MiB/s 467.24 MiB/s]
Benchmarking regex/Hard/33554432: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, or reduce sample count to 70.
regex/Hard/33554432 time: [69.916 ms 72.581 ms 76.022 ms]
thrpt: [420.93 MiB/s 440.88 MiB/s 457.69 MiB/s]
regex/Medium/16 time: [39.695 ns 40.709 ns 41.880 ns]
thrpt: [364.34 MiB/s 374.82 MiB/s 384.40 MiB/s]
regex/Medium/32 time: [51.597 ns 51.816 ns 52.057 ns]
thrpt: [586.23 MiB/s 588.96 MiB/s 591.45 MiB/s]
regex/Medium/1024 time: [158.24 ns 162.78 ns 169.19 ns]
thrpt: [5.6365 GiB/s 5.8587 GiB/s 6.0269 GiB/s]
regex/Medium/32768 time: [3.6467 us 3.7579 us 3.9074 us]
thrpt: [7.8102 GiB/s 8.1210 GiB/s 8.3685 GiB/s]
regex/Medium/1048576 time: [125.20 us 134.81 us 150.08 us]
thrpt: [6.5072 GiB/s 7.2439 GiB/s 7.7998 GiB/s]
regex/Medium/33554432 time: [4.6222 ms 4.8445 ms 5.1140 ms]
thrpt: [6.1106 GiB/s 6.4507 GiB/s 6.7609 GiB/s]
regex/Easy0/16 time: [34.249 ns 34.529 ns 34.823 ns]
thrpt: [438.18 MiB/s 441.91 MiB/s 445.52 MiB/s]
regex/Easy0/32 time: [45.272 ns 45.964 ns 46.787 ns]
thrpt: [652.26 MiB/s 663.95 MiB/s 674.09 MiB/s]
regex/Easy0/1024 time: [66.692 ns 67.038 ns 67.416 ns]
thrpt: [14.146 GiB/s 14.226 GiB/s 14.300 GiB/s]
regex/Easy0/32768 time: [1.1983 us 1.2147 us 1.2343 us]
thrpt: [24.725 GiB/s 25.123 GiB/s 25.467 GiB/s]
regex/Easy0/1048576 time: [43.438 us 44.470 us 45.709 us]
thrpt: [21.365 GiB/s 21.960 GiB/s 22.482 GiB/s]
regex/Easy0/33554432 time: [2.4127 ms 2.4772 ms 2.5653 ms]
thrpt: [12.182 GiB/s 12.615 GiB/s 12.952 GiB/s]
regex/Easy1/16 time: [50.337 ns 50.917 ns 51.635 ns]
thrpt: [295.51 MiB/s 299.68 MiB/s 303.14 MiB/s]
regex/Easy1/32 time: [50.931 ns 52.127 ns 53.770 ns]
thrpt: [567.56 MiB/s 585.45 MiB/s 599.20 MiB/s]
regex/Easy1/1024 time: [66.875 ns 72.588 ns 79.624 ns]
thrpt: [11.977 GiB/s 13.138 GiB/s 14.261 GiB/s]
regex/Easy1/32768 time: [448.08 ns 489.86 ns 551.24 ns]
thrpt: [55.361 GiB/s 62.299 GiB/s 68.108 GiB/s]
regex/Easy1/1048576 time: [25.172 us 28.716 us 33.628 us]
thrpt: [29.040 GiB/s 34.008 GiB/s 38.796 GiB/s]
regex/Easy1/33554432 time: [2.3208 ms 2.4041 ms 2.4967 ms]
thrpt: [12.517 GiB/s 12.999 GiB/s 13.465 GiB/s]
regex/Easy0i/16 time: [59.379 ns 60.330 ns 61.471 ns]
thrpt: [248.23 MiB/s 252.92 MiB/s 256.97 MiB/s]
regex/Easy0i/32 time: [86.053 ns 91.097 ns 98.327 ns]
thrpt: [310.37 MiB/s 335.00 MiB/s 354.64 MiB/s]
regex/Easy0i/1024 time: [151.25 ns 152.59 ns 154.10 ns]
thrpt: [6.1888 GiB/s 6.2499 GiB/s 6.3052 GiB/s]
regex/Easy0i/32768 time: [3.6745 us 3.7282 us 3.7843 us]
thrpt: [8.0642 GiB/s 8.1857 GiB/s 8.3052 GiB/s]
regex/Easy0i/1048576 time: [134.03 us 142.00 us 151.84 us]
thrpt: [6.4317 GiB/s 6.8774 GiB/s 7.2859 GiB/s]
regex/Easy0i/33554432 time: [4.9993 ms 5.7185 ms 6.6768 ms]
thrpt: [4.6804 GiB/s 5.4647 GiB/s 6.2509 GiB/s]
regex/Hard1/16 time: [56.829 ns 62.347 ns 71.392 ns]
thrpt: [213.73 MiB/s 244.74 MiB/s 268.51 MiB/s]
regex/Hard1/32 time: [99.740 ns 109.86 ns 122.72 ns]
thrpt: [248.67 MiB/s 277.78 MiB/s 305.97 MiB/s]
regex/Hard1/1024 time: [155.81 ns 166.22 ns 181.80 ns]
thrpt: [5.2458 GiB/s 5.7375 GiB/s 6.1209 GiB/s]
regex/Hard1/32768 time: [4.2781 us 4.5519 us 4.9237 us]
thrpt: [6.1981 GiB/s 6.7044 GiB/s 7.1334 GiB/s]
regex/Hard1/1048576 time: [199.14 us 242.67 us 293.85 us]
thrpt: [3.3233 GiB/s 4.0243 GiB/s 4.9040 GiB/s]
regex/Hard1/33554432 time: [6.0598 ms 6.6442 ms 7.3072 ms]
thrpt: [4.2766 GiB/s 4.7034 GiB/s 5.1569 GiB/s]
许可证
此项目可选择Apache许可证APACHE-2.0或MIT许可证MIT。
贡献
除非您明确声明,否则根据Apache-2.0许可证定义的,您有意提交给Futures的任何贡献都将按上述方式双许可,不附加任何额外条款或条件。
依赖项
~190KB