14 versions (8 breaking)

0.9.0	Jul 5, 2024
0.7.1	Jun 30, 2024
0.2.0	Mar 2, 2024

#310 in Parser implementations

88 downloads per month

MIT/Apache

235KB
5K SLoC

cloudfront-logs

A Rust-based parser for AWS CloudFront log lines

Log format

The format of AWS CloudFront log files is described here:

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html#LogFileFormat

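Purpose and design

This parser currently focuses only on parsing single lines of a log file. It provides a structured view of the tab-separated fields and avoids numeric indices.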
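To illustrate what "structured view instead of numeric indices" can mean, here is a minimal sketch using only the standard library. The `NamedFields` struct and `parse_named` function are hypothetical names made up for this illustration; they are not the crate's own types. The sketch assumes the field order of the standard CloudFront format, where the first five fields are date, time, x-edge-location, sc-bytes, and c-ip.

```rust
/// A borrowed, named view over a few fields of one tab-separated line.
/// Hypothetical type for illustration; not part of the crate's API.
#[derive(Debug, PartialEq)]
struct NamedFields<'a> {
    date: &'a str,
    time: &'a str,
    sc_bytes: &'a str,
    c_ip: &'a str,
}

/// Walks the tab-separated fields once and binds them to names,
/// so callers never need to remember that `c_ip` sits at index 4.
/// Returns `None` if the line has fewer than five fields.
fn parse_named<'a>(line: &'a str) -> Option<NamedFields<'a>> {
    let mut fields = line.split('\t');
    let date = fields.next()?;
    let time = fields.next()?;
    let _edge_location = fields.next()?; // x-edge-location, skipped here
    let sc_bytes = fields.next()?;
    let c_ip = fields.next()?;
    Some(NamedFields { date, time, sc_bytes, c_ip })
}
```

With such a type, a caller writes `item.c_ip` rather than `fields[4]`, which keeps field positions in one place.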

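Users of this library are responsible for passing individual lines to the parser. This keeps it flexible across different scenarios, since it makes no assumptions about where log lines come from or how they are passed around in the program.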
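As a rough sketch of the caller's side of that contract, the following standard-library helper reads lines from any buffered source and hands each one to a callback, where a parser call would go. The name `for_each_logline` is hypothetical. The sketch assumes that lines starting with `#` are the `#Version` and `#Fields` header lines found at the top of CloudFront log files and should be skipped.

```rust
use std::io::{self, BufRead};

/// Reads lines from any `BufRead` source, skips header lines (starting
/// with '#') and empty lines, and passes every remaining line to `handle`.
/// Returns how many lines were passed on.
/// Hypothetical helper for illustration; not part of the crate.
fn for_each_logline<R, F>(reader: R, mut handle: F) -> io::Result<usize>
where
    R: BufRead,
    F: FnMut(&str),
{
    let mut count = 0;
    for line in reader.lines() {
        let line = line?; // propagate I/O errors to the caller
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        handle(&line); // a parser call would go here
        count += 1;
    }
    Ok(count)
}
```

Because the helper only needs `BufRead`, the same code works for a file wrapped in `BufReader`, standard input, or an in-memory byte slice.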

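More utilities may be added in the future, but for now the priority is fast and reliable parsing.

The library therefore provides different parser implementations, so you can choose the option that fits your use case and requirements.

Refer to the benchmarks (run ./benches.sh) for a comprehensive overview.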

Examples

Given the following log line

2019-12-04	21:02:31	LAX1	392	192.0.2.100	GET	d111111abcdef8.cloudfront.net	/index.html	200	-	Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36	-	-	Hit	SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ==	d111111abcdef8.cloudfront.net	https	23	0.001	-	TLSv1.2	ECDHE-RSA-AES128-GCM-SHA256	Hit	HTTP/2.0	-	-	11040	0.001	Hit	text/html	78	-	-

you have several ways to process it:

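use cloudfront_logs::*;
// std types used in the assertions below
use std::net::{IpAddr, Ipv4Addr};
use std::time::Duration;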

let logline: &str = "2019-12-04	21:02:31	LAX1	392	192.0.2.100	GET	d111111abcdef8.cloudfront.net	/index.html	200	-	Mozilla/5.0%20(Windows%20NT%2010.0;%20Win64;%20x64)%20AppleWebKit/537.36%20(KHTML,%20like%20Gecko)%20Chrome/78.0.3904.108%20Safari/537.36	-	-	Hit	SOX4xwn4XV6Q4rgb7XiVGOHms_BGlTAC4KyHmureZmBNrjGdRLiNIQ==	d111111abcdef8.cloudfront.net	https	23	0.001	-	TLSv1.2	ECDHE-RSA-AES128-GCM-SHA256	Hit	HTTP/2.0	-	-	11040	0.001	Hit	text/html	78	-	-";

// -- borrowing the input --

// reasonable default parser
let item = ValidatedRawLogline::try_from(logline).unwrap();

// fields are only sub-slices from the input and therefore all return &str
assert_eq!(item.date, "2019-12-04");
assert_eq!(item.sc_bytes, "392");
assert_eq!(item.c_ip, "192.0.2.100");

// -- get an owned version --

// parser which only uses types accessible without external dependencies,
// only Rust's core and std library is allowed
let item = ValidatedSimpleLogline::try_from(logline).unwrap();

assert_eq!(item.date, "2019-12-04");
assert_eq!(item.sc_content_len, 78);
assert_eq!(item.c_ip, IpAddr::V4(Ipv4Addr::new(192, 0, 2, 100)));

// -- get an owned and typed version --

// parser which also converts some fields via external dependencies
let item = ValidatedTimeLogline::try_from(logline).unwrap();

// here: date and time from the `time` crate
assert_eq!(item.date, time_macros::date!(2019-12-04));
assert_eq!(item.time, time_macros::time!(21:02:31));
assert_eq!(item.time_taken, Duration::from_millis(1));
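The examples above call `unwrap()` on the result of `try_from`. For input that may be malformed, it can be preferable to handle the error instead. The sketch below shows the general `TryFrom<&str>` pattern using only the standard library. `CheckedLine` and `FieldCountError` are hypothetical stand-ins, not the crate's types, and the check on 33 fields is an assumption based on the sample line above, which has 33 tab-separated fields.

```rust
use std::convert::TryFrom;

/// Number of tab-separated fields in the sample line above (assumption).
const FIELD_COUNT: usize = 33;

/// Error returned when a line does not have the expected field count.
#[derive(Debug, PartialEq)]
struct FieldCountError {
    found: usize,
}

/// Hypothetical validated line; it borrows its fields from the input.
#[derive(Debug)]
struct CheckedLine<'a> {
    fields: Vec<&'a str>,
}

impl<'a> CheckedLine<'a> {
    /// Returns the field at `index`, if present.
    fn get(&self, index: usize) -> Option<&'a str> {
        self.fields.get(index).copied()
    }
}

impl<'a> TryFrom<&'a str> for CheckedLine<'a> {
    type Error = FieldCountError;

    fn try_from(line: &'a str) -> Result<Self, Self::Error> {
        let fields: Vec<&str> = line.split('\t').collect();
        if fields.len() == FIELD_COUNT {
            Ok(CheckedLine { fields })
        } else {
            Err(FieldCountError { found: fields.len() })
        }
    }
}
```

A caller can then `match` on the result and, for example, count or log rejected lines rather than panicking on the first bad one.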

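Benchmark example

The following benchmarks were run on WSL Ubuntu, on a machine with an AMD Ryzen 9 7950X3D 16-core processor and 64 GiB RAM.

Your numbers may differ. What matters more is the relative difference between the parser implementations.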

# code under benches/comparison-real-world.rs
RUSTFLAGS="-Ctarget-cpu=native" cargo bench -q --all-features --bench real-world

*** Comparing different parsers for AWS CloudFront logs ***

Parses lines and extracts a few fields, slightly unordered,
this should simulate close to real-world usages.
Timer precision: 10 ns
real_world                   fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ 00 CheckedRawLogLine                    │               │               │               │         │
│  ├─ Line A                 162.3 ns      │ 210.6 ns      │ 167.1 ns      │ 167.8 ns      │ 1000    │ 1000000
│  ├─ Line B                 164.2 ns      │ 275.6 ns      │ 171.8 ns      │ 175.8 ns      │ 1000    │ 1000000
│  ├─ Lines A+B              325.1 ns      │ 398.2 ns      │ 337.4 ns      │ 337.5 ns      │ 1000    │ 1000000
│  ╰─ Sample File            994 ns        │ 1.1 µs        │ 1.024 µs      │ 1.029 µs      │ 1000    │ 1000000
├─ 10 CheckedRawLogLineView                │               │               │               │         │
│  ├─ Line A                 366.6 ns      │ 422.9 ns      │ 376.8 ns      │ 378 ns        │ 1000    │ 1000000
│  ├─ Line B                 358.8 ns      │ 412.5 ns      │ 369 ns        │ 370 ns        │ 1000    │ 1000000
│  ├─ Lines A+B              716.5 ns      │ 888.4 ns      │ 748.2 ns      │ 749.7 ns      │ 1000    │ 1000000
│  ╰─ Sample File            2.178 µs      │ 2.784 µs      │ 2.279 µs      │ 2.279 µs      │ 1000    │ 1000000
├─ 11 SmartRawLogLineView                  │               │               │               │         │
│  ├─ Line A                 287.5 ns      │ 385 ns        │ 298.9 ns      │ 301.3 ns      │ 1000    │ 1000000
│  ├─ Line B                 285.2 ns      │ 401.5 ns      │ 301.5 ns      │ 303 ns        │ 1000    │ 1000000
│  ├─ Lines A+B              556.7 ns      │ 680.3 ns      │ 594.7 ns      │ 595.8 ns      │ 1000    │ 1000000
│  ╰─ Sample File            1.694 µs      │ 2.671 µs      │ 1.789 µs      │ 1.796 µs      │ 1000    │ 1000000
├─ 20 SimpleLogLine                        │               │               │               │         │
│  ├─ Line A                 355.2 ns      │ 432 ns        │ 370.8 ns      │ 372.9 ns      │ 1000    │ 1000000
│  ├─ Line B                 347.8 ns      │ 533.7 ns      │ 370.7 ns      │ 373.5 ns      │ 1000    │ 1000000
│  ├─ Lines A+B              715.5 ns      │ 883.4 ns      │ 752 ns        │ 753.9 ns      │ 1000    │ 1000000
│  ╰─ Sample File            2.136 µs      │ 3.085 µs      │ 2.236 µs      │ 2.247 µs      │ 1000    │ 1000000
╰─ 21 TypedLogLine                         │               │               │               │         │
   ├─ Line A                 395.5 ns      │ 467.9 ns      │ 407.9 ns      │ 409.6 ns      │ 1000    │ 1000000
   ├─ Line B                 387.8 ns      │ 512.7 ns      │ 397.1 ns      │ 399.5 ns      │ 1000    │ 1000000
   ├─ Lines A+B              781 ns        │ 1.164 µs      │ 812.2 ns      │ 813.6 ns      │ 1000    │ 1000000
   ╰─ Sample File            2.317 µs      │ 3.551 µs      │ 2.384 µs      │ 2.409 µs      │ 1000    │ 1000000

More benchmarks are available, such as single-field and two-fields; these should highlight the advantages of the "view" parsers.
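One plausible reason a "view" style parser can do well when only one or two fields are needed is that it can defer the work until a field is requested. The sketch below illustrates that idea with the standard library only. `LineView` and its methods are hypothetical names; this is not the crate's implementation, and it makes no claim about how the crate's view parsers work internally.

```rust
/// A lazy view: stores only the line and looks fields up on demand.
/// Hypothetical type for illustration; not part of the crate's API.
struct LineView<'a> {
    line: &'a str,
}

impl<'a> LineView<'a> {
    fn new(line: &'a str) -> Self {
        LineView { line }
    }

    /// Scans the line only as far as the requested 0-based field.
    fn field_at(&self, index: usize) -> Option<&'a str> {
        self.line.split('\t').nth(index)
    }

    /// Named accessors hide the numeric positions from callers.
    fn date(&self) -> Option<&'a str> {
        self.field_at(0)
    }

    fn c_ip(&self) -> Option<&'a str> {
        self.field_at(4)
    }
}
```

The trade-off is that every access rescans the line from the start, so this approach suits reading a few fields, while an eager parser suits reading many.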

Safety

This crate uses #![forbid(unsafe_code)] to ensure everything is implemented in 100% safe Rust.

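License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this crate by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.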

Dependencies

~17–24MB
~500K SLoC