4 releases (2 breaking)

0.3.1 Jan 13, 2023
0.3.0 Dec 12, 2022
0.2.0 May 25, 2022
0.1.0 Aug 26, 2021

#1082 in Encoding


10,344 downloads per month
Used in 64 crates (via tantivy-nightly)

MIT license

280KB
6.5K SLoC

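Fast Field Codecs

This crate contains various fast field codecs, used to compress and decompress fast field data in Tantivy.

Contributing

Contributing is fairly straightforward. Bit packing is the simplest compressor, so its implementation is a good reference.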
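For readers new to bit packing, the general idea can be sketched as follows. This is a minimal illustration of the technique, not the crate's implementation: the function names `num_bits`, `bitpack`, and `get` are hypothetical, and the layout (subtract the minimum, then store every delta in a fixed number of bits) is an assumption about how such a codec is typically built.

```rust
/// Number of bits needed to represent `max_delta` (0 needs 0 bits).
/// Hypothetical helper, not part of the crate's API.
fn num_bits(max_delta: u64) -> u8 {
    (64 - max_delta.leading_zeros()) as u8
}

/// Packs each value as `value - min` using a fixed bit width.
/// Returns (min, bits per value, packed bytes).
fn bitpack(values: &[u64]) -> (u64, u8, Vec<u8>) {
    let min = values.iter().copied().min().unwrap_or(0);
    let max = values.iter().copied().max().unwrap_or(0);
    let bits = num_bits(max - min);
    let total_bits = values.len() * bits as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        let delta = v - min;
        // Copy the low `bits` bits of delta into the output, one bit at a time.
        for b in 0..bits as usize {
            if (delta >> b) & 1 == 1 {
                let pos = i * bits as usize + b;
                out[pos / 8] |= 1 << (pos % 8);
            }
        }
    }
    (min, bits, out)
}

/// Reads the value at `idx` back out of the packed buffer.
fn get(min: u64, bits: u8, data: &[u8], idx: usize) -> u64 {
    let mut delta = 0u64;
    for b in 0..bits as usize {
        let pos = idx * bits as usize + b;
        if (data[pos / 8] >> (pos % 8)) & 1 == 1 {
            delta |= 1u64 << b;
        }
    }
    min + delta
}
```

With the values 1000, 1003, 1001, 1007 the largest delta is 7, so each value takes 3 bits and the four values fit in 2 bytes instead of 32.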

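A codec needs to implement 2 traits:

  • A reader implementing FastFieldCodecReader to read the codec.
  • A serializer implementing FastFieldCodecSerializer, which provides the compression estimate and the codec name and ID.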
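The split between a reader and a serializer might look roughly like the sketch below. The traits `SketchCodecReader` and `SketchCodecSerializer` are hypothetical, simplified stand-ins written for this example; they are not the signatures of FastFieldCodecReader or FastFieldCodecSerializer, which should be taken from the crate's source. The `Raw` codec, its name, and its ID are likewise invented for illustration.

```rust
/// Hypothetical stand-in for a reader trait: opens encoded bytes and
/// returns single values by index.
trait SketchCodecReader: Sized {
    fn open_from_bytes(bytes: &[u8]) -> Option<Self>;
    fn get_u64(&self, idx: usize, data: &[u8]) -> u64;
}

/// Hypothetical stand-in for a serializer trait: carries the codec name
/// and ID, estimates the compression ratio, and writes the encoded data.
trait SketchCodecSerializer {
    const NAME: &'static str;
    const ID: u8;
    fn estimate(values: &[u64]) -> f32;
    fn serialize(values: &[u64], out: &mut Vec<u8>);
}

/// Illustrative codec that stores every value as 8 little-endian bytes.
struct RawCodecSerializer;

impl SketchCodecSerializer for RawCodecSerializer {
    const NAME: &'static str = "Raw";
    const ID: u8 = 255; // made-up ID for this sketch
    fn estimate(_values: &[u64]) -> f32 {
        1.0 // no compression: output is as large as the input
    }
    fn serialize(values: &[u64], out: &mut Vec<u8>) {
        for v in values {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

struct RawCodecReader {
    num_vals: usize,
}

impl SketchCodecReader for RawCodecReader {
    fn open_from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() % 8 != 0 {
            return None; // not a whole number of u64 values
        }
        Some(RawCodecReader { num_vals: bytes.len() / 8 })
    }
    fn get_u64(&self, idx: usize, data: &[u8]) -> u64 {
        assert!(idx < self.num_vals);
        let start = idx * 8;
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&data[start..start + 8]);
        u64::from_le_bytes(buf)
    }
}
```

Keeping the estimate on the serializer side lets a caller compare codecs on a column before paying the cost of encoding it.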

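Tests

Once the traits are implemented, test and benchmark integration is fairly easy (see test_with_codec_data_sets and bench.rs).

Make sure to add the codec to main.rs, which tests the compression ratio and the estimate against different data sets. You can run it with: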

cargo run --features bin
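The two numbers this program reports can be understood with a small sketch. It assumes that the compression ratio is the compressed size divided by the raw size at 8 bytes per u64 value, and that a bit-packed estimate is simply bits per value divided by 64. Both function names are hypothetical and not taken from the crate.

```rust
/// Compressed size relative to storing every value as a raw u64.
/// Assumed definition: lower is better, 1.0 means no gain.
fn compression_ratio(compressed_len_bytes: usize, num_vals: usize) -> f32 {
    compressed_len_bytes as f32 / (num_vals * 8) as f32
}

/// Estimate for a bit-packed column: bits needed for `max - min`,
/// divided by the 64 bits of an uncompressed value.
fn bitpacked_estimate(min: u64, max: u64) -> f32 {
    let bits = 64 - (max - min).leading_zeros();
    bits as f32 / 64.0
}
```

Under these assumptions, a column whose value range needs 18 bits yields an estimate of 18/64 = 0.28125, and one that needs 20 bits yields 20/64 = 0.3125, which matches the Bitpacked estimates in the comparison table. The measured ratios there are slightly higher, presumably because of a small fixed overhead in the encoded output.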

TODO

  • Add real-world data sets to the comparison
  • Add a codec to cover sparse data sets

Codec Comparison

+----------------------------------+-------------------+------------------------+
|                                  | Compression Ratio | Compression Estimation |
+----------------------------------+-------------------+------------------------+
| Autoincrement                    |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.000039572664    | 0.000004396963         |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.1477348         | 0.17275847             |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.28126493        | 0.28125                |
+----------------------------------+-------------------+------------------------+
| Monotonically increasing concave |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.25003937        | 0.26562938             |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.190665          | 0.1883836              |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.31251436        | 0.3125                 |
+----------------------------------+-------------------+------------------------+
| Monotonically increasing convex  |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.25003937        | 0.28125438             |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.18676           | 0.2040086              |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.31251436        | 0.3125                 |
+----------------------------------+-------------------+------------------------+
| Almost monotonically increasing  |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.14066513        | 0.1562544              |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.16335973        | 0.17275847             |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.28126493        | 0.28125                |
+----------------------------------+-------------------+------------------------+
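The near-zero ratio of LinearInterpol on the Autoincrement data set is what one would expect from interpolation-based encoding. The sketch below shows one plausible reading of the idea, not the crate's actual codec: predict each value from a straight line between the first and last value, and store only the residuals. The names `predict` and `residual_bits` are hypothetical.

```rust
/// Predicts the value at `idx` on the straight line from `first` to `last`.
fn predict(first: u64, last: u64, num_vals: usize, idx: usize) -> u64 {
    if num_vals <= 1 {
        return first;
    }
    let slope = (last as f64 - first as f64) / (num_vals - 1) as f64;
    (first as f64 + slope * idx as f64).round() as u64
}

/// Bits per value needed to bit-pack the residuals (actual - predicted)
/// after shifting them so that the smallest residual becomes zero.
fn residual_bits(values: &[u64]) -> u32 {
    let (first, last) = match (values.first(), values.last()) {
        (Some(&f), Some(&l)) => (f, l),
        _ => return 0,
    };
    let mut min_res = i128::MAX;
    let mut max_res = i128::MIN;
    for (idx, &v) in values.iter().enumerate() {
        let res = v as i128 - predict(first, last, values.len(), idx) as i128;
        min_res = min_res.min(res);
        max_res = max_res.max(res);
    }
    // Clamp so the spread always fits in a u64.
    let spread = (max_res - min_res).min(u64::MAX as i128) as u64;
    64 - spread.leading_zeros()
}
```

For a strictly autoincrementing column every residual is zero, so no bits per value are needed and only a small header remains. For the values 0, 12, 18, 30, 40 the residuals span from -2 to 2 and need 3 bits each, whereas plain bit packing of the same values would need 6.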

Dependencies

~0.5–2.7MB
~46K SLoC