9 unstable releases (3 breaking)

0.4.0	Apr 6, 2020
0.3.0	Apr 4, 2020
0.2.0	Mar 16, 2020
0.1.5	Mar 16, 2020

#1952 in Algorithms

22 downloads per month

MPL-2.0 license

180KB
847 lines

hotsax

An implementation of the HOTSAX discord discovery algorithm.

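This crate contains an implementation of the HOT SAX algorithm, along with the brute-force algorithm, both as proposed by Keogh et al. It also includes the HS-Squeezer algorithm, also proposed by Keogh et al., because it offers useful optimizations while still building on HOT SAX. However, in testing, this HS-Squeezer implementation performed similarly to or worse than HOT SAX. If you find any possible optimizations, please open an issue.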
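For reference, the brute-force baseline mentioned above can be sketched as follows. This is a minimal illustration, not the crate's API: brute_force_discord and euclidean are hypothetical helpers, and the sketch assumes plain Euclidean distance on raw (not z-normalized) subsequences, with any overlapping subsequence treated as a trivial self-match. The discord is the subsequence whose nearest non-self match is farthest away.

```rust
/// Euclidean distance between two equal-length slices.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f64>().sqrt()
}

/// Hypothetical brute-force discord search (not the hotsax API).
/// Returns (nearest-neighbour distance, start index) of the length-`n`
/// subsequence whose nearest non-overlapping match is farthest away.
fn brute_force_discord(data: &[f64], n: usize) -> Option<(f64, usize)> {
    // Need room for at least two non-overlapping subsequences.
    if n == 0 || data.len() < 2 * n {
        return None;
    }
    let count = data.len() - n + 1;
    let mut best: Option<(f64, usize)> = None;
    for p in 0..count {
        let mut nearest = f64::INFINITY;
        for q in 0..count {
            // Skip trivial matches: subsequences overlapping the candidate.
            if p.abs_diff(q) < n {
                continue;
            }
            let d = euclidean(&data[p..p + n], &data[q..q + n]);
            if d < nearest {
                nearest = d;
            }
        }
        // Keep the first candidate with the largest nearest-neighbour distance.
        if nearest.is_finite() && best.map_or(true, |(bd, _)| nearest > bd) {
            best = Some((nearest, p));
        }
    }
    best
}
```

This runs in quadratic time over the number of subsequences. HOT SAX finds the same discord but uses SAX words to order the outer and inner loops so that most inner loops can be abandoned as soon as a distance falls below the best distance found so far.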

During the implementation, several supporting functions had to be written, such as paa, znorm and gaussian. They are made public because they are useful beyond HOT SAX.
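To illustrate what two of these helpers typically compute, here is a minimal sketch of z-normalization and Piecewise Aggregate Approximation (PAA). These are hypothetical standalone functions, not the crate's znorm and paa signatures; the sketch assumes the population standard deviation, returns all zeros for a near-constant input (a common convention, assumed here), and assumes the input length is a multiple of the PAA word size.

```rust
/// Hypothetical z-normalization: subtract the mean, divide by the
/// population standard deviation. Near-constant input (std < eps) maps to zeros.
fn znorm(xs: &[f64], eps: f64) -> Vec<f64> {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    if std < eps {
        return vec![0.0; xs.len()];
    }
    xs.iter().map(|x| (x - mean) / std).collect()
}

/// Hypothetical PAA: averages `xs` into `w` equal-width segments.
/// Assumes `xs.len()` is a non-zero multiple of `w`.
fn paa(xs: &[f64], w: usize) -> Vec<f64> {
    assert!(
        w > 0 && !xs.is_empty() && xs.len() % w == 0,
        "length must be a non-zero multiple of w"
    );
    let seg = xs.len() / w;
    xs.chunks(seg).map(|c| c.iter().sum::<f64>() / seg as f64).collect()
}
```

For example, paa(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3) averages each pair of samples into one value per segment.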

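The code is thoroughly commented to explain the implementation, so that readers can learn how the HOT SAX algorithm works by studying it. If a comment is unclear, feel free to open an issue.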

Note that only vectors of Float values are supported. If your data consists of integers, convert it to floats first.
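A minimal way to do that conversion in Rust, assuming the integers are i32 samples (to_f64 is a hypothetical helper name):

```rust
/// Hypothetical helper: converts integer samples to f64 so they can be
/// passed to an API that only accepts floating-point data.
/// f64::from is lossless for i32; for i64 you would need `as f64`,
/// which can round values larger than 2^53 in magnitude.
fn to_f64(samples: &[i32]) -> Vec<f64> {
    samples.iter().map(|&x| f64::from(x)).collect()
}
```

The resulting Vec<f64> can then be used wherever the examples below use the data vector.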

Usage example

use std::error::Error;
use plotly::{Plot, Scatter};

fn main() -> Result<(), Box<dyn Error>> {
    // Parse the CSV file from the dataset.
    let mut rdr = csv::ReaderBuilder::new()
        .trim(csv::Trim::All)
        .from_path("data/TEK16.CSV")?;

    // Deserialize the CSV data into a vector of floats.
    let mut data: Vec<f64> = Vec::new();
    for result in rdr.deserialize() {
        data.push(result?);
    }

    // Prepare a plot.
    let mut plot = Plot::new();

    // Retrieve the largest discord. This should approximately match the one found in the paper.
    // It uses the same settings: a discord size of 256 and a=3.
    // word_size was assumed to be 3.
    let discord_size = 256;
    let discord = hotsax::Anomaly::with(&data, discord_size)
        .use_slice(1000..)      // Skip the beginning due to an abnormality.
        .find_largest_discord() // Find the largest discord in the subslice.
        .unwrap().1;            // Keep only the location.

    // Plot the entire dataset in blue.
    let trace1 = Scatter::new((1..=data.len()).collect(), data.clone())
        .line(plotly::common::Line::new().color(plotly::NamedColor::Blue))
        .name("Data");

    // Plot the discord itself in red; x and y both span discord_size points.
    let trace2 = Scatter::new(
        (discord + 1..discord + discord_size + 1).collect(),
        data[discord..discord + discord_size].to_vec(),
    )
    .line(plotly::common::Line::new().color(plotly::NamedColor::Red))
    .name("Discord");

    // Add the traces to the plot.
    plot.add_trace(trace1);
    plot.add_trace(trace2);

    // Show the plot to verify the result.
    plot.show();

    Ok(())
}

Evaluation

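To demonstrate the accuracy of the implementation, the algorithm was run on the same datasets used in the paper itself. Specifically, it used the data from Figures 6 and 7 (available here, or as TEK16.CSV and TEK17.CSV, respectively, in this repository's data/ directory).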

The algorithm was run with a word size of 3, an alphabet size of 3 and a discord size of 128.
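With an alphabet size of 3, SAX discretization splits the standard normal distribution into three equiprobable regions, using breakpoints at its 1/3 and 2/3 quantiles (about -0.43 and 0.43). The sketch below maps a word of PAA coefficients (for example, 3 of them for a word size of 3) to letters. It is a hypothetical helper, not the crate's gaussian function or API, and it assumes the coefficients were computed from a z-normalized subsequence.

```rust
/// Breakpoints splitting N(0, 1) into 3 equiprobable regions:
/// the standard-normal quantiles at 1/3 and 2/3 (approx. -0.43 and 0.43).
const BREAKPOINTS_A3: [f64; 2] = [-0.430_727_3, 0.430_727_3];

/// Hypothetical SAX discretization for alphabet size 3:
/// 'a' below the first breakpoint, 'b' between them, 'c' above the second.
fn sax_word(paa_values: &[f64]) -> String {
    paa_values
        .iter()
        .map(|&v| {
            // The number of breakpoints at or below v selects the letter.
            let idx = BREAKPOINTS_A3.iter().filter(|&&b| v >= b).count();
            (b'a' + idx as u8) as char
        })
        .collect()
}
```

HOT SAX uses words like these as a heuristic. Subsequences whose word is rare are examined first in the outer loop, and subsequences sharing the candidate's word are examined first in the inner loop.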

The results of this algorithm are shown below, alongside the corresponding figures from the paper for comparison.

Dependencies

~1MB
~19K SLoC