3 stable releases

1.0.2	June 15, 2022
1.0.0	June 7, 2022

MIT license

Series2Graph++

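Series2Graph++ (S2G++) is a time series anomaly detection algorithm based on the Series2Graph (S2G) and DADS algorithms. S2G++ can handle multivariate time series, whereas S2G and DADS can only handle univariate time series. In addition, S2G++ borrows ideas from DADS so that it can run distributed across a computer cluster. S2G++ is written in Rust and uses the actix and actix-telepathy libraries.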

Quickstart

Requirements

Rust 1.58
openblas
(Docker)

To make openblas available to the Rust build process, run the following on Debian (Linux):

sudo apt install build-essential gfortran libopenblas-base libopenblas-dev gcc

Installation

Build from source

git clone https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
cargo build

Docker

The base image akita/rust-base must be available on your machine.

git clone https://gitlab.hpi.de/akita/s2gpp
cd s2gpp
docker build -t s2gpp .

Usage (binary)

Parameters

Pattern

s2gpp --local-host <IP:Port> --pattern-length <Int> --latent <Int> --query-length <Int> --rate <Int> --threads <Int> --cluster-nodes <Int> --score-output-path <Path> [main --data-path <Path> | sub --mainhost <IP:Port>]

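S2G++ expects one of the following two subcommands, each with its own specific parameter:

main (the main computer in the cluster)
- data-path (path to the input time series)
sub (any other computer in the cluster; only needed in a distributed setup)
- mainhost (IP address and port of the main computer in the cluster)

The general parameters must be given before the subcommand: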

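local-host (IP address and port the listener binds to)
pattern-length (size of the sliding window; independent of the anomaly length, but ideally larger than it)
latent (size of the latent embedding space; this space is the input to the subsequent PCA computation)
query-length (size of the sliding window used to find anomalies (query subsequences); query-length must be greater than or equal to pattern-length!)
rate (number of angles used to extract pattern nodes; a higher value gives higher precision at the cost of longer computation time)
threads (number of helper threads started in addition to the main thread (min=1))
cluster-nodes (size of the computer cluster)
score-output-path (path the scores are written to)
column-start-idx (how many columns to skip)
column-end-idx (the column up to which to read (exclusive); negative numbers count from the end)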
self-correction (whether S2G++ corrects the direction of the time embedding when there are too few transitions)
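
As a rough sketch of how the pattern and the parameter list fit together, the following Python helper assembles the argument lists for one main node and one sub node. The flag names are taken from the pattern above; the helper name `build_s2gpp_command`, every numeric value, the addresses, and the file paths are illustrative assumptions, not recommended settings.

```python
import shlex


def build_s2gpp_command(local_host, pattern_length, latent, query_length, rate,
                        threads, cluster_nodes, score_output_path,
                        data_path=None, mainhost=None):
    """Hypothetical helper: build an s2gpp argument list from the documented flags."""
    if query_length < pattern_length:
        raise ValueError("query-length must be >= pattern-length")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if (data_path is None) == (mainhost is None):
        raise ValueError("pass exactly one of data_path (main) or mainhost (sub)")

    # General parameters come first, before the subcommand.
    cmd = [
        "s2gpp",
        "--local-host", local_host,
        "--pattern-length", str(pattern_length),
        "--latent", str(latent),
        "--query-length", str(query_length),
        "--rate", str(rate),
        "--threads", str(threads),
        "--cluster-nodes", str(cluster_nodes),
        "--score-output-path", str(score_output_path),
    ]
    # The subcommand and its specific parameter close the command line.
    if data_path is not None:
        cmd += ["main", "--data-path", str(data_path)]
    else:
        cmd += ["sub", "--mainhost", mainhost]
    return cmd


# Illustrative values only (assumptions, not tuned defaults).
common = dict(pattern_length=100, latent=16, query_length=150, rate=100,
              threads=2, cluster_nodes=2, score_output_path="scores.csv")

main_cmd = build_s2gpp_command("127.0.0.1:1992", data_path="data/ts_0.csv", **common)
sub_cmd = build_s2gpp_command("127.0.0.1:1993", mainhost="127.0.0.1:1992", **common)

print(shlex.join(main_cmd))
print(shlex.join(sub_cmd))
```

The resulting lists can be passed to `subprocess.run` on the respective machines; the early checks only mirror the constraints stated in the parameter list (query-length at least pattern-length, at least one helper thread).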

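Input format

The time series is expected as a CSV file with a header. Each column represents one channel of the time series. Time series files sometimes also contain labels and an index. The column-start-idx / column-end-idx range pattern can be used to skip such columns. It behaves like a Python range.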
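
Because the column range is described as behaving like a Python range, its effect can be previewed with ordinary list slicing. The sketch below assumes a hypothetical file whose first column is an index and whose last column holds labels; the column names and the helper `select_channels` are made up for illustration, and the actual parsing inside S2G++ may differ in detail.

```python
def select_channels(header, column_start_idx=0, column_end_idx=None):
    """Keep the columns in the half-open range [start, end).

    A negative end index counts from the end, as in Python slicing.
    """
    return header[column_start_idx:column_end_idx]


# Hypothetical CSV header: index column first, label column last.
header = ["timestamp", "value-0", "value-1", "value-2", "is_anomaly"]

# Skip one leading column and stop before the last column.
channels = select_channels(header, column_start_idx=1, column_end_idx=-1)
print(channels)  # ['value-0', 'value-1', 'value-2']
```

With this layout, a start index of 1 and an end index of -1 would leave only the three value channels as input.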

Usage (library)

Cargo.toml

[dependencies]
s2gpp = "1.0.2"

Your Rust application

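// Array1 and Array2 come from the ndarray crate, which must also be a dependency.
use ndarray::{Array1, Array2};

// Runs S2G++ with default parameters on an in-memory time series.
fn some_fn(timeseries: Array2<f32>) -> Result<Array1<f32>, ()> {
  let params = s2gpp::Parameters::default();
  // The call yields an optional score array; unwrap assumes scores are present.
  let anomaly_score = s2gpp::s2gpp(params, Some(timeseries))?.unwrap();
  Ok(anomaly_score)
}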

Python

We have wrapped the Rust code in a Python package that can be used without installing Rust.

Installation

PyPI

pip install s2gpp

Build with Docker

make build-docker
pip install wheels/s2gpp-*.whl

Build from source

make install

Usage

Single machine

from s2gpp import Series2GraphPP
import pandas as pd

ts = pd.read_csv("data/ts_0.csv").values

model = Series2GraphPP(pattern_length=100)
anomaly_scores = model.fit_predict(ts)

Distributed

from s2gpp import DistributedSeries2GraphPP
from pathlib import Path

# run on one machine
def main_node():
    dataset_path = Path("data/ts_0.csv")
  
    model = DistributedSeries2GraphPP.main(local_host="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
    model.fit_predict(dataset_path)

# run on other machine
def sub_node():
    model = DistributedSeries2GraphPP.sub(local_host="127.0.0.1:1993", mainhost="127.0.0.1:1992", n_cluster_nodes=2, pattern_length=100)
    model.fit_predict()

Citation

Please cite this work when using it!

@software{Wenig_Series2Graph_2022,
  author = {Wenig, Phillip},
  month = {6},
  title = {{Series2Graph++}},
  version = {1.0.2},
  year = {2022}
}

References

[1] P. Boniol and T. Palpanas. Series2Graph: Graph-based Subsequence Anomaly Detection in Time Series. PVLDB (2020).

[2] J. Schneider, P. Wenig, and T. Papenbrock. Distributed detection of sequential anomalies in univariate time series. The VLDB Journal 30, 579–602 (2021).
