17 个稳定版本

1.2.4 2022年9月5日
1.1.1 2022年9月4日
1.0.9 2022年9月4日
1.0.2 2022年9月3日
0.1.0 2022年9月3日

#333 in 算法

每月45次下载

MIT 协议

70KB
1K SLoC

Fluent Data

低内存占用流数据建模库和服务。

该算法从输入流读取数据点,拟合模型并将更新的模型发送到输出流。

在线算法拟合一个由一组球组成的模型。每个球由其中心、半径和权重(包含在球中的点的衰减数量)描述。

安装和运行程序

安装

cargo install fluent_data

运行程序,并在标准输入中输入数据点。程序将以模型回答

fluent_data
[5,-1]
[{"center":[5.0,-1.0],"radius":null,"weight":0.0}]
[1,1]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":1.0}]
[15,-13]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.95},{"center":[18.5,-16.5],"radius":3.9597979746446663,"weight":1.0}]
[11,23]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.9025},{"center":[18.5,-16.5],"radius":3.9597979746446663,"weight":0.95},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":1.0}]
[31,-3]    
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.8573749999999999},{"center":[18.5,-16.5],"radius":3.9597979746446663,"weight":0.9025},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.95},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":1.0}]
[10,-9]    
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.8145062499999999},{"center":[14.032194480946124,-12.557818659658345],"radius":8.65915237435586,"weight":1.9024999999999999},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.9025},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":0.95}]
[6,-4]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.7737809374999999},{"center":[11.264857881136951,-9.609388458225668],"radius":9.828917387831996,"weight":2.9025},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.8573749999999999},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":0.9025}]
[-2,-5]
[{"center":[6.7297134962820016,-6.8681649994430005],"radius":15.539441192890935,"weight":4.6762809375},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.8145062499999999},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":0.8573749999999999}]

模型表示为包含每个球的对象的 json 数组

  • center 是球心,
  • radius 是球的半径,
  • weight 是球的权重(通过将权重除以权重的总和获得概率)。

作为服务运行

程序可以作为 websocket 服务器运行

fluent_data --service

数据点发送到 ws://0.0.0.0:9001/ws/points,模型从 ws://0.0.0.0:9001/ws/models 接收。端口号可以通过设置 PORT 环境变量进行自定义。

对于发送和接收点,可以使用 websocket 客户端 websocat。打开一个终端,用于监听模型

websocat ws://127.0.0.1:9001/ws/models

打开另一个终端并输入一些点

websocat ws://127.0.0.1:9001/ws/points
[5,-1]
[1,1]
[15,-13]
[11,23]
[31,-3]    
[10,-9]    
[6,-4]
[-2,-5]

第一个终端应显示模型

[{"center":[5.0,-1.0],"radius":null,"weight":0.0}]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":1.0}]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.95},{"center":[18.5,-16.5],"radius":3.9597979746446663,"weight":1.0}]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.9025},{"center":[18.5,-16.5],"radius":3.9597979746446663,"weight":0.95},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":1.0}]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.8573749999999999},{"center":[18.5,-16.5],"radius":3.9597979746446663,"weight":0.9025},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.95},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":1.0}]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.8145062499999999},{"center":[14.032194480946124,-12.557818659658345],"radius":8.65915237435586,"weight":1.9024999999999999},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.9025},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":0.95}]
[{"center":[1.0,1.0],"radius":4.47213595499958,"weight":0.7737809374999999},{"center":[11.264857881136951,-9.609388458225668],"radius":9.828917387831996,"weight":2.9025},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.8573749999999999},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":0.9025}]
[{"center":[6.7297134962820016,-6.8681649994430005],"radius":15.539441192890935,"weight":4.6762809375},{"center":[13.5,28.5],"radius":4.833218389437829,"weight":0.8145062499999999},{"center":[34.125,0.375],"radius":3.6796738985948196,"weight":0.8573749999999999}]

使用库

请参阅 包文档

自定义算法

请参阅 包文档中的自定义部分

工作原理

给定一组从迄今为止接收到的数据点拟合的球,设 P 为新的进入点。设 B 为最有可能包含 P 的球 (*)。

B 的中心为 C,半径为 r,重量为 w。设 dPC 的距离:d = |PC|

(*) “最有可能包含”的意思是,对于模型中的所有球,B 最小化 d/r 的数量。

拟合算法如下

  • (I) 如果距离小于四个 B 半径,即 d < 4r
    • 新点属于 BB 将增量更新
      • 半径的平方设置为平均平方半径:<- (w.+) / (w + 1)
      • 中心设置为平均中心:C <- (w.C + P) / (w + 1)
      • 重量增加 1:w <- w + 1
    • 否则创建一个新的球 B*
      • 将半径设置为距离的五分之一:r* <- d / 5,即 r*² <-/ 25
      • 将中心 C* 设置为使 CPC* 对齐,即 CC* = 6/5 CP,即 C* <- (-C + 5P) / 6
      • 球的重量设置为 1。
  • (II) 设 B'B 的最近球,即所有球中距离 CC' 最小。在上面的第一种情况下,如果它们之间的平方距离小于它们半径平方的和,即 <+ r'²,则 BB' 可以合并成一个球。
    • B被更新
      • 半径的平方被设置为半径平方的加权平均值,加上距离的平方 <- (加权.r² + 加权'.r²) / (加权 + 加权') +
      • 中心被设置为中心的加权平均值 C <- (加权.C + 加权'.C) / (加权 + 加权')
      • 权重被设置为权重的总和 w <- w + 加权'
    • B'被丢弃。
  • (III) 除了属于P的球之外的所有球的权重都以0.95的因子衰减,w <- 0.95 w
    • 所有权重低于1/100的球都被移除,w < 1/100

关于实现

每个球B分别由(C, r, w)表示,分别是球的中心、半径的平方(对于计算更有用)和球的权重。

该模型在内存中以图的形式表示,图的顶点与球相关联。对于一个与球B相关联的顶点VV的邻域是{V', V"},即与B的两个最近球的顶点,即距离分别为最小和第二小的球,即距离|CC'|的球。

给定一组从接收到的数据点拟合的球,设P为新的进入点。设{B,}为最可能包含P的两个球(*见上文)和相应的顶点{V,}

内存中维护图的方式如下

  • 在步骤(I)中创建新的球体B*时,会创建一个新的顶点V*,并关联到球体B*{V,}变为V*的邻域。B*现在可能比B更接近,而不是B'B"
    因此,通过在{B', B", B*}中搜索B的2个最近邻居,重新计算了在{V', V", V*}之间的V邻居。
  • 当在步骤(I)中将P包含到B中时,现在可能比B'B"更接近B
    • 如果步骤(II)中没有合并BB',且不属于{B', B"},则在{V', V", V°}之间重新计算V的邻居,通过在{B', B", B°}中搜索B的2个最近邻居。
    • 如果步骤(II)中合并了BB':如果不属于{B', B"},则V的邻域变为{V", V°},否则变为{V"}

图的实现是私有于crate的,其实现可以在以下位置找到:graph.rs

依赖

~8–11MB
~219K SLoC