1 unstable release

0.1.0 Jun 6, 2024

#276 in Database implementations

Apache-2.0

89KB
1.5K SLoC

materialized-view


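materialized-view is a database-agnostic incremental computation engine whose focus is on graph data models, such as RDF and property graphs. Because it is an incremental system, the latency incurred to update a query view is proportional to the size of the update.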
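To make "proportional to the size of the update" concrete, here is a minimal sketch in plain Rust that maintains a two-hop join view under edge insertions. It is only an illustration of the general idea, not the crate's implementation: `TwoHop`, its fields, and `insert` are hypothetical names, and retractions are left out. Each insertion consults only the index entries that the new edge can join with, rather than recomputing the whole join.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical toy view for: two_hop(x, z) <- edge(x, y), edge(y, z).
/// Not part of materialized-view; it only illustrates delta-driven updates.
#[derive(Default)]
struct TwoHop {
    out_edges: HashMap<i32, Vec<i32>>, // y -> every z with edge(y, z)
    in_edges: HashMap<i32, Vec<i32>>,  // y -> every x with edge(x, y)
    view: HashSet<(i32, i32)>,         // the maintained join result
}

impl TwoHop {
    /// Inserts edge(a, b) and returns only the newly derived view rows.
    fn insert(&mut self, a: i32, b: i32) -> Vec<(i32, i32)> {
        self.out_edges.entry(a).or_default().push(b);
        self.in_edges.entry(b).or_default().push(a);
        let mut delta = Vec::new();
        // The new edge acts as the first hop: edge(a, b), edge(b, z).
        if let Some(zs) = self.out_edges.get(&b) {
            for &z in zs {
                if self.view.insert((a, z)) {
                    delta.push((a, z));
                }
            }
        }
        // The new edge acts as the second hop: edge(x, a), edge(a, b).
        if let Some(xs) = self.in_edges.get(&a) {
            for &x in xs {
                if self.view.insert((x, b)) {
                    delta.push((x, b));
                }
            }
        }
        delta
    }
}

fn main() {
    let mut two_hop = TwoHop::default();
    for (a, b) in [(1, 2), (2, 3), (3, 4)] {
        let delta = two_hop.insert(a, b);
        println!("inserted ({a}, {b}), newly derived: {delta:?}");
    }
}
```

The returned vector plays the role of the change to the view caused by one update. Its size, and the work done to produce it, depends on how many existing edges the new edge joins with, not on the total size of the graph.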

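Two other projects are similar in intent, although they are more general because they support SQL, and they are partly based on the same underlying theory: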

  1. materialize
  2. feldera

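materialized-view differs from them mainly in that:

  1. The focus is on graph-shaped data
  2. The query language is Datalog instead of SQL
  3. Creating a query does not require compilation
  4. Queries can be changed at any time - this includes adding new queries that query other queries
  5. materialized-view is a higher-order interpreter of a variant of the incremental lambda calculus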

Citing materialized-view

@misc{materialized-view-github,
author = {Bruno Rucy Carneiro Alves de Lima and
Merlin Kramer},
title = {{`materialized-view` System Source Code}},
howpublished = {\url{https://github.com/brurucy/materialized-view}},
month        = June,
year         = 2024
}

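Is it a database?

No. However, it offers two always-updated views of the underlying queries:

  1. Consolidated - the always up-to-date state of the materialisation.
  2. Frontier - the most recent updates. After each poll event, you can query the frontier to retrieve the latest updates to the materialisation and store them in whatever database you use.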
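The relationship between the two views can be sketched in plain Rust. This is a toy model under stated assumptions, not the crate's internals: `ToyView` and `apply` are hypothetical names, and the encoding of an addition as weight `+1` and a retraction as weight `-1` is an assumption made for illustration. The frontier holds only the latest batch of weighted updates, while the consolidated state is what you get by summing every batch seen so far.

```rust
use std::collections::HashMap;

type Fact = (i32, i32);
type Weight = isize;

/// Hypothetical stand-in for a materialised relation; not the crate's API.
#[derive(Default)]
struct ToyView {
    consolidated: HashMap<Fact, Weight>, // always the up-to-date state
    frontier: Vec<(Fact, Weight)>,       // only the most recent updates
}

impl ToyView {
    /// Applies one batch of weighted updates, as a single poll might.
    fn apply(&mut self, batch: &[(Fact, Weight)]) {
        // The frontier is replaced, never accumulated.
        self.frontier = batch.to_vec();
        for &(fact, weight) in batch {
            let updated = {
                let current = self.consolidated.entry(fact).or_insert(0);
                *current += weight;
                *current
            };
            // A fact whose weights cancel out is no longer present.
            if updated == 0 {
                self.consolidated.remove(&fact);
            }
        }
    }

    fn contains(&self, fact: Fact) -> bool {
        self.consolidated.contains_key(&fact)
    }
}

fn main() {
    let mut view = ToyView::default();
    view.apply(&[((1, 2), 1), ((1, 3), 1)]); // two additions
    view.apply(&[((1, 3), -1)]);             // one retraction
    println!("consolidated has (1, 2): {}", view.contains((1, 2)));
    println!("consolidated has (1, 3): {}", view.contains((1, 3)));
    println!("frontier: {:?}", view.frontier);
}
```

Forwarding each frontier batch to an external store, in order, is enough for that store to track the consolidated state without ever reading the full materialisation again.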

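What are its limitations?

materialized-view's Datalog is very limited in expressivity. It is equivalent to SQL without aggregates and negation, but with a powerful declarative recursion construct that lets you do more, and do it more efficiently, than WITH RECURSIVE. You will see it in the example.

At the very least, treat this library as a reliable way to maintain the most expensive fundamental part of some data-intensive workload.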
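To show what a recursive Datalog rule computes, the following plain-Rust sketch evaluates a two-rule reachability program by semi-naive iteration: each round joins the base edges only with the facts derived in the previous round. This is a textbook evaluation strategy used here for illustration, not a description of how materialized-view evaluates programs, and the function name `reaches` is hypothetical.

```rust
use std::collections::HashSet;

type Edge = (i32, i32);

/// Computes the least fixpoint of:
///   reaches(?x, ?y) <- [edge(?x, ?y)]
///   reaches(?x, ?z) <- [edge(?x, ?y), reaches(?y, ?z)]
fn reaches(edges: &[Edge]) -> HashSet<Edge> {
    // First rule: every edge is a reaches fact.
    let mut all: HashSet<Edge> = edges.iter().copied().collect();
    // Facts derived in the previous round; initially, all of them.
    let mut delta: Vec<Edge> = all.iter().copied().collect();

    // Second rule, applied until no new fact appears.
    while !delta.is_empty() {
        let mut next = Vec::new();
        for &(x, y) in edges {
            for &(y2, z) in &delta {
                // `insert` returns true only for facts not seen before.
                if y == y2 && all.insert((x, z)) {
                    next.push((x, z));
                }
            }
        }
        delta = next;
    }
    all
}

fn main() {
    let chain = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)];
    let closure = reaches(&chain);
    println!("6 reachable from 1: {}", closure.contains(&(1, 6)));
    println!("facts in reaches: {}", closure.len());

    // Set semantics make the iteration terminate on cyclic graphs too.
    let cycle = [(1, 2), (2, 1)];
    println!("facts in cyclic reaches: {}", reaches(&cycle).len());
}
```

Because facts live in a set, a fact that is derived twice is recorded once, which is why the loop stops even when the graph contains cycles.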

Example

use materialized_view::*;

type NodeIndex = i32;
type Edge = (NodeIndex, NodeIndex);

// The following recursive query is "equivalent" to this pseudo-SQL statement:
// WITH RECURSIVE reaches(x, y) AS (
//    SELECT x, y FROM edge
//
//    UNION ALL
//
//    SELECT e.x, r.y
//    FROM edge e
//    JOIN reaches r ON e.y = r.x
// )
fn main() { 
 let recursive_query = program! {
  reaches(?x, ?y) <- [edge(?x, ?y)],
  reaches(?x, ?z) <- [edge(?x, ?y), reaches(?y, ?z)]
 };
 let mut dynamic_view = MaterializedDatalogView::new(recursive_query);

 // Add some edges.
 dynamic_view.push_fact("edge", (1, 2));
 dynamic_view.push_fact("edge", (2, 3));
 dynamic_view.push_fact("edge", (3, 4));
 dynamic_view.push_fact("edge", (4, 5));
 dynamic_view.push_fact("edge", (5, 6));

 // Then poll to incrementally update the view
 dynamic_view.poll();

 // Confirm that 6 is reachable from 1
 assert!(dynamic_view.contains("reaches", (1, 6)).unwrap());
 
 // Retract a fact
 dynamic_view.retract_fact("edge", (5, 6));
 dynamic_view.poll();
 
 // Query everything that is reachable from 1
 dynamic_view 
         // The arity of the relation being queried must be specified. e.g to query
         // a relation with two columns, `query_binary` ought to be used. 
         .query_binary::<NodeIndex, NodeIndex>("reaches", (Some(1), ANY_VALUE))
         .unwrap()
         .for_each(|edge| println!("{} is reachable from 1", *edge.1));

 // You are also able to query only the __most recent__ updates.
 dynamic_view
         // The arity of the relation being queried must be specified. e.g to query
         // a relation with two columns, `query_binary` ought to be used.
         .query_frontier_binary::<NodeIndex, NodeIndex>("reaches", (Some(1), ANY_VALUE))
         .unwrap()
         // The second argument is the weight. It represents whether the given value should be added
         // or retracted.
         .for_each(|((from, to), weight)| println!("Diff: {} - Value: ({}, {})", weight, *from, *to));

 // By extending the query with another query, it is possible to incrementally query the incrementally
 // maintained queries
 dynamic_view
         // Rules can also be assembled with a macro, a-la program!, called rule!:
         // rule! { reachableFromOne(1isize, ?x) <- reaches(1isize, ?x) }
         .push_rule((("reachableFromOne", (Const(1), Var("x"))), vec![("reaches", (Const(1), Var("x")))]));

 dynamic_view.poll();
 dynamic_view
         .query_binary::<NodeIndex, NodeIndex>("reachableFromOne", (Some(1), ANY_VALUE))
         .unwrap()
         .for_each(|edge| println!("{} is reachable from 1", edge.1));

 // And of course, you can retract rules as well!
 assert!(dynamic_view.contains("reachableFromOne", (1, 5)).unwrap());
 dynamic_view
         .retract_rule((("reachableFromOne", (Const(1), Var("x"))), vec![("reaches", (Const(1), Var("x")))]));
 dynamic_view.poll();
 assert!(!dynamic_view.contains("reachableFromOne", (1, 5)).unwrap());
}

Dependencies

~19–28MB
~359K SLoC