16 个版本

新 0.2.37-alpha.1	2024 年 8 月 23 日
0.2.36	2024 年 7 月 4 日
0.2.33	2024 年 6 月 5 日
0.2.27	2024 年 5 月 29 日

100 在编程语言中排名

每月下载 527 次
在 2 个 crate 中使用

Apache-2.0

500KB
13K SLoC

Bend

一种高级、大规模并行编程语言

索引

简介
重要说明
安装
入门
加速示例
额外资源

简介

Bend 提供了像 Python 和 Haskell 这样的表达性语言的感受和特性。这包括快速的对象分配、对闭包的完整支持、无限制的递归，甚至还有延续。
Bend 的扩展性与 CUDA 相似，它运行在如 GPU 这样的大规模并行硬件上，基于核心数具有近线性的加速，并且无需显式的并行性注解：无需创建线程、锁、互斥锁或原子操作。
Bend 由 HVM2 运行时提供支持。

重要说明

Bend 设计用于在核心数上实现性能扩展，支持超过 10,000 个并发线程。
当前版本可能具有较低的单核性能。
随着我们代码生成和优化技术的进步，您可以期待性能有实质性提升。
我们仍在努力支持 Windows。可以使用 WSL2 作为替代解决方案。
我们目前仅支持 NVIDIA GPU.

安装

安装依赖项

在 Linux 上

# Install Rust if you haven't already.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# For the C version of Bend, use GCC. We recommend a version up to 12.x.
sudo apt install gcc

对于 CUDA 运行时安装 Linux 的 CUDA 工具包版本 12.x。

在 Mac 上

# Install Rust if you haven't it already.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# For the C version of Bend, use GCC. We recommend a version up to 12.x.
brew install gcc

安装 Bend

通过运行以下命令安装 HVM2：

# HVM2 is HOC's massively parallel Interaction Combinator evaluator.
cargo install hvm

# This ensures HVM is correctly installed and accessible.
hvm --version

通过运行以下命令安装 Bend：

# This command will install Bend
cargo install bend-lang

# This ensures Bend is correctly installed and accessible.
bend --version

入门

运行 Bend 程序

bend run    <file.bend> # uses the C interpreter by default (parallel)
bend run-rs <file.bend> # uses the Rust interpreter (sequential)
bend run-c  <file.bend> # uses the C interpreter (parallel)
bend run-cu <file.bend> # uses the CUDA interpreter (massively parallel)

# Notes
# You can also compile Bend to standalone C/CUDA files using gen-c and gen-cu for maximum performance.
# The code generator is still in its early stages and not as mature as compilers like GCC and GHC.
# You can use the -s flag to have more information on
  # Reductions
  # Time the code took to run
  # Interaction per second (In millions)

测试 Bend 程序

以下示例展示了如何对从 start 到 target 范围内的所有数字进行求和。它可以采用两种不同的方法编写：一种是固有的顺序性（因此不能并行化），另一种很容易并行化。（为了可见性，我们将在大多数示例中使用 -s 标志）

顺序版本

首先，创建一个名为 sequential_sum.bend 的文件

# Write this command on your terminal
touch sequential_sum.bend

然后，使用您的文本编辑器打开文件 sequential_sum.bend，将下面的代码复制并粘贴到文件中。

# Defines the function Sum with two parameters: start and target
def Sum(start, target):
  if start == target:
    # If the value of start is the same as target, returns start.
    return start
  else:
    # If start is not equal to target, recursively call Sum with
    # start incremented by 1, and add the result to start.
    return start + Sum(start + 1, target)  

def main():
  # This translates to (1 + (2 + (3 + (...... + (999999 + 1000000)))))
  # Note that this will overflow the maximum value of a number in Bend
  return Sum(1, 1_000_000)

运行文件

您可以使用 Rust 解释器（顺序执行）运行它

bend run sequential_sum.bend -s

或者，您可以使用 C 解释器（顺序执行）运行它

bend run-c sequential_sum.bend -s

如果您有 NVIDIA GPU，也可以在 CUDA（顺序执行）下运行

bend run-cu sequential_sum.bend -s

在本版本中，下一个要计算的价值取决于前一个总和，这意味着它不能在没有完成当前计算的情况下进行。现在，让我们来看看易于并行化的版本。

并行化版本

首先关闭旧文件，然后转到您的终端创建 parallel_sum.bend

# Write this command on your terminal
touch parallel_sum.bend

然后使用您的文本编辑器打开文件 parallel_sum.bend，复制下面的代码并粘贴到文件中。

# Defines the function Sum with two parameters: start and target
def Sum(start, target):
  if start == target:
    # If the value of start is the same as target, returns start.
    return start
  else:
    # If start is not equal to target, calculate the midpoint (half),
    # then recursively call Sum on both halves.
    half = (start + target) / 2
    left = Sum(start, half)  # (Start -> Half)
    right = Sum(half + 1, target)
    return left + right

# A parallelizable sum of numbers from 1 to 1000000
def main():
  # This translates to (((1 + 2) + (3 + 4)) + ... (999999 + 1000000)...)
  return Sum(1, 1_000_000)

在这个例子中，(3 + 4) 的总和不依赖于 (1 + 2)，这意味着它可以并行运行，因为这两个计算可以同时发生。

运行文件

您可以使用 Rust 解释器（顺序执行）运行它

bend run parallel_sum.bend -s

或者您可以使用 C 解释器（并行）来运行它

bend run-c parallel_sum.bend -s

如果您有 NVIDIA GPU，您还可以在 CUDA（大规模并行）中运行

bend run-cu parallel_sum.bend -s

在 Bend 中，只需更改运行命令即可并行化。如果您的代码可以并行运行，它就会并行运行。

加速示例

下面的代码片段实现了一个具有 不可变树旋转 的位排序器。这不是您期望在 GPU 上快速运行的算法类型。然而，由于它使用分治法，这种方法本质上是并行的，因此 Bend 将在多个线程上执行它，无需创建线程，无需显式锁管理。

位排序器基准测试

bend run：CPU，Apple M3 Max：12.15秒
bend run-c：CPU，Apple M3 Max：0.96秒
bend run-cu：GPU，NVIDIA RTX 4090：0.21秒

点击此处查看位排序器代码

# Sorting Network = just rotate trees!
def sort(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      lft   = sort(d-1, 0, x)
      rgt   = sort(d-1, 1, y)
      return rots(d, s, (lft, rgt))

# Rotates sub-trees (Blue/Green Box)
def rots(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      return down(d, s, warp(d-1, s, x, y))

# Swaps distant values (Red Box)
def warp(d, s, a, b):
  switch d:
    case 0:
      return swap(s ^ (a > b), a, b)
    case _:
      (a.a, a.b) = a
      (b.a, b.b) = b
      (A.a, A.b) = warp(d-1, s, a.a, b.a)
      (B.a, B.b) = warp(d-1, s, a.b, b.b)
      return ((A.a,B.a),(A.b,B.b))

# Propagates downwards
def down(d,s,t):
  switch d:
    case 0:
      return t
    case _:
      (t.a, t.b) = t
      return (rots(d-1, s, t.a), rots(d-1, s, t.b))

# Swaps a single pair
def swap(s, a, b):
  switch s:
    case 0:
      return (a,b)
    case _:
      return (b,a)

# Testing
# -------

# Generates a big tree
def gen(d, x):
  switch d:
    case 0:
      return x
    case _:
      return (gen(d-1, x * 2 + 1), gen(d-1, x * 2))

# Sums a big tree
def sum(d, t):
  switch d:
    case 0:
      return t
    case _:
      (t.a, t.b) = t
      return sum(d-1, t.a) + sum(d-1, t.b)

# Sorts a big tree
def main:
  return sum(20, sort(20, 0, gen(20, 0)))

如果您对其他算法感兴趣，可以查看我们的示例文件夹

额外资源

要了解 Bend 背后的技术，请查看 HVM2 论文。
我们正在编写官方文档，同时，为了更深入的解释，请查看 GUIDE.md
在我们的 FEATURES.md 中了解我们的功能
Bend 由 HigherOrderCO 开发 - 加入我们的 Discord！

依赖关系

~3–12MB
~99K SLoC