52 versions

0.18.22 August 17, 2024
0.18.21 May 23, 2024
0.18.10 April 12, 2024
0.18.9 March 15, 2024
0.10.3 November 28, 2022

#20 in Hardware support

Download history: between 12,178 and 18,566 downloads per week from 2024-04-27 to 2024-08-10

65,141 downloads per month
Used in 137 crates (22 directly)

MIT license

1.5MB
31K SLoC

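pulp is a safe abstraction over SIMD instructions. It lets you write a function once and dispatch to an equivalent vectorized version based on the CPU features detected at runtime.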


Autovectorization example

use pulp::Arch;

let mut v = (0..1000).map(|i| i as f64).collect::<Vec<_>>();
let arch = Arch::new();

arch.dispatch(|| {
    for x in &mut v {
        *x *= 2.0;
    }
});

for (i, x) in v.into_iter().enumerate() {
    assert_eq!(x, 2.0 * i as f64);
}
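
As a rough illustration of what "dispatch based on features detected at runtime" can mean, the sketch below uses only the standard library, not pulp. The function names double_all, double_all_avx2, and double_all_dispatch are hypothetical, and checking only for AVX2 on x86_64 is an assumption made for brevity; pulp's own mechanism is not described in this text and may differ.

```rust
// Generic kernel, written once. `#[inline(always)]` lets it be inlined
// into each specialised entry point and optimised for that target.
#[inline(always)]
fn double_all(v: &mut [f64]) {
    for x in v.iter_mut() {
        *x *= 2.0;
    }
}

// The same kernel compiled with AVX2 enabled, so the compiler is free to
// autovectorize the loop with wider registers (hypothetical helper).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn double_all_avx2(v: &mut [f64]) {
    double_all(v)
}

// Hypothetical dispatcher: picks a version at runtime, falling back to
// the baseline build of the kernel when AVX2 is not available.
fn double_all_dispatch(v: &mut [f64]) {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked on the line above.
            unsafe { double_all_avx2(v) };
            return;
        }
    }
    double_all(v)
}
```

Calling double_all_dispatch(&mut v) on a vector gives the same result on every machine; only the instructions used to compute it change.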

Manual vectorization example

use pulp::{Arch, Simd, WithSimd};

struct TimesThree<'a>(&'a mut [f64]);
impl<'a> WithSimd for TimesThree<'a> {
    type Output = ();

    #[inline(always)]
    fn with_simd<S: Simd>(self, simd: S) -> Self::Output {
        let v = self.0;
        let (head, tail) = S::f64s_as_mut_simd(v);

        let three = simd.f64s_splat(3.0);
        for x in head {
            *x = simd.f64s_mul(three, *x);
        }

        for x in tail {
            *x = *x * 3.0;
        }
    }
}

let mut v = (0..1000).map(|i| i as f64).collect::<Vec<_>>();
let arch = Arch::new();

arch.dispatch(TimesThree(&mut v));

for (i, x) in v.into_iter().enumerate() {
    assert_eq!(x, 3.0 * i as f64);
}
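
The head/tail split in the example above follows a common pattern: process as many full vector-width groups as possible, then finish the leftover elements with scalar code. A minimal std-only sketch of that pattern is shown here. The name times_three_chunked and the fixed LANES value of 4 are assumptions for illustration; with pulp the width depends on the SIMD type selected at runtime.

```rust
// Assumed lane count for illustration (for example, four f64 values in a
// 256-bit register). This constant is not part of pulp.
const LANES: usize = 4;

// Hypothetical helper: multiplies every element by three, handling full
// groups of LANES elements first and the remainder afterwards.
fn times_three_chunked(v: &mut [f64]) {
    let mut chunks = v.chunks_exact_mut(LANES);

    // "Head": every chunk has exactly LANES elements, which makes the
    // inner loop easy for the compiler to vectorize.
    for chunk in &mut chunks {
        for x in chunk.iter_mut() {
            *x *= 3.0;
        }
    }

    // "Tail": fewer than LANES elements are left, handled one by one.
    for x in chunks.into_remainder() {
        *x *= 3.0;
    }
}
```

With a slice of length 10 and LANES set to 4, two full chunks form the head and the last two elements form the tail.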

Less boilerplate using pulp::with_simd

Only available when the corresponding Cargo feature is enabled.

Requires the first non-lifetime generic parameter, as well as the function's first input parameter, to be the SIMD type.

use pulp::Simd;

#[pulp::with_simd(sum = pulp::Arch::new())]
#[inline(always)]
fn sum_with_simd<'a, S: Simd>(simd: S, v: &'a mut [f64]) {
    let (head, tail) = S::f64s_as_mut_simd(v);
    let three = simd.f64s_splat(3.0);
    for x in head {
        *x = simd.f64s_mul(three, *x);
    }
    for x in tail {
        *x = *x * 3.0;
    }
}

let mut v = (0..1000).map(|i| i as f64).collect::<Vec<_>>();
sum(&mut v);

for (i, x) in v.into_iter().enumerate() {
    assert_eq!(x, 3.0 * i as f64);
}

Dependencies

~1MB
~18K SLoC