3 unstable releases

0.2.1	Mar 7, 2024
0.2.0	Jan 12, 2024
0.1.0	Nov 14, 2022

#2 in #k-means

30 downloads per month

MIT license

17KB
237 lines of code (excluding comments)

Clustering

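This crate provides a simple and efficient way to perform k-means clustering on arbitrary data. The algorithm is initialized with k-means++ to improve the quality of the resulting clusters.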

This k-means implementation has three goals:

It must be generic
It must be easy to use
It must be reasonably fast
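
To make the k-means++ initialization mentioned above more concrete, here is a minimal, dependency-free sketch of the seeding step. It is not the crate's actual code: the function names (`sq_dist`, `next_unit`, `kmeans_pp_seeds`), the `seed` parameter, and the small linear congruential generator used in place of a real random number generator are all assumptions made for illustration. The first centroid is picked uniformly at random, and each further centroid is picked with probability proportional to its squared distance from the nearest centroid chosen so far.

```rust
/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Tiny deterministic pseudo-random generator (an LCG), used only to keep
/// this sketch dependency-free. Returns a value in [0, 1).
fn next_unit(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

/// k-means++ seeding (illustrative): returns the indices of the `k` samples
/// chosen as initial centroids.
/// Assumes `samples` is non-empty and `1 <= k <= samples.len()`.
fn kmeans_pp_seeds(samples: &[Vec<f64>], k: usize, seed: u64) -> Vec<usize> {
    let mut state = seed;

    // First centroid: chosen uniformly at random.
    let first = (next_unit(&mut state) * samples.len() as f64) as usize;
    let mut chosen = vec![first.min(samples.len() - 1)];

    // Squared distance from each sample to its nearest chosen centroid.
    let mut d2: Vec<f64> = samples
        .iter()
        .map(|s| sq_dist(s, &samples[chosen[0]]))
        .collect();

    while chosen.len() < k {
        let total: f64 = d2.iter().sum();
        let next = if total > 0.0 {
            // Sample an index with probability proportional to d2[i].
            let target = next_unit(&mut state) * total;
            let mut acc = 0.0;
            let mut pick = None;
            for (i, &d) in d2.iter().enumerate() {
                acc += d;
                if d > 0.0 && acc > target {
                    pick = Some(i);
                    break;
                }
            }
            // Guard against rounding: fall back to the last candidate.
            pick.unwrap_or_else(|| d2.iter().rposition(|&d| d > 0.0).unwrap())
        } else {
            // Every sample coincides with a chosen centroid: take any unused index.
            (0..samples.len()).find(|i| !chosen.contains(i)).unwrap()
        };
        chosen.push(next);

        // Update the nearest-centroid distances with the new centroid.
        for (i, s) in samples.iter().enumerate() {
            d2[i] = d2[i].min(sq_dist(s, &samples[next]));
        }
    }
    chosen
}
```

Because an already chosen sample has distance zero to itself, it can never be drawn again, so the returned indices are always distinct. Spreading the initial centroids this way tends to give Lloyd's iterations a better starting point than uniform random seeding.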

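Important note

Depending on your execution environment and the size of the dataset you want to cluster, your code may benefit from parallelization (for large problems this can mean a substantial performance gain). To enable multithreading, add the "parallel" feature to your dependency.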

# To enable multithreading during clustering, add the "parallel" feature
# to your dependency.
[dependencies]
clustering = {version = "0.2.0", features = ["parallel"]}

# If all you need is sequential clustering, just leave that feature out.
[dependencies]
clustering = {version = "0.2.0"}
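
To illustrate why parallelization can pay off on large inputs, here is a rough sketch of the assignment step of k-means (finding the nearest centroid for every sample) split across threads. It is only an illustration under stated assumptions: the crate itself relies on rayon when the "parallel" feature is enabled, whereas this sketch uses `std::thread::scope` from the standard library to stay dependency-free, and the names `nearest`, `assign_parallel`, and `n_threads` are hypothetical. Each sample is handled independently of the others, which is what makes this step easy to parallelize.

```rust
use std::thread;

/// Squared Euclidean distance between two points of equal dimension.
fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `sample`.
/// Assumes `centroids` is non-empty.
fn nearest(sample: &[f64], centroids: &[Vec<f64>]) -> usize {
    let mut best = 0;
    let mut best_d = f64::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d = sq_dist(sample, c);
        if d < best_d {
            best = i;
            best_d = d;
        }
    }
    best
}

/// Assignment step of k-means, split across up to `n_threads` scoped threads.
/// Returns, for each sample, the index of its nearest centroid.
fn assign_parallel(
    samples: &[Vec<f64>],
    centroids: &[Vec<f64>],
    n_threads: usize,
) -> Vec<usize> {
    let mut membership = vec![0usize; samples.len()];
    if samples.is_empty() {
        return membership;
    }

    // Chunk size: ceil(len / n_threads), treating 0 threads as 1.
    let n_threads = n_threads.max(1);
    let chunk = (samples.len() + n_threads - 1) / n_threads;

    thread::scope(|scope| {
        // Pair each input chunk with the output chunk it writes to, so the
        // threads never touch the same memory.
        for (in_chunk, out_chunk) in samples.chunks(chunk).zip(membership.chunks_mut(chunk)) {
            scope.spawn(move || {
                for (s, m) in in_chunk.iter().zip(out_chunk.iter_mut()) {
                    *m = nearest(s, centroids);
                }
            });
        }
    });
    membership
}
```

For small datasets the cost of spawning threads can outweigh the gain, which is one reason to keep parallel execution opt-in.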

Example

use clustering::*;

let n_samples    = 20_000; // # of samples in the example
let n_dimensions =    200; // # of dimensions in each sample
let k            =      4; // # of clusters in the result
let max_iter     =    100; // max number of iterations before the clustering forcefully stops

// Generate some random data
let mut samples: Vec<Vec<f64>> = vec![];
for _ in 0..n_samples {
    samples.push((0..n_dimensions).map(|_| rand::random()).collect::<Vec<_>>());
}

// actually perform the clustering
let clustering = kmeans(k, &samples, max_iter);

println!("membership: {:?}", clustering.membership);
println!("centroids : {:?}", clustering.centroids);
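
Printing the raw membership of 20,000 samples is rarely readable, so a small summary can help. The sketch below assumes that `membership` holds one cluster index per sample, in the same order as the input; this is an assumption based on the field name, not something the text above confirms. The helpers `cluster_sizes` and `group_by_cluster` are hypothetical, not part of the crate, and work on plain slices so they do not depend on the crate's result type.

```rust
/// Counts how many samples were assigned to each of the `k` clusters.
/// `membership[i]` is assumed to be the cluster index of sample `i`;
/// indices outside `0..k` are ignored.
fn cluster_sizes(membership: &[usize], k: usize) -> Vec<usize> {
    let mut sizes = vec![0usize; k];
    for &c in membership {
        if c < k {
            sizes[c] += 1;
        }
    }
    sizes
}

/// Groups sample indices by cluster: entry `c` of the result lists the
/// indices of all samples assigned to cluster `c`, in input order.
fn group_by_cluster(membership: &[usize], k: usize) -> Vec<Vec<usize>> {
    let mut groups = vec![Vec::new(); k];
    for (i, &c) in membership.iter().enumerate() {
        if c < k {
            groups[c].push(i);
        }
    }
    groups
}
```

If the assumption about `membership` holds, a call such as `cluster_sizes(&clustering.membership, k)` would give a quick overview of how balanced the clusters are.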

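Features

This crate comes with two optional features:

parallel, which enables multithreaded scheduling with rayon (thanks to @jean-pierreBoth for the contribution)
logging, which you can use to log the shortcuts taken during clustering.

Dependencies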

~0.2–1.2MB
~21K SLoC