10 versions

0.5.2	Jul 29, 2022
0.5.1	Feb 13, 2022
0.5.0	Oct 15, 2021
0.4.1	Jul 24, 2021
0.1.0	Aug 21, 2020

#185 in Machine learning

69 downloads per month
Used in linfa-tsne

MIT license

81KB
1.5K SLoC

bhtsne

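Parallel Barnes-Hut and exact implementations of the t-SNE algorithm, written in Rust. The tree-accelerated version of the algorithm is described in detail in this paper by Laurens van der Maaten. The exact, original version of the algorithm is described in this paper by G. Hinton and Laurens van der Maaten. Other implementations of the algorithm exist in addition to this one.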

Installation

Add this line to your Cargo.toml:

[dependencies]
bhtsne = "0.5.1"

Documentation

The API documentation is available here.

Example

The implementation supports custom data types and user-defined metrics. For instance, general vector data can be handled as follows.

 use std::error::Error;

 const N: usize = 150;         // Number of vectors to embed.
 const D: usize = 4;           // The dimensionality of the
                               // original space.
 const THETA: f32 = 0.5;       // Parameter used by the Barnes-Hut algorithm.
                               // Small values improve accuracy but increase complexity.

 const PERPLEXITY: f32 = 10.0; // Perplexity of the conditional distribution.
 const EPOCHS: usize = 2000;   // Number of fitting iterations.
 const NO_DIMS: u8 = 2;        // Dimensionality of the embedded space.

 // The `?` operator needs an enclosing function that returns a `Result`.
 fn main() -> Result<(), Box<dyn Error>> {
     // Loads the data from a csv file skipping the first row,
     // treating it as headers and skipping the 5th column,
     // treating it as a class label.
     // Do note that you can also switch to f64s for higher precision.
     let data: Vec<f32> = bhtsne::load_csv("iris.csv", true, Some(&[4]), |float| {
         float.parse().unwrap()
     })?;
     // Splits the flat buffer into one slice of D values per sample.
     let samples: Vec<&[f32]> = data.chunks(D).collect();
     // Executes the Barnes-Hut approximation of the algorithm and writes the embedding to the
     // specified csv file.
     bhtsne::tSNE::new(&samples)
         .embedding_dim(NO_DIMS)
         .perplexity(PERPLEXITY)
         .epochs(EPOCHS)
         // Euclidean distance between two samples.
         .barnes_hut(THETA, |sample_a, sample_b| {
             sample_a
                 .iter()
                 .zip(sample_b.iter())
                 .map(|(a, b)| (a - b).powi(2))
                 .sum::<f32>()
                 .sqrt()
         })
         .write_csv("iris_embedding.csv")?;
     Ok(())
 }

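The example uses Euclidean distance, but any other distance metric on a data type of your choice, such as strings, can be defined and plugged in.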
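As an illustration of a metric on strings, the following is a minimal, dependency-free sketch of the Levenshtein (edit) distance. The function name `levenshtein` and the choice to return `f32` are assumptions made for this sketch and are not part of the crate. A closure such as `|a, b| levenshtein(a, b)` could presumably take the place of the Euclidean closure in the example, but the exact closure parameter types expected by `barnes_hut` are not stated here, so treat that wiring as an assumption to check against the API documentation.

```rust
/// Levenshtein (edit) distance between two strings.
/// Hypothetical helper for illustration; returned as `f32` so the value
/// has the same float type as the other settings in the example.
fn levenshtein(a: &str, b: &str) -> f32 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // `prev[j]` is the distance between the first `i - 1` chars of `a`
    // and the first `j` chars of `b`; `curr` is the row being filled.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr: Vec<usize> = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let deletion = prev[j] + 1;
            let insertion = curr[j - 1] + 1;
            curr[j] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()] as f32
}
```

Edit distance is symmetric, is zero only for identical strings, and satisfies the triangle inequality, which is the kind of property a tree-based neighbor search generally relies on.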

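Parallelism

Being built on rayon, the algorithm uses as many threads as there are CPUs available. Note that on systems with hyperthreading enabled this equals the number of logical cores, not physical ones. See rayon's FAQ for more information.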
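To see what "CPUs available" is likely to mean on a given machine, the standard library can report the parallelism available to the current process. This is a small sketch using only `std`; the helper name `logical_cores` is hypothetical, and the value it returns typically matches the logical core count but may be lower if the process is restricted (for example by CPU affinity or container limits).

```rust
use std::thread;

/// Parallelism the standard library reports for this process,
/// typically the number of logical cores.
/// Falls back to 1 if the value cannot be determined.
fn logical_cores() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```

rayon's global thread pool also honors the `RAYON_NUM_THREADS` environment variable. Assuming this crate runs on the global pool, which is not confirmed here, setting that variable before launching the program would cap the number of worker threads.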

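MNIST embedding

The following embedding was obtained by first preprocessing the MNIST training set with PCA to reduce its dimensionality to 50. It took approximately 3 minutes and 6 seconds on a 2.0GHz quad-core 10th-generation i5 MacBook Pro.

[Image: mnist]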

Dependencies

~4MB
~67K SLoC