12 releases

0.3.4	Sep 3, 2023
0.3.3	Sep 3, 2023
0.3.2	Aug 8, 2023
0.2.3	Mar 13, 2023
0.1.4	Feb 15, 2023

#151 in Machine learning

MIT license

105KB
1.5K SLoC

darjeeling

Machine learning tools for Rust

Contact

elocolburn@comcast.net

Installation

Add the following dependency to your Cargo.toml file:

    [dependencies]
    darjeeling = "0.3.4"

Basic setup

Create a network

    use darjeeling::{
        categorize,
        activation::ActivationFunction
    };
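    // Nodes in the input layer, in each hidden layer, and in the output (answer) layer
    let input_num = 2;
    let hidden_num = 2;
    let answer_num = 2;
    // Number of hidden layers; all of them start with hidden_num nodes
    let hidden_layers = 1;
    let mut net = categorize::NeuralNetwork::new(input_num, hidden_num, answer_num, hidden_layers, ActivationFunction::Sigmoid);
    // Adds another hidden layer with its own size (here 2 nodes)
    net.add_hidden_layer_with_size(2);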

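You can also add hidden layers with a specific number of neurons using add_hidden_layer_with_size, because all hidden layers created during initialization must be the same size.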
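
To make the layer-size rule concrete, here is a minimal sketch in plain Rust (no darjeeling dependency) of how the constructor arguments could map onto a list of layer sizes. LayerPlan and its methods are hypothetical. Inserting the extra hidden layer just before the output layer is an assumption about add_hidden_layer_with_size, and so is counting one bias per neuron in parameter_count.

```rust
/// Hypothetical layer layout for a fully connected network (a sketch, not darjeeling's code).
struct LayerPlan {
    /// Neurons per layer: input, hidden..., output.
    sizes: Vec<usize>,
}

impl LayerPlan {
    /// Every hidden layer created here gets the same size, hidden_num.
    fn new(input_num: usize, hidden_num: usize, answer_num: usize, hidden_layers: usize) -> Self {
        let mut sizes = vec![input_num];
        sizes.extend(std::iter::repeat(hidden_num).take(hidden_layers));
        sizes.push(answer_num);
        LayerPlan { sizes }
    }

    /// Assumption: the new hidden layer is inserted just before the output layer.
    fn add_hidden_layer_with_size(&mut self, size: usize) {
        let output_index = self.sizes.len() - 1;
        self.sizes.insert(output_index, size);
    }

    /// Weights plus one bias per neuron between each pair of adjacent layers (assumed).
    fn parameter_count(&self) -> usize {
        self.sizes.windows(2).map(|w| w[0] * w[1] + w[1]).sum()
    }
}

fn main() {
    // 2 inputs, 2 nodes per hidden layer, 2 answers, 1 hidden layer
    let mut plan = LayerPlan::new(2, 2, 2, 1);
    // One extra hidden layer with a different size
    plan.add_hidden_layer_with_size(3);
    println!("layer sizes: {:?}", plan.sizes); // [2, 2, 3, 2]
    println!("weights + biases: {}", plan.parameter_count()); // 23
}
```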

Format data as inputs

    use darjeeling::{
        types::Types,
        input::Input
    };
    // Do this for every input
    let float_inputs: Vec<f32> = vec![];
    let answer_input: Types = Types::String("Bee");
    let input = Input::new(float_inputs, answer_input);

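An input represents a set of floats and the answer node they should map to. For example, if the input is a picture of a bee, float_inputs might be the hex value of each pixel, and the answer input might be "Bee". Make sure the answer is always a valid category.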
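
A minimal sketch of preparing one input in plain Rust, assuming each pixel arrives as a "#RRGGBB" hex string. SketchInput, pixels_to_floats and make_input are hypothetical stand-ins, not darjeeling's Input API. Scaling each value into the range 0.0 to 1.0 is an added assumption, since very large raw values tend to saturate a sigmoid. The category check enforces the rule that the answer must be a valid category.

```rust
/// Hypothetical stand-in for an input: feature values plus an optional answer
/// (None when the input is only being categorized).
#[derive(Debug)]
struct SketchInput {
    inputs: Vec<f32>,
    answer: Option<String>,
}

/// Converts "#RRGGBB" pixel strings into floats scaled to 0.0..=1.0.
fn pixels_to_floats(pixels: &[&str]) -> Result<Vec<f32>, std::num::ParseIntError> {
    pixels
        .iter()
        .map(|p| {
            u32::from_str_radix(p.trim_start_matches('#'), 16)
                // 0xFFFFFF (16_777_215) is the largest 24-bit color value
                .map(|v| v as f32 / 16_777_215.0)
        })
        .collect()
}

/// Builds an input only if its answer is one of the known categories.
fn make_input(pixels: &[&str], answer: &str, categories: &[&str]) -> Result<SketchInput, String> {
    if !categories.contains(&answer) {
        return Err(format!("{answer:?} is not a valid category"));
    }
    let inputs = pixels_to_floats(pixels).map_err(|e| e.to_string())?;
    Ok(SketchInput { inputs, answer: Some(answer.to_string()) })
}

fn main() {
    let categories = ["Bee", "3"];
    let bee = make_input(&["#FFFFFF", "#000000"], "Bee", &categories).unwrap();
    println!("{:?}", bee);
    // "Wasp" is not a category, so it is rejected
    println!("{:?}", make_input(&["#FFFFFF"], "Wasp", &categories));
}
```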

Train your network

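    // categories_str_format is a formatting helper from darjeeling's tests module,
    // and data_fmt stands for your own function that returns the formatted inputs
    let learning_rate: f32 = 3.0;
    let categories: Vec<Types> = categories_str_format(vec!["Bee", "3"]);
    let mut data: Vec<Input> = data_fmt();
    let target_err_percent = 95.0;
    // "bees3s" is the name given to the trained model
    match net.learn(&mut data, categories, learning_rate, "bees3s", target_err_percent) {
        // Do whatever you want with this data
        Ok((_model_name, _err_percent, _mse)) => Some(()),
        Err(_err) => None
    };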

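If training succeeds, it returns the model name, the percentage of training inputs the network categorized correctly in its last epoch, and the mean squared error of the training run.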
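
To help read those two numbers, here is a minimal sketch of their standard definitions in plain Rust. It is not darjeeling's internal code. Averaging the squared error over every output node of every input is an assumption about how its MSE is normalized.

```rust
/// Percentage of predictions that match the expected category index.
fn percent_correct(predicted: &[usize], expected: &[usize]) -> f32 {
    assert_eq!(predicted.len(), expected.len(), "one prediction per input");
    if expected.is_empty() {
        return 0.0;
    }
    let correct = predicted.iter().zip(expected).filter(|(p, e)| p == e).count();
    100.0 * correct as f32 / expected.len() as f32
}

/// Mean of the squared differences between every output node and its target value.
fn mean_squared_error(outputs: &[Vec<f32>], targets: &[Vec<f32>]) -> f32 {
    let mut sum = 0.0_f32;
    let mut count = 0_usize;
    for (out, target) in outputs.iter().zip(targets) {
        for (o, t) in out.iter().zip(target) {
            sum += (o - t).powi(2);
            count += 1;
        }
    }
    if count == 0 { 0.0 } else { sum / count as f32 }
}

fn main() {
    let predicted: [usize; 4] = [0, 1, 1, 0];
    let expected: [usize; 4] = [0, 1, 0, 0];
    println!("{}% correct", percent_correct(&predicted, &expected));

    // One input whose target is category 0, encoded as [1.0, 0.0]
    let outputs: Vec<Vec<f32>> = vec![vec![0.75, 0.25]];
    let targets: Vec<Vec<f32>> = vec![vec![1.0, 0.0]];
    println!("MSE = {}", mean_squared_error(&outputs, &targets));
}
```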

Test your network

    // Do whatever you want with this data
    match categorize::NeuralNetwork::test(data, categories, model_name) {
        // Vec<Types>
        Ok(types) => {},
        // DarjeelingError
        Err(err) => {}
    };

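During testing, set the answer in the input data to None. Testing returns a vector containing the category assigned to each input, in the same order as the data.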
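
A minimal sketch of what "one category per input, in the same order" means, assuming the network produces one output activation per category and the highest activation wins. assign_categories is a hypothetical helper working on plain Vec<f32> rows, not on darjeeling's types.

```rust
/// For each output row, picks the category whose node has the highest activation.
/// The result keeps the order of the input rows.
fn assign_categories<'a>(outputs: &[Vec<f32>], categories: &[&'a str]) -> Vec<&'a str> {
    outputs
        .iter()
        .map(|row| {
            let best = row
                .iter()
                .enumerate()
                .max_by(|a, b| a.1.total_cmp(b.1))
                .map(|(index, _)| index)
                .expect("output row is empty");
            categories[best]
        })
        .collect()
}

fn main() {
    let categories = ["0", "1"];
    // One row of output activations per test input
    let outputs: Vec<Vec<f32>> = vec![vec![0.9, 0.1], vec![0.2, 0.7], vec![0.4, 0.6]];
    println!("{:?}", assign_categories(&outputs, &categories));
}
```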

Examples

Categorization

This program reads from a file containing every possible input to a binary logic gate, along with the correct answers.

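It then trains a model with 1 hidden layer. The input layer has 2 nodes because there are two inputs. The output layer has 2 nodes because there are two possible answers (whichever output node is "brighter", meaning it has the higher output, is the selected answer). The hidden layer has 2 nodes because I like patterns.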
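
The training file described above can be produced with a few lines of plain Rust. This sketch assumes the layout that the example's xor_file reader parses: two space-separated inputs, a semicolon, then the answer, for example "0 1;1". xor_lines is a hypothetical helper, and it prints the lines instead of writing training_data/xor.txt.

```rust
/// Every input combination of a two-input XOR gate as "a b;answer" lines.
fn xor_lines() -> Vec<String> {
    let mut lines = Vec::new();
    for a in 0..2u8 {
        for b in 0..2u8 {
            // XOR is 1 exactly when the two inputs differ
            lines.push(format!("{} {};{}", a, b, a ^ b));
        }
    }
    lines
}

fn main() {
    for line in xor_lines() {
        println!("{line}");
    }
}
```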

If it isn't working, check the tests.rs source code for verified code.

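Tip: If things aren't working, try adjusting the learning rate you are using.

Different problems work best with different learning rates, though I recommend starting at 0.5.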
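
To see why the learning rate matters, here is a toy gradient-descent loop in plain Rust on f(w) = (w - 3)^2, whose minimum is at w = 3. It is not darjeeling's training code; it only shows how the step size scales each update. With this particular function, 0.1 creeps toward 3, 0.5 lands on 3 in a single step, 1.0 bounces between 0 and 6 forever, and 1.5 diverges.

```rust
/// One gradient-descent step on f(w) = (w - 3)^2, whose gradient is 2(w - 3).
fn step(w: f32, learning_rate: f32) -> f32 {
    w - learning_rate * 2.0 * (w - 3.0)
}

/// Runs `steps` updates starting from w = 0.
fn descend(learning_rate: f32, steps: usize) -> f32 {
    (0..steps).fold(0.0, |w, _| step(w, learning_rate))
}

fn main() {
    for lr in [0.1, 0.5, 1.0, 1.5] {
        println!("learning rate {lr}: w after 10 steps = {}", descend(lr, 10));
    }
}
```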

    use darjeeling::{
        activation::ActivationFunction,
        categorize::NeuralNetwork,
        input::Input
    };
    use std::{io::{BufReader, BufRead}, fs};

    fn train_test_xor() {
        let learning_rate: f32 = 1.0;
        // The two possible answers of an XOR gate
        let categories = NeuralNetwork::categories_format(vec!["0", "1"]);
        let data = xor_file();

        let model_name: String = train_network_xor(data.clone(), categories.clone(), learning_rate)
            .expect("training failed");

        // Categorize the data with the saved model
        match NeuralNetwork::test(data, categories, model_name) {
            // One category per input, in the same order as the data
            Ok(_categorized) => {},
            Err(_err) => {}
        }
    }

    fn train_network_xor(mut data: Vec<Input>, categories: Vec<String>, learning_rate: f32) -> Option<String> {
        let input_num: i32 = 2;
        let hidden_num: i32 = 2;
        let answer_num: i32 = 2;
        let hidden_layers: i32 = 1;
        let model_name: &str = "xor";
        let target_err_percent = 99.0;
        // Creates a new neural network
        let mut net = NeuralNetwork::new(input_num, hidden_num, answer_num, hidden_layers, ActivationFunction::Sigmoid);
        // Trains the neural network
        match net.learn(&mut data, categories, learning_rate, model_name, target_err_percent) {
            // Model name, percent correct on the last epoch, and mean squared error
            Ok((model_name, _err_percent, _mse)) => Some(model_name),
            Err(_err) => None
        }
    }

    // This isn't very important; it just reads the training file and formats each line as an Input.
    // Each line holds two space-separated inputs, a semicolon, and the answer, e.g. "0 1;1".
    fn xor_file() -> Vec<Input> {
        let file = fs::File::open("training_data/xor.txt")
            .unwrap_or_else(|error| panic!("Panic opening the file: {:?}", error));
        let reader = BufReader::new(file);
        let mut inputs: Vec<Input> = vec![];

        for l in reader.lines() {
            let line = l.unwrap_or_else(|error| panic!("{:?}", error));
            let fields: Vec<&str> = line.split(';').collect();
            // The first field holds the two gate inputs
            let float_inputs: Vec<f32> = fields[0]
                .split(' ')
                .take(2)
                .map(|value| value.parse().unwrap())
                .collect();
            // The last field is the correct answer
            let answer = fields[fields.len() - 1].trim().to_string();
            inputs.push(Input { inputs: float_inputs, answer });
        }

        inputs
    }

Generation

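This program doesn't have a large enough dataset to produce interesting results. All it does is create a network, train it on the data, and generate new data with it.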

    use darjeeling::{
        generation::NeuralNetwork,
        activation::ActivationFunction,
        input::Input,
        // This file may not be available
        // Everything found here will be hyper-specific to your project.
        tests::{categories_str_format, file}
    };

    // A file with data
    // To make sure the network is properly trained, make sure the data follows some sort of pattern
    // This is just sample data; for accurate results, around 3800 data points are needed
    // 1 2 3 4 5 6 7 8
    // 3 2 5 4 7 6 1 8
    // 0 2 5 4 3 6 1 8
    // 7 2 3 4 9 6 1 8
    // You also need to write the file input function
    // Automatic file reading and formatting function coming soon
    let mut data: Vec<Input> = file();
    // List every value the network may produce
    let categories = categories_str_format(vec![/* your categories here */]);
    let mut net = NeuralNetwork::new(2, 2, 2, 1, ActivationFunction::Sigmoid);
    let learning_rate = 1.0;
    let _model_name = net.learn(&mut data, categories, learning_rate, "gen").unwrap();
    let new_data: Vec<Input> = net.test(data).unwrap();
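
The file() helper in this example comes from darjeeling's tests module and may not be available, and the comments note that you need to write your own file input function. Here is a minimal sketch of the parsing half, assuming the space-separated layout shown in the sample comment. parse_rows is hypothetical and returns plain Vec<f32> rows; turning them into darjeeling Input values (which numbers become inputs and which the answer) depends on your project and is left out.

```rust
use std::num::ParseFloatError;

/// Parses lines of whitespace-separated numbers, such as "3 2 5 4 7 6 1 8",
/// into one Vec<f32> per non-empty line. Fails on the first non-numeric value.
fn parse_rows(text: &str) -> Result<Vec<Vec<f32>>, ParseFloatError> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            line.split_whitespace()
                .map(str::parse::<f32>)
                .collect::<Result<Vec<f32>, _>>()
        })
        .collect()
}

fn main() {
    // The sample rows from the comment in the example
    let sample = "1 2 3 4 5 6 7 8\n3 2 5 4 7 6 1 8\n0 2 5 4 3 6 1 8\n7 2 3 4 9 6 1 8\n";
    let rows = parse_rows(sample).expect("every value should be a number");
    println!("{} rows of {} values", rows.len(), rows[0].len());
}
```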

FAQ

Why is it called Darjeeling?

Because it's the WiFi password of the tea shop where I do most of my programming.

Contributing

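If you want to contribute, check the TODO list or the issues, fork the code, and open a pull request whenever you're ready. I'm more than happy to review any code anyone wants to add, and I'd be glad to help anyone who wants to contribute in any way, including but not limited to teaching machine learning, Rust, and how Darjeeling works. People of all experience levels are welcome. If you need anything, email me. If you find a bug you can't or don't want to fix, please open an issue.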

Guidelines

Respect everyone and their differences.
Be kind.
Be patient with people of all experience and skill levels.

TODO

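Improve generation quality (honestly, the results aren't very good right now)
Add support for Polars DataFrames
Make data handling and input formatting easier
Optimization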

DataFrames and Series are now deprecated

Dependencies

~2.1–3.5MB
~66K SLoC