3 unstable releases

0.2.1	Apr 2, 2023
0.2.0	Feb 21, 2023
0.1.0	Jan 16, 2023

#651 in Machine learning

MIT/Apache

33KB
416 lines

learnwell

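A simple reinforcement learning framework that lets you quickly create environments and test them.
Designed to be simple and easy to use.
Keeps external dependencies to a minimum; it is a framework for building your own implementations.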

Implementation examples

Q-Learning
Deep Q-Learning (DQN)

This project is in alpha. Use at your own risk.

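Getting started

Look at the taxi example and read through its comments line by line:

    cargo run --release --example taxi

You can also run the following examples:

hike - runs with display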
taxi
mouse
mouseimage - DQN
taxiimage - DQN, runs with display

Imports

use learnwell::{
    runner::Runner,
    agent::qlearning::QLearning,
    environment::{Environment, EnvironmentDisplay},
    strategy::decliningrandom::DecliningRandom,
};

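We then ask the Runner to run the agent for x epochs.

Two modes are available:

Runner::run for normal operation
Runner::run_with_display creates a window and displays an image that is updated while the run is in progress

For example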

    let epochs = 400; // used by both the exploration strategy and the runner
    Runner::run(
        QLearning::new(0.1, 0.98, DecliningRandom::new(epochs, 0.01)), //Agent
        TaxiEnvironment::default(), //Environment
        epochs, //epochs
    );

or

    let epochs = 700_000; // used by both the exploration strategy and the runner
    Runner::run_with_display(
        QLearning::new(0.2, 0.99, DecliningRandom::new(epochs, 0.005)), //Agent
        Hike::new(), //Environment
        epochs, //epochs
        10, //frames per second to refresh image
    );
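
This README does not say what the two arguments to DecliningRandom::new mean. One plausible reading is "number of epochs over which randomness declines" and "minimum probability of a random action". The sketch below illustrates that reading only. DecliningEpsilon and its methods are hypothetical names, not part of learnwell.

```rust
/// Hypothetical declining-exploration schedule (illustration, not learnwell code).
pub struct DecliningEpsilon {
    total_epochs: usize,
    min_epsilon: f64,
}

impl DecliningEpsilon {
    pub fn new(total_epochs: usize, min_epsilon: f64) -> Self {
        Self { total_epochs, min_epsilon }
    }

    /// Probability of taking a random action at `epoch`.
    /// Falls linearly from 1.0 at epoch 0 and never drops below `min_epsilon`.
    pub fn epsilon(&self, epoch: usize) -> f64 {
        if self.total_epochs == 0 {
            return self.min_epsilon;
        }
        let progress = (epoch as f64 / self.total_epochs as f64).min(1.0);
        (1.0 - progress).max(self.min_epsilon)
    }

    /// Decide whether to explore, given a uniform sample `roll` in [0, 1).
    pub fn should_explore(&self, epoch: usize, roll: f64) -> bool {
        roll < self.epsilon(epoch)
    }
}
```

With such a schedule the agent acts almost entirely at random in early epochs and almost entirely greedily near the end, while the floor keeps a small amount of exploration alive.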

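We need

Environment - the game or scenario we want to learn
Agent - the thing that interacts with the environment

We implement a few things in order to run

Environment

State struct - this is what we base our actions on
Action (usually an enum) - these are the actions we perform
An environment struct that implements the Environment<S,A> trait, which depends on State and Action. The environment struct should hold the state, because we refer to it later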
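
To make the three pieces concrete, here is a minimal sketch of a state, an action enum, and an environment struct that holds the state. The methods of learnwell's Environment<S,A> trait are not listed in this README, so the SketchEnvironment trait and the Corridor types below are invented purely for illustration and may differ from the real API.

```rust
/// Hypothetical trait; learnwell's real `Environment<S, A>` may look different.
pub trait SketchEnvironment<S, A> {
    fn state(&self) -> &S;
    fn actions(&self) -> Vec<A>;
    /// Apply `action`, update the held state, and return the reward.
    fn step(&mut self, action: &A) -> f64;
    fn is_done(&self) -> bool;
    fn reset(&mut self);
}

#[derive(Hash, Eq, PartialEq, Clone, Debug)]
pub struct CorridorState {
    pub position: u32,
}

#[derive(Hash, Eq, PartialEq, Clone, Debug)]
pub enum CorridorAction {
    Left,
    Right,
}

/// The environment owns the current state, as the text above recommends.
pub struct Corridor {
    state: CorridorState,
    goal: u32,
}

impl Corridor {
    pub fn new(goal: u32) -> Self {
        Self { state: CorridorState { position: 0 }, goal }
    }
}

impl SketchEnvironment<CorridorState, CorridorAction> for Corridor {
    fn state(&self) -> &CorridorState {
        &self.state
    }

    fn actions(&self) -> Vec<CorridorAction> {
        vec![CorridorAction::Left, CorridorAction::Right]
    }

    fn step(&mut self, action: &CorridorAction) -> f64 {
        // Saturating arithmetic keeps the agent inside [0, goal].
        self.state.position = match action {
            CorridorAction::Left => self.state.position.saturating_sub(1),
            CorridorAction::Right => self.state.position.saturating_add(1).min(self.goal),
        };
        // Small penalty per move, reward on reaching the goal.
        if self.is_done() { 1.0 } else { -0.1 }
    }

    fn is_done(&self) -> bool {
        self.state.position == self.goal
    }

    fn reset(&mut self) {
        self.state.position = 0;
    }
}
```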

Agent

An agent algorithm (e.g. QLearning)
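
For readers new to the algorithm, the following is a minimal sketch of the standard tabular Q-learning update. It is not learnwell's QLearning type: TabularQ and its methods are hypothetical, and naming the two constructor arguments "learning rate" and "discount factor" is an assumption about values such as 0.1 and 0.98, which this README does not explain.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical tabular Q-learning agent (illustration, not learnwell code).
pub struct TabularQ<S, A> {
    alpha: f64, // learning rate
    gamma: f64, // discount factor
    table: HashMap<(S, A), f64>,
}

impl<S: Hash + Eq + Clone, A: Hash + Eq + Clone> TabularQ<S, A> {
    pub fn new(alpha: f64, gamma: f64) -> Self {
        Self { alpha, gamma, table: HashMap::new() }
    }

    /// Q-value for a (state, action) pair; unseen pairs default to 0.0.
    pub fn q(&self, state: &S, action: &A) -> f64 {
        self.table
            .get(&(state.clone(), action.clone()))
            .copied()
            .unwrap_or(0.0)
    }

    /// Q(s,a) += alpha * (reward + gamma * max over a' of Q(s',a') - Q(s,a))
    pub fn update(&mut self, state: &S, action: &A, reward: f64, next_state: &S, next_actions: &[A]) {
        let best_next = next_actions
            .iter()
            .map(|a| self.q(next_state, a))
            .fold(f64::NEG_INFINITY, f64::max);
        // With no available actions (terminal state) the future value is 0.
        let best_next = if best_next.is_finite() { best_next } else { 0.0 };
        let old = self.q(state, action);
        let new = old + self.alpha * (reward + self.gamma * best_next - old);
        self.table.insert((state.clone(), action.clone()), new);
    }
}
```

Each call moves the stored value a fraction alpha of the way toward the observed reward plus the discounted best value of the next state.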

Implementation

Note that we derive Hash, Eq, PartialEq, and Clone for both State and Action

State

#[derive(Hash, Eq, PartialEq, Clone)]
pub struct TaxiState {
    taxi: Point,
    dropoff: Point,
    passenger: Point, 
    in_taxi: bool,
}

Action

#[derive(Clone, Hash, PartialEq, Eq)]
pub enum TaxiAction {
    Up,
    Down,
    Left,
    Right,
    Dropoff,
    Pickup,
}
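
A likely reason for these derives is that a Q-table can be stored as a hash map keyed by (state, action) pairs. Hash and Eq let equal keys resolve to the same entry, and Clone lets the table own its keys. The sketch below shows this with simplified stand-in types. Point is not defined in this README, so its fields are assumed, and SketchState, SketchAction, QTable, record, and lookup are all hypothetical names.

```rust
use std::collections::HashMap;

// Assumed shape of `Point`; the README does not show its definition.
#[derive(Hash, Eq, PartialEq, Clone, Copy, Debug)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

#[derive(Hash, Eq, PartialEq, Clone, Debug)]
pub struct SketchState {
    pub taxi: Point,
    pub in_taxi: bool,
}

#[derive(Hash, Eq, PartialEq, Clone, Debug)]
pub enum SketchAction {
    Up,
    Pickup,
}

/// A Q-table keyed by (state, action); this requires Hash + Eq on both types.
pub type QTable = HashMap<(SketchState, SketchAction), f64>;

pub fn record(table: &mut QTable, state: &SketchState, action: &SketchAction, value: f64) {
    // Clone is needed because the table owns its keys while the caller keeps the originals.
    table.insert((state.clone(), action.clone()), value);
}

pub fn lookup(table: &QTable, state: &SketchState, action: &SketchAction) -> Option<f64> {
    table.get(&(state.clone(), action.clone())).copied()
}
```

Two states built separately but with identical fields hit the same entry, which is what allows values learned in one epoch to be reused in the next.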

Environment

pub struct TaxiEnvironment {
    state: TaxiState, //this is the actual state that gets saved in the qtable
    found: usize, //just a helper. there could be a few other items you want to track in the environment
}

Status

Implement Q-learning
Implement deep Q-learning
Move optional functionality behind Cargo features (e.g. display, fxhasher)

Dependencies

~16–28MB
~479K SLoC