11 unstable releases (5 breaking)

0.6.0	Jun 25, 2024
0.5.1	Jan 23, 2024
0.5.0	Nov 30, 2023
0.4.0	Jan 15, 2023
0.1.2	Mar 9, 2017

#169 in Algorithms

52 downloads per month

MPL-2.0 license

42KB
720 lines

Rurel

Rurel is a flexible, reusable reinforcement learning (Q learning) implementation in Rust.

Release documentation

In Cargo.toml:

rurel = "0.6.0"

An example is included. It teaches an agent on a 21x21 grid how to reach 10,10 using four actions (go left, go up, go right, go down):

cargo run --example eucdist

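Getting started

There are two main traits you need to implement: rurel::mdp::State and rurel::mdp::Agent.

A State is an object that defines a Vec of the actions that can be taken from that state, and that has a certain reward. A State needs to define the corresponding action type A.

An Agent is an object that has a current state and, given an action, can take that action and evaluate the next state.

Example

Let's implement the example that runs with cargo run --example eucdist. We want to train an agent that learns how to reach 10,10 on a 21x21 grid.

First, let's define a State, which should represent a position on the 21x21 grid, and the corresponding actions, which are up, down, left, or right.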

use rurel::mdp::State;

#[derive(PartialEq, Eq, Hash, Clone)]
struct MyState { x: i32, y: i32 }
#[derive(PartialEq, Eq, Hash, Clone)]
struct MyAction { dx: i32, dy: i32 }

impl State for MyState {
	type A = MyAction;
	fn reward(&self) -> f64 {
		// Negative Euclidean distance
		-((((10 - self.x).pow(2) + (10 - self.y).pow(2)) as f64).sqrt())
	}
	fn actions(&self) -> Vec<MyAction> {
		vec![MyAction { dx: 0, dy: -1 },	// up
			 MyAction { dx: 0, dy: 1 },	// down
			 MyAction { dx: -1, dy: 0 },	// left
			 MyAction { dx: 1, dy: 0 },	// right
		]
	}
}
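To get a feel for the reward signal, the following standalone sketch evaluates the same negative Euclidean distance at a few positions. The free function reward_at is a hypothetical helper introduced here for illustration only; it is not part of rurel.

```rust
/// Negative Euclidean distance from (x, y) to the goal at (10, 10).
/// Hypothetical helper mirroring `MyState::reward`; not part of rurel.
fn reward_at(x: i32, y: i32) -> f64 {
    -((((10 - x).pow(2) + (10 - y).pow(2)) as f64).sqrt())
}
```

The reward is 0 at the goal, -1 one step away from it, and -5 at (7, 6), so it increases monotonically as the agent gets closer to 10,10.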

Then define the agent:

use rurel::mdp::Agent;

struct MyAgent { state: MyState }
impl Agent<MyState> for MyAgent {
	fn current_state(&self) -> &MyState {
		&self.state
	}
	fn take_action(&mut self, action: &MyAction) {
		// Apply the move, wrapping around the edges of the 21x21 grid.
		let &MyAction { dx, dy } = action;
		self.state = MyState {
			x: (((self.state.x + dx) % 21) + 21) % 21, // (x+dx) mod 21
			y: (((self.state.y + dy) % 21) + 21) % 21, // (y+dy) mod 21
		};
	}
}
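The double-modulo expression in take_action is needed because Rust's % operator keeps the sign of the left operand, so -1 % 21 evaluates to -1 rather than 20. As a sketch, the same wrap-around can be written with the standard i32::rem_euclid method. The helper name wrap and the constant GRID are illustrative assumptions, not part of rurel.

```rust
/// Side length of the grid (hypothetical constant for this sketch).
const GRID: i32 = 21;

/// Moves a coordinate by `delta` and wraps the result into 0..GRID,
/// so stepping left from column 0 lands on column 20.
fn wrap(coord: i32, delta: i32) -> i32 {
    (coord + delta).rem_euclid(GRID)
}
```

For every coordinate in 0..21 and every delta of -1, 0, or 1, wrap gives the same result as the (((v + d) % 21) + 21) % 21 form used in the example.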

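That's all. Now create a trainer and train the agent with Q learning, using a learning rate of 0.2, a discount factor of 0.01, and an initial Q value of 2.0. We let the trainer run for 100000 iterations, randomly exploring new states.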

use rurel::AgentTrainer;
use rurel::strategy::learn::QLearning;
use rurel::strategy::explore::RandomExploration;
use rurel::strategy::terminate::FixedIterations;

let mut trainer = AgentTrainer::new();
let mut agent = MyAgent { state: MyState { x: 0, y: 0 }};
trainer.train(&mut agent,
              &QLearning::new(0.2, 0.01, 2.),
              &mut FixedIterations::new(100000),
              &RandomExploration::new());
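For orientation, the textbook tabular Q-learning update that these three parameters usually refer to can be sketched as follows: Q(s, a) is moved toward reward + gamma * max Q(s', a') by a fraction alpha. It is an assumption that rurel's QLearning strategy follows this form; the function q_update is a hypothetical name used for illustration only.

```rust
/// One textbook Q-learning step (a sketch, not rurel's actual code).
/// `old_q`: current estimate Q(s, a)
/// `reward`: reward observed after taking the action
/// `max_next_q`: best estimate among the actions of the next state
/// `alpha`: learning rate, `gamma`: discount factor
fn q_update(old_q: f64, reward: f64, max_next_q: f64, alpha: f64, gamma: f64) -> f64 {
    old_q + alpha * (reward + gamma * max_next_q - old_q)
}
```

Under this rule, with alpha = 0.2, gamma = 0.01, and every estimate still at the initial 2.0, a step that yields a reward of -1.0 moves the estimate from 2.0 to about 1.404. A discount factor as small as 0.01 means the update is dominated by the immediate reward.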

After this, you can query the learned value (Q) of a given action in a given state with:

trainer.expected_value(&state, &action) // : Option<f64>
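Because the lookup returns an Option<f64>, a caller has to decide what to do with state-action pairs that were never visited. One possible way to pick the greedy action is sketched below. The closure lookup stands in for a call such as trainer.expected_value(&state, &action), and the helper best_action is hypothetical, not part of rurel.

```rust
/// Returns the action with the highest known value, skipping actions
/// whose value is unknown (`None`). Returns `None` if no action has
/// a known value. Hypothetical helper for illustration.
fn best_action<A: Clone>(actions: &[A], lookup: impl Fn(&A) -> Option<f64>) -> Option<A> {
    actions
        .iter()
        // Keep only actions with a learned value, paired with that value.
        .filter_map(|a| lookup(a).map(|q| (a, q)))
        // f64 is not Ord, so compare with total_cmp.
        .max_by(|(_, q1), (_, q2)| q1.total_cmp(q2))
        .map(|(a, _)| a.clone())
}
```

Skipping unknown values is only one policy; treating them as the initial Q value would be another reasonable choice.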

Development

Run cargo fmt --all to format the code.
Run cargo clippy --all-targets --all-features -- -Dwarnings to lint the code.
Run cargo test to test the code.

Dependencies

~0.2–1MB
~20K SLoC