
laiyijun2023/MazeCodeRepo

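This repository does not declare an open-source license file (LICENSE); before use, check the project description and the upstream dependencies of its code.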
agent.py 1.56 KB
Yijun Lai committed on 2024-05-14 19:40: add all files
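from collections import namedtuple

import numpy as np
import mindspore
from mindspore import Tensor

# Assumption: the original file uses Transition without defining or importing it.
# This namedtuple (and its field names) is a stand-in; if the repository defines
# Transition elsewhere, import it from there instead.
Transition = namedtuple(
    'Transition', ['state', 'action', 'next_state', 'reward', 'is_game_on'])


class Agent:
    def __init__(self, maze, memory_buffer, use_softmax=True):
        """Initialize the agent."""
        self.env = maze
        self.buffer = memory_buffer
        self.num_act = 4  # number of discrete actions
        self.use_softmax = use_softmax
        self.total_reward = 0
        # End the episode once the cumulative reward drops below this bound.
        self.min_reward = -self.env.maze.size
        self.isgameon = True

    def make_a_move(self, net, epsilon, device='cpu'):
        """Take one action and store the resulting transition in the buffer."""
        action = self.select_action(net, epsilon, device)
        current_state = self.env.state()
        next_state, reward, self.isgameon = self.env.state_update(action)
        self.total_reward += reward
        if self.total_reward < self.min_reward:
            self.isgameon = False
        if not self.isgameon:
            self.total_reward = 0  # reset for the next episode
        transition = Transition(current_state, action, next_state, reward, self.isgameon)
        self.buffer.push(transition)

    def select_action(self, net, epsilon, device='cpu'):
        """Select an action by softmax sampling or epsilon-greedy."""
        # Tensor.to(device) is a PyTorch idiom; in MindSpore the target device is
        # configured globally, so `device` is kept only for call compatibility.
        state = Tensor(self.env.state(), mindspore.float32).reshape(1, -1)
        qvalues = net(state).asnumpy().squeeze().astype(np.float64)
        if self.use_softmax:
            # qvalues is a NumPy array here, so the softmax is computed in NumPy,
            # with epsilon acting as the temperature.
            z = qvalues / max(epsilon, 1e-8)  # guard against epsilon == 0
            p = np.exp(z - np.max(z))  # subtract the max for numerical stability
            p /= np.sum(p)
            action = int(np.random.choice(self.num_act, p=p))
        else:
            if np.random.random() < epsilon:
                action = int(np.random.randint(self.num_act))
            else:
                action = int(np.argmax(qvalues, axis=0))
        return action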
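
The two exploration strategies in `select_action` can be tried without MindSpore or a maze environment. The following is a minimal sketch, assuming the Q-values are already available as a plain array; the helper names `boltzmann_probs` and `choose_action` are hypothetical and are not part of this repository.

```python
import numpy as np


def boltzmann_probs(qvalues, temperature):
    """Return softmax(qvalues / temperature), computed in a numerically stable way."""
    q = np.asarray(qvalues, dtype=np.float64).ravel()
    z = q / max(float(temperature), 1e-8)  # guard against a zero temperature
    z -= z.max()                           # shift so the largest exponent is 0
    p = np.exp(z)
    return p / p.sum()


def choose_action(qvalues, epsilon, use_softmax=True, rng=None):
    """Pick an action index by softmax sampling or by epsilon-greedy."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.asarray(qvalues, dtype=np.float64).ravel()
    if use_softmax:
        # epsilon plays the role of the temperature, as in Agent.select_action
        return int(rng.choice(q.size, p=boltzmann_probs(q, epsilon)))
    if rng.random() < epsilon:
        return int(rng.integers(q.size))   # explore: uniform random action
    return int(np.argmax(q))               # exploit: greedy action


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3, 0.2]
    print(boltzmann_probs(q, 1.0))
    print(choose_action(q, epsilon=0.0, use_softmax=False))
```

With softmax sampling, a large `epsilon` flattens the distribution toward uniform exploration, and a small one concentrates it on the highest Q-value, so a single parameter schedule can drive either strategy.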