【Posted】: 2018-05-18 02:10:07
【Question】:
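To get familiar with reinforcement learning, I am implementing a basic RL algorithm to play the game Flappy Bird. I have everything set up; the only problem I have is implementing the reward function. I want to be able to process the screen and recognize whether a point has been scored or whether the bird has died.
Screen processing is done using mss and opencv, and it returns a stacked numpy array. The reward function then needs to assign a reward to the provided array, but I don't know how to go about it.
This is what a single processed image looks like:
My idea for implementing the reward function is that if the background stops moving, the bird is dead. If the bird is in the gap between two pipes, the agent has scored a point. Any ideas on how to express this in numpy computations?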
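As a minimal sketch of the array layout described above, the following shows how four processed frames could be stacked into one state array. It uses only NumPy; the helper name `stack_frames`, the 80x80 frame size, and the oldest-first ordering are assumptions made for illustration, not details taken from the actual mss/opencv pipeline.

```python
import numpy as np

def stack_frames(frames):
    """Stack four 2-D grayscale frames into shape (1, height, width, 4)."""
    if len(frames) != 4:
        raise ValueError("expected exactly four frames")
    stacked = np.stack(frames, axis=-1)  # (height, width, 4)
    return stacked[np.newaxis, ...]      # (1, height, width, 4)

# Hypothetical 80x80 binary frames (0 = black, 255 = white), oldest first.
frames = [np.zeros((80, 80), dtype=np.uint8) for _ in range(4)]
frames[3][0, :2] = 255  # two white pixels on the top row of the newest frame
state = stack_frames(frames)
```

With this layout, `state[0, :, :, 3]` is the newest frame and `state[0, :, :, 0]` the oldest.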
import numpy as np

def _calculate_reward(self, state):
    """
    Calculate the reward of the state. Flappy is dead when the screen has stopped moving, so when consecutive frames
    are equal. A point is scored when an obstacle is above Flappy, and before it wasn't. An obstacle is above Flappy when
    there are two white pixels in the first 50 pixels on the first row.
    :param state: np.array shape = (1, height, width, 4) -> four consecutive processed frames
    :return reward: int representing the reward if a point is scored or if Flappy has died.
    """
    # Compare frames directly: a summed difference can cancel out to zero for signed or float frames.
    if np.array_equal(state[0,:,:,3], state[0,:,:,2]) and np.array_equal(state[0,:,:,2], state[0,:,:,1]):
        print("flappy is dead")
        return -1000
    # 510 = two white pixels (2 * 255); np.sum accumulates in a wide integer type rather than uint8.
    elif np.sum(state[0,0,:50,3]) == 510 and np.sum(state[0,0,:50,2]) == 510 and np.sum(state[0,0,:50,1]) != 510 and np.sum(state[0,0,:50,0]) != 510:
        print("point!")
        return 1000
    else:
        return 0
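One way the idea above could be expressed with vectorized NumPy operations is sketched below. It assumes binary frames (0 for black, 255 for white) stored oldest first along the last axis, and it counts white pixels instead of comparing a sum to 510, which is equivalent only under that binary assumption. The function name `calculate_reward` and its keyword parameters are illustrative and not part of the original code.

```python
import numpy as np

def calculate_reward(state, white=255, probe_width=50,
                     death_reward=-1000, point_reward=1000):
    """Return a reward for a (1, height, width, 4) stack of frames, oldest first."""
    frames = state[0]  # (height, width, 4)

    # Dead: the three newest frames are pixel-for-pixel identical.
    if (np.array_equal(frames[..., 3], frames[..., 2])
            and np.array_equal(frames[..., 2], frames[..., 1])):
        return death_reward

    # White pixels among the first `probe_width` pixels of the top row, per frame.
    white_counts = np.count_nonzero(frames[0, :probe_width, :] == white, axis=0)
    obstacle_above = white_counts == 2  # boolean array of shape (4,)

    # Point: an obstacle is above Flappy in the two newest frames only.
    if (obstacle_above[3] and obstacle_above[2]
            and not obstacle_above[1] and not obstacle_above[0]):
        return point_reward
    return 0
```

Whether "exactly two white pixels in the top row" reliably marks a pipe depends on how the frames are thresholded, so the probe width and pixel count would likely need tuning against real screenshots.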
【Discussion】:
Tags: python numpy image-processing reinforcement-learning