To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning

https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

【平均-打折奖励】

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework. Although his R-learning algorithm seems to exhibit convergence problems for some MDPs, several researchers have found the average-reward criterion closer to the true problem they wish to solve than a discounted criterion and therefore prefer R-learning to Q-learning [69].

2021-06-18
2021-08-18
2021-12-17
2021-11-24
2022-02-22
2021-08-27
2021-08-21
2022-12-23