https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node26.html

【Average vs. Discounted Reward】

Schwartz [106] examined the problem of adapting Q-learning to an average-reward framework, which produced the R-learning algorithm. Although R-learning appears to exhibit convergence problems on some MDPs, several researchers have found the average-reward criterion closer to the problem they actually want to solve than a discounted criterion, and therefore prefer R-learning to Q-learning [69].
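The R-learning idea can be sketched in a few lines. This is a minimal illustration, not Schwartz's original code. It assumes the commonly cited form of the update: relative action values R(s, a) are learned with no discount factor, rewards are measured against a running estimate rho of the average reward per step, and rho is adjusted only on steps where the chosen action is greedy. The two-state MDP (`STEP`), the helper names `greedy` and `r_learning`, and the step sizes `alpha`, `beta` and `epsilon` are all hypothetical choices made for this example.

```python
import random

# Hypothetical 2-state, 2-action deterministic MDP, for illustration only.
# STEP[s][a] = (reward, next_state); action 0 = "stay", action 1 = "switch".
STEP = [
    [(1.0, 0), (0.0, 1)],  # state 0: staying earns 1, switching earns 0
    [(2.0, 1), (0.0, 0)],  # state 1: staying earns 2, switching earns 0
]


def greedy(q_row):
    """Index of the highest-valued action (ties go to the lowest index)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def r_learning(steps=50_000, alpha=0.1, beta=0.01, epsilon=0.1, seed=0):
    """Sketch of R-learning: relative action values R[s][a] plus a running
    estimate rho of the average reward per step (no discount factor)."""
    rng = random.Random(seed)
    n_states, n_actions = len(STEP), len(STEP[0])
    R = [[0.0] * n_actions for _ in range(n_states)]
    rho = 0.0
    s = 0
    for _ in range(steps):
        # epsilon-greedy behaviour policy
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = greedy(R[s])
        r, s_next = STEP[s][a]
        # Relative-value update: the reward is compared with rho
        # instead of discounting the value of the next state.
        R[s][a] += alpha * (r - rho + max(R[s_next]) - R[s][a])
        # Adjust rho only when a is greedy (tested after the R update in this
        # sketch), so rho tracks the greedy policy, not the exploratory one.
        if R[s][a] == max(R[s]):
            rho += beta * (r - rho + max(R[s_next]) - max(R[s]))
        s = s_next
    return R, rho


R, rho = r_learning()
policy = [greedy(row) for row in R]
print(policy, round(rho, 2))
```

On this toy MDP the learned greedy policy should switch out of state 0 and then stay in state 1, with rho settling near 2, the best achievable reward per step. Using a smaller step size for rho than for R is a common choice that lets the average-reward estimate move more slowly than the relative values; the text above does not prescribe it.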
