1. Delayed, sparse reward (feedback); long-term planning

Hierarchical Deep Reinforcement Learning, sub-goals, SAMDP, options, Thompson sampling, Boltzmann exploration, improving exploration
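
Two of the exploration strategies named above can be illustrated in a few lines. The following is only a minimal sketch under simplifying assumptions: a bandit-style setting with a fixed list of action-value estimates for Boltzmann exploration, and Bernoulli rewards with a uniform Beta(1, 1) prior for Thompson sampling. The function names (`boltzmann_probs`, `boltzmann_action`, `thompson_action`) are hypothetical and chosen here for illustration.

```python
import math
import random


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over action-value estimates.

    A higher temperature flattens the distribution (more exploration);
    a lower temperature concentrates it on the best-looking action.
    """
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def thompson_action(successes, failures, rng=random):
    """Thompson sampling for Bernoulli arms (assumed Beta(1, 1) prior).

    Draw one sample from each arm's Beta posterior and pick the arm
    whose sample is largest.
    """
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

In this sketch, lowering `temperature` makes Boltzmann exploration approach greedy selection, while Thompson sampling explores less on its own as the success and failure counts grow and the posteriors narrow.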

 

2. Partial observability, imperfect information

Memory, Nash equilibria, MCTS, self-play, LSTM, active perception, curiosity

 

3. Large state space, large action space

Hardware, distributed training, deeper neural networks
