cs294-RL introduction

Types of reinforcement learning

model-based RL

Value functions
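
As a minimal sketch of what a value function computes, the snippet below runs iterative policy evaluation on a tiny deterministic chain. The chain, its state names, the discount of 0.9 and the helper `evaluate_policy` are all assumptions made for illustration; none of them come from the lecture.

```python
def evaluate_policy(transitions, gamma=0.9, sweeps=200):
    """Iterative policy evaluation for a fixed policy.

    transitions maps state -> (next_state, reward) under that policy;
    a next_state of None marks the end of the episode.
    """
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        for state, (next_state, reward) in transitions.items():
            bootstrap = 0.0 if next_state is None else values[next_state]
            # Bellman backup for a deterministic step: V(s) = r + gamma * V(s')
            values[state] = reward + gamma * bootstrap
    return values


# Hypothetical chain: s0 -> s1 -> s2 -> end, with reward 1 only on the last step.
chain = {"s0": ("s1", 0.0), "s1": ("s2", 0.0), "s2": (None, 1.0)}
values = evaluate_policy(chain)
```

Each state's value is the discounted reward that follows it, so states further from the reward get smaller values.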

policy gradient
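
A rough sketch of the policy gradient idea, assuming a softmax policy on a two-armed bandit with fixed rewards. The bandit setup, the step size and the function names are hypothetical and chosen only to keep the example small. The update is the basic REINFORCE rule: move the parameters along `reward * grad log pi(action)`.

```python
import math
import random


def softmax(prefs):
    top = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, steps=2000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)  # policy parameters theta
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = arm_rewards[action]
        for k in range(len(prefs)):
            # d/d theta_k of log pi(action) for a softmax policy
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += step_size * reward * grad_log_pi
    return softmax(prefs)


# Hypothetical bandit: arm 0 always pays 0, arm 1 always pays 1.
final_probs = reinforce_bandit([0.0, 1.0])
```

No value function is learned here; the policy is adjusted directly from sampled rewards.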

actor-critic: value function plus policy gradients
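
One way to picture "value function plus policy gradients" is the sketch below, which assumes a two-armed bandit with fixed rewards. The actor is a softmax policy and the critic is a single scalar estimate of expected reward. The setup, step sizes and names are illustrative assumptions, not the algorithm as presented in the course.

```python
import math
import random


def softmax(prefs):
    top = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_bandit(arm_rewards, steps=2000, actor_step=0.1,
                        critic_step=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)  # actor: softmax policy parameters
    value = 0.0                       # critic: estimate of expected reward
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = arm_rewards[action]
        advantage = reward - value    # how much better than the critic expected
        value += critic_step * advantage
        for k in range(len(prefs)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            # Policy gradient step, weighted by the critic's advantage estimate
            prefs[k] += actor_step * advantage * grad_log_pi
    return softmax(prefs), value


# Hypothetical bandit: arm 0 always pays 0, arm 1 always pays 1.
policy, value = actor_critic_bandit([0.0, 1.0])
```

The critic's estimate replaces the raw reward as the weight on the gradient, which is the sense in which the two ideas are combined.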

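Why are there so many RL algorithms?

Different trade-offs: sample efficiency, stability
Different assumptions: stochastic or deterministic, continuous or discrete, episodic or infinite horizon
Different difficulties: whether the policy or the model is easier to represent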

Sample efficiency: on-policy or off-policy
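
The on-policy versus off-policy distinction can be illustrated with two textbook bootstrapping targets. SARSA and Q-learning are used here as standard examples and are not named in these notes; the tiny Q-table and the function names are hypothetical.

```python
def sarsa_target(q, reward, next_state, next_action, gamma=0.9):
    # On-policy: bootstrap from the action the current policy actually took.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma=0.9):
    # Off-policy: bootstrap from the greedy action, whatever was executed.
    return reward + gamma * max(q[next_state].values())


# Hypothetical Q-table with a single successor state.
q = {"s1": {"left": 1.0, "right": 2.0}}
on_policy = sarsa_target(q, reward=0.0, next_state="s1", next_action="left")
off_policy = q_learning_target(q, reward=0.0, next_state="s1")
```

Because the off-policy target does not depend on which action was actually taken next, transitions collected by an older policy can be reused, which is why off-policy methods tend to need fewer new samples.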

Sample efficiency comparison of algorithms

Specific algorithms
