![Reinforcement Learning](https://wfqqreader-1252317822.image.myqcloud.com/cover/245/34233245/b_34233245.jpg)
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_566.jpg?sign=1738926299-MZphxjAxWipem76t9BNrEuzVJ5l5rkf6-0-47d3499978f9e1c83aa141eebd12d754)
Figure 4-1 Treasure box
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_569.jpg?sign=1738926299-wj0CSZSmQCReq26vWrnZy6hV4aTH7hNv-0-a23cf7141cc174a4f73fbfa66aa10179)
Figure 4-15 Comparison of the optimal value functions computed by the two methods
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P3_572.jpg?sign=1738926299-ISmnePwE9wIBOWQcxOOG14IWPrKqd2ZH-0-64ecafa0e2e973ce9fc2f34278728193)
Figure 5-1 MC method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_575.jpg?sign=1738926299-iuBZVvOpaeAaVUQZChpYVZng8tnt8OT9-0-35a390612a1ddf6a94846dac474aaec3)
Figure 5-2 DP method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_578.jpg?sign=1738926299-LjGUE8fN94Y7RNTmqtP1TJuR6tWxRLP7-0-68272ddb495bc1015ca9ca13b6a7c445)
Figure 5-3 TD method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P4_581.jpg?sign=1738926299-UAMCW4hlc9EmWlDt0VWdPkpcmiNIbNnA-0-92ace76e08b285177ef2d03df58ea422)
Figure 5-6 Maze environment
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_584.jpg?sign=1738926299-E2MKaoC6hi9wTzEINcpjVYq1yErcccBn-0-5767ab536730a7aa1e1f95c5fb30ffac)
Figure 5-7 Optimal policy obtained by the Sarsa method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_587.jpg?sign=1738926299-qE1w8MNm4CbQv7gMuJtNPz4EPFsPd9Lv-0-c296e8d92f9da63ed411c47f9a2c68d4)
Figure 6-12 Windy gridworld
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_591.jpg?sign=1738926299-tmWnHp9u4xFHIW2zRJjWWD9oimtauplJ-0-3466d073891b2cbed0f868224ccb220f)
Figure 6-13 Optimal policy obtained by the backward-view Sarsa(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_595.jpg?sign=1738926299-ClvTKBJ34H9vHVAN3u8UkYB8VgghayXt-0-2c50bde7a3b62c88233e29e332eb7917)
Figure 6-14 Optimal path obtained by the backward-view Sarsa(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_598.jpg?sign=1738926299-Do0AgqZqjfl9uJtbuWXaxFbGY1dy79Ep-0-cae1535a86665a08414cd5a8002fa9e6)
Figure 6-15 Optimal policy obtained by the backward-view Q(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P5_601.jpg?sign=1738926299-F3s3zMAYA9HI0aJGWYJptChZlX8ttK5X-0-921aa0a114e46af348b7ee2cb572acb5)
Figure 6-16 Optimal path obtained by the backward-view Q(λ) method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_604.jpg?sign=1738926299-hHNnnYv49NxvKoajTuxNLbu4evIo5vfh-0-31c5fcddd65c4315d6ddbe27fad4358a)
Figure 7-3 Neural network architecture of DQN
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_607.jpg?sign=1738926299-26LJr17MBuGWyukkLwv80FAAQj7D0Dxm-0-e2e20898d1f167aaf8e922c1dc4aa54f)
Figure 7-7 Driving a car
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P6_610.jpg?sign=1738926299-Lll5Gkc2wEmX30DpoIJWdHoxoqhdJhYn-0-6390260df4087e70b817bddced50e8c5)
Figure 7-10 Flappy Bird
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_613.jpg?sign=1738926299-5aS54hgV8mM3eUncEn4UFjTNXk0ShWDZ-0-f96590e0c0f47948e998c943b2261085)
Figure 7-11 Removing the game background
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_616.jpg?sign=1738926299-wWMj7QORHvmH7JZuYDHZnJYws5TV1iAb-0-33c45600c13ff6c1deb9bf1dbd2bb8df)
Figure 7-13 Grayscale conversion and binarization
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P7_619.jpg?sign=1738926299-rBbuEKsKF5BAZjNkZy0Ck9kYb0ZEnmSW-0-82a80f4ee5da3f8f4a4aaac57f606c7d)
Figure 8-4 ) and
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_622.jpg?sign=1738926299-bFrUdl0OfwfibdhhnvIzopQuIJ6pnuSd-0-1d2506bd1fc009770260cf3154b4b254)
Figure 9-1 Asynchronous method
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_625.jpg?sign=1738926299-JHXtq8R49NvzHEBcOnKqTP35BQ2exCPa-0-eebfd26afe0aedd9f6f26862f2233e27)
Figure 13-12 Schematic of the policy network architecture
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P8_628.jpg?sign=1738926299-tta0gWFd7iYY8eyNGSvPaMR1eNlkGvcm-0-a6ab280b9d63387f9ff15619feeac3a1)
Figure 13-13 Schematic of the value network architecture
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P9_631.jpg?sign=1738926299-cVyn32VmhS2o0lHAupHfCRA1NIj4agHX-0-5a05e329dba45f5fdfaf39cf07e37639)
Figure 13-16 Overall architecture of AlphaGo
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P10_646.jpg?sign=1738926299-9L6wADVKE2grXCZsBYDnFqOt1yGqrE9q-0-8d1dd2fa7953473e7d27ae2cd9c2909c)
Figure 13-17 Online game-playing process
![](https://epubservercos.yuewen.com/C17FFE/18320967008553606/epubprivate/OEBPS/Images/Figure-P10_649.jpg?sign=1738926299-KIsgApURpbTvmVvi3ADcf7RvCH8CuSny-0-35351134136fc6b87e800eccf9addc70)
Figure 13-18 How AlphaGo Zero plays