[2] I. Osband, B. Van Roy, and Z. Wen, “Generalization and exploration via randomized value functions,” in International Conference on Machine Learning, 2016, pp. 2377–2386.
[3] I. Osband, C. Blundell, A. Pritzel, and B. Van Roy, “Deep exploration via bootstrapped DQN,” in Advances in Neural Information Processing Systems 29, 2016, pp. 4026–4034.
[4] K. Ciosek, Q. Vuong, R. Loftin, and K. Hofmann, “Better exploration with optimistic actor critic,” in Advances in Neural Information Processing Systems, 2019, pp. 1785–1796.
[5] C. Bai, L. Wang, L. Han, J. Hao, A. Garg, P. Liu, and Z. Wang, “Principled exploration via optimistic bootstrapping and backward induction,” in International Conference on Machine Learning, 2021.
[6] J. Kirschner and A. Krause, “Information directed sampling and bandits with heteroscedastic noise,” in Conference on Learning Theory, 2018, pp. 358–384.
[7] B. Mavrin, H. Yao, L. Kong, K. Wu, and Y. Yu, “Distributional reinforcement learning for efficient exploration,” in International Conference on Machine Learning, 2019, pp. 4424–4434.
[8] D. Pathak, P. Agrawal, A. A. Efros, and T. Darrell, “Curiosity-driven exploration by self-supervised prediction,” in International Conference on Machine Learning, 2017, pp. 2778–2787.
[9] H. Kim, J. Kim, Y. Jeong, S. Levine, and H. O. Song, “EMI: exploration with mutual information,” in International Conference on Machine Learning, 2019, pp. 3360–3369.
[10] Y. Burda, H. Edwards, A. J. Storkey, and O. Klimov, “Exploration by random network distillation,” in International Conference on Learning Representations, 2019.
[11] R. Y. Tao, V. François-Lavet, and J. Pineau, “Novelty search in representational space for sample efficient exploration,” in Advances in Neural Information Processing Systems, 2020.
[12] Y. Du, L. Han, M. Fang, J. Liu, T. Dai, and D. Tao, “LIIR: learning individual intrinsic reward in multi-agent reinforcement learning,” in Advances in Neural Information Processing Systems, 2019, pp. 4405–4416.
[13] R. Houthooft, X. Chen, Y. Duan, J. Schulman, F. De Turck, and P. Abbeel, “VIME: variational information maximizing exploration,” in Advances in Neural Information Processing Systems, 2016, pp. 1109–1117.
[14] T. Wang, J. Wang, Y. Wu, and C. Zhang, “Influence-based multi-agent exploration,” in International Conference on Learning Representations, 2020.
[15] D. Horgan, J. Quan, D. Budden, G. Barth-Maron, M. Hessel, H. van Hasselt, and D. Silver, “Distributed prioritized experience replay,” in International Conference on Learning Representations, 2018.
[16] S. Kapturowski, G. Ostrovski, J. Quan, R. Munos, and W. Dabney, “Recurrent experience replay in distributed reinforcement learning,” in International Conference on Learning Representations, 2019.
[17] M. Fortunato, M. G. Azar, B. Piot, J. Menick, M. Hessel, I. Osband, A. Graves, V. Mnih, R. Munos, D. Hassabis, O. Pietquin, C. Blundell, and S. Legg, “Noisy networks for exploration,” in International Conference on Learning Representations, 2018.
[18] A. Ecoffet, J. Huizinga, J. Lehman, K. O. Stanley, and J. Clune, “First return, then explore,” Nature, vol. 590, no. 7847, pp. 580–586, 2021.
[19] A. Mahajan, T. Rashid, M. Samvelyan, and S. Whiteson, “MAVEN: multi-agent variational exploration,” in Advances in Neural Information Processing Systems, 2019, pp. 7611–7622.
