Reinforcement learning resources curated
A curated list of resources dedicated to reinforcement learning.
We are looking for more contributors and maintainers!
Please feel free to open pull requests.

Foundational Papers

- Marvin Minsky, Steps toward Artificial Intelligence, Proceedings of the IRE, 1961. [DOI] [Paper] (discusses issues in RL such as the "credit assignment problem")
- Ian H. Witten, An Adaptive Optimal Controller for Discrete-Time Markov Environments, Information and Control, 1977. [DOI] [Paper] (earliest publication on the temporal-difference (TD) learning rule)

Methods

- Dynamic Programming (DP):
  - Christopher J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, 1989. [Thesis]
- Monte Carlo:
  - Andrew Barto, Michael Duff, Monte Carlo Inversion and Reinforcement Learning, NIPS, 1994. [Paper]
  - Satinder P. Singh, Richard S. Sutton, Reinforcement Learning with Replacing Eligibility Traces, Machine Learning, 1996. [Paper]
- Temporal-Difference:
  - Richard S. Sutton, Learning to Predict by the Methods of Temporal Differences, Machine Learning 3: 9-44, 1988. [Paper]
- Q-Learning (off-policy TD algorithm):
  - Chris Watkins, Learning from Delayed Rewards, Cambridge, 1989. [Thesis]
- Sarsa (on-policy TD algorithm):
  - G.A. Rummery, M. Niranjan, On-line Q-learning Using Connectionist Systems, Technical Report, Cambridge Univ., 1994. [Report]
  - Richard S. Sutton, Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, NIPS, 1996. [Paper]
- R-Learning (learning of relative values):
  - Andrew Schwartz, A Reinforcement Learning Method for Maximizing Undiscounted Rewards, ICML, 1993. [Paper-Google Scholar]
- Function Approximation methods (Least-Squares Temporal Difference, Least-Squares Policy Iteration):
  - Steven J. Bradtke, Andrew G. Barto, Linear Least-Squares Algorithms for Temporal Difference Learning, Machine Learning, 1996. [Paper]
  - Michail G. Lagoudakis, Ronald Parr, Model-Free Least Squares Policy Iteration, NIPS, 2001. [Paper] [Code]
- Policy Search / Policy Gradient:
  - Richard Sutton, David McAllester, Satinder Singh, Yishay Mansour, Policy Gradient Methods for Reinforcement Learning with Function Approximation, NIPS, 1999. [Paper]
  - Jan Peters, Sethu Vijayakumar, Stefan Schaal, Natural Actor-Critic, ECML, 2005. [Paper]
  - Jens Kober, Jan Peters, Policy Search for Motor Primitives in Robotics, NIPS, 2009. [Paper]
  - Jan Peters, Katharina Mulling, Yasemin Altun, Relative Entropy Policy Search, AAAI, 2010. [Paper]
  - Freek Stulp, Olivier Sigaud, Path Integral Policy Improvement with Covariance Matrix Adaptation, ICML, 2012. [Paper]
  - Nate Kohl, Peter Stone, Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, ICRA, 2004. [Paper]
  - Marc Deisenroth, Carl Rasmussen, PILCO: A Model-Based and Data-Efficient Approach to Policy Search, ICML, 2011. [Paper]
  - Scott Kuindersma, Roderic Grupen, Andrew Barto, Learning Dynamic Arm Motions for Postural Recovery, Humanoids, 2011. [Paper]
  - Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades, Jean-Baptiste Mouret, Black-Box Data-efficient Policy Search for Robotics, IROS, 2017. [Paper]
- Hierarchical RL:
  - Richard Sutton, Doina Precup, Satinder Singh, Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, Artificial Intelligence, 1999. [Paper]
  - George Konidaris, Andrew Barto, Building Portable Options: Skill Transfer in Reinforcement Learning, IJCAI, 2007. [Paper]
- Deep Learning + Reinforcement Learning (a sample of recent works on DL+RL):
  - V. Mnih, et al., Human-level Control through Deep Reinforcement Learning, Nature, 2015. [Paper]
  - Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, Xiaoshi Wang, Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, NIPS, 2014. [Paper]
  - Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel, End-to-End Training of Deep Visuomotor Policies, arXiv, 16 Oct 2015. [arXiv]
  - Tom Schaul, John Quan, Ioannis Antonoglou, David Silver, Prioritized Experience Replay, arXiv, 18 Nov 2015. [arXiv]
  - Hado van Hasselt, Arthur Guez, David Silver, Deep Reinforcement Learning with Double Q-Learning, arXiv, 22 Sep 2015. [arXiv]
  - Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, arXiv, 4 Feb 2016. [arXiv]

Traditional Games

- Backgammon - Gerald Tesauro, "TD-Gammon" game play using TD(λ) (ACM 1995) [Paper]
- Chess - Jonathan Baxter, Andrew Tridgell and Lex Weaver, "KnightCap" program using TD(λ) (1999) [arXiv]
- Chess - Matthew Lai, Giraffe: Using deep reinforcement learning to play chess (2015) [arXiv]

Computer Games

- Atari 2600 Games - Volodymyr Mnih, Koray Kavukcuoglu, David Silver et al., Human-level Control through Deep Reinforcement Learning (Nature 2015) [DOI] [Paper] [Code] [Video]
- Flappy Bird - Sarvagya Vaish, Flappy Bird Reinforcement Learning [Video]
- Mario - Kenneth O. Stanley and Risto Miikkulainen, MarI/O - learning to play Mario with evolutionary reinforcement learning using artificial neural networks (Evolutionary Computation 2002) [Paper] [Video]
- StarCraft II - Oriol Vinyals, Igor Babuschkin, Wojciech M. Czarnecki et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning (Nature 2019) [DOI] [Paper] [Video]

## Tutorials / Websites

- Mance Harmon and Stephanie Harmon, Reinforcement Learning: A Tutorial
- C. Igel, M.A. Riedmiller, et al., Reinforcement Learning in a Nutshell, ESANN, 2007. [Paper]
- UNSW - Reinforcement Learning
  - Introduction
  - TD-Learning
  - Q-Learning and SARSA
  - Applet for "Cat and Mouse" Game
- ROS Reinforcement Learning Tutorial
- POMDP for Dummies
- Scholarpedia articles on:
  - Reinforcement Learning
  - Temporal Difference Learning
- Repository with useful MATLAB Software, presentations, and demo videos
- Bibliography on Reinforcement Learning
- UC Berkeley - CS 294: Deep Reinforcement Learning, Fall 2015 (John Schulman, Pieter Abbeel) [Class Website]
- Blog posts on Reinforcement Learning, Parts 1-4 by Travis DeWolf
- The Arcade Learning Environment - Atari 2600 games environment for developing AI agents
- Deep Reinforcement Learning: Pong from Pixels by Andrej Karpathy
- Demystifying Deep Reinforcement Learning
- Let’s make a DQN
- Simple Reinforcement Learning with Tensorflow, Parts 0-8 by Arthur Juliani
- Practical_RL - GitHub-based course in reinforcement learning in the wild (lectures, coding labs, projects)
- RLenv.directory: Explore and find new reinforcement learning environments.
- Katja Hofmann's talk at NeurIPS '19 - RL: Past, Present and Future Perspectives