Bridging the Gap Between AI Planning and Reinforcement Learning (PRL)

While the AI Planning and Reinforcement Learning communities focus on similar sequential decision-making problems, they remain somewhat unaware of each other's specific problems, techniques, methodologies, and evaluation.

This workshop aims to encourage discussion and collaboration between researchers in the fields of AI planning and reinforcement learning. We aim to bridge the gap between the two communities, facilitate the discussion of differences and similarities in existing techniques, and encourage collaboration across the fields. We solicit interest from AI researchers who work at the intersection of planning and reinforcement learning, in particular those who focus on intelligent decision making. As such, the joint workshop program is an excellent opportunity to gather a large and diverse group of interested researchers.

Proceedings are now available here.


The workshop will take place online on October 22nd and 23rd. To participate in the workshop, you need to register for ICAPS. Registration is free, and once registered, you will get a password to the Gather platform and a direct link to the virtual conference. Please remember to select the PRL workshop when registering.

Additional discussion will take place in a dedicated Slack workspace. Please join the workspace in advance and join the dedicated channels for relevant papers.

Invited Speakers

Playlist of all the invited talks.

  • Will Dabney, DeepMind (recorded)

    Title: Advances in Distributional Reinforcement Learning And Connections With Planning

    Distributional Reinforcement Learning has seen continued research progress in recent years, largely focused on either theoretical analysis or developing new, empirically validated, methods for approximating the distribution of returns. In this talk, we will discuss these recent advances, attempt to understand the reasons for their empirical success, and finally investigate connections with planning where distributional reinforcement learning may suggest promising future work.

    Will Dabney is a research scientist at DeepMind. His research focuses on reinforcement learning, with collaborations into other areas of machine learning and neuroscience. Recent work has been focused on distributional reinforcement learning and representation learning, but core problems such as exploration and temporal abstraction continue to beckon.

  • Alan Fern, Oregon State University (recorded)

    Title: Deep Flat MDPs for Offline Model-Based Reinforcement Learning

    While there has been a growing interest in model-based RL, it is rare to see optimal, or near-optimal, planning incorporated into actual implementations. This is especially true in the offline RL setting, where a model must be estimated from a static set of experience data. In this talk, we will describe our recent efforts to push the integration of deep representation learning with near-optimal planners for offline RL. In particular, we will introduce the Deep Averagers with Costs MDP (DAC-MDP) as a principled way to leverage optimal planners for flat/tabular MDP representations derived from continuous deep representations. Using a planner based on a GPU-implementation of value iteration, we demonstrate scalability to complex image-based environments such as Atari with relatively simple representations derived from offline model-free learners. We also illustrate potential use-cases of our planning-based approach for zero-shot adaptation to changes in the environment and optimization objectives. We will end with a discussion of future directions suggested by the work for further integration of symbolic planners and RL.

    Alan Fern is Professor of Computer Science and Associate Head of Research for the School of EECS at Oregon State University. He received his Ph.D. (2004) and M.S. (2000) in computer engineering from Purdue University, and his B.S. (1997) in electrical engineering from the University of Maine. He is an associate editor of the Machine Learning Journal, the Journal of Artificial Intelligence Research, and serves on the executive council of the International Conference on Automated Planning and Scheduling. His research interests span a range of topics in artificial intelligence, including machine learning and automated planning/control, with a particular interest in the intersection of those areas.

  • Michael Littman, Brown University (recorded)

    Title: Logical Planning in Murky Perceptual Domains: From Soup to Nots

    I’ll talk about the challenge of using planning successfully in real-world perceptual domains and what I think are the most promising ideas for learning plannable representations.

    Michael Littman studies machine learning and decision making under uncertainty. He has earned multiple awards for teaching and his research has been recognized with three best-paper awards and two influential paper awards for his work on reinforcement learning, probabilistic planning, and automated crossword-puzzle solving. Littman has served on the editorial boards for the Journal of Machine Learning Research and the Journal of Artificial Intelligence Research. He was general chair of the International Conference on Machine Learning 2013 and program chair of the Association for the Advancement of Artificial Intelligence (AAAI) Conference 2013. He is co-director of Brown’s Humanity Centered Robotics Initiative and a Fellow of the Association for the Advancement of Artificial Intelligence and the Association for Computing Machinery.

  • Julian Schrittwieser, DeepMind (recorded)

    Title: MuZero – Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

    Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. The MuZero algorithm, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games – the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled – our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.

    Julian Schrittwieser is a senior researcher and tech lead at DeepMind, where he has been working on Reinforcement Learning for the last six years. Previously, he was an engineer at Google and studied at the technical university of Vienna. His past work includes AlphaGo, AlphaGo Zero, AlphaZero and most recently MuZero. He is excited about the application of RL and other machine learning techniques to solve real world problems. In his spare time, he likes to run, learn languages and travel; he sometimes blogs at

  • Peter Stone, The University of Texas at Austin (recorded)

    Title: Task-Motion Navigation Planning with Learning for Adaptable Mobile Service Robots

    Task-motion planning (TMP) addresses the problem of efficiently generating executable and low-cost task plans in a discrete space such that the (initially unknown) action costs are determined by motion plans in a corresponding continuous space. A task-motion plan for a mobile service robot that behaves in a highly dynamic domain can be sensitive to domain uncertainty and changes, leading to suboptimal behaviors or execution failures. This talk examines the ways in which machine learning can be integrated into a TMP system to increase generality and robustness, with particular focus on navigation planning problems.

    Dr. Peter Stone is the David Bruton, Jr. Centennial Professor and Associate Chair of Computer Science, as well as Chair of the Robotics Consortium, at the University of Texas at Austin. In 2013 he was awarded the University of Texas System Regents’ Outstanding Teaching Award and in 2014 he was inducted into the UT Austin Academy of Distinguished Teachers, earning him the title of University Distinguished Teaching Professor. Professor Stone’s research interests in Artificial Intelligence include machine learning (especially reinforcement learning), multiagent systems, and robotics. Professor Stone received his Ph.D in Computer Science in 1998 from Carnegie Mellon University. From 1999 to 2002 he was a Senior Technical Staff Member in the Artificial Intelligence Principles Research Department at AT&T Labs – Research. He is an Alfred P. Sloan Research Fellow, Guggenheim Fellow, AAAI Fellow, IEEE Fellow, AAAS Fellow, Fulbright Scholar, and 2004 ONR Young Investigator. In 2007 he received the prestigious IJCAI Computers and Thought Award, given biannually to the top AI researcher under the age of 35, and in 2016 he was awarded the ACM/SIGAI Autonomous Agents Research Award. Professor Stone co-founded Cogitai, Inc., a startup company focused on continual learning, in 2015, and currently serves as Executive Director of Sony AI America.


October 22nd

18:00 | 5:00 | 20:00 | 14:00 | 11:00 | Invited Talk: Julian Schrittwieser
18:40 | 5:40 | 20:40 | 14:40 | 11:40 | #21 Frederik Drachmann, Andrea Dittadi and Thomas Bolander, Planning from Pixels in Atari with Learned Symbolic Representations
18:50 | 5:50 | 20:50 | 14:50 | 11:50 | #18 Brendan Juba, Hai Le and Roni Stern, Safe Learning of Lifted Action Models
19:15 | 6:15 | 21:15 | 15:15 | 12:15 | Invited Talk: Alan Fern
19:55 | 6:55 | 21:55 | 15:55 | 12:55 | Discussion Session (Chair: Alan Fern): Learning Symbolic Models for Planning
21:30 | 8:30 | 23:30 | 17:30 | 14:30 | Invited Talk: Peter Stone
22:10 | 9:10 | 0:10 | 18:10 | 15:10 | #23 Maximilian Fickert, Tianyi Gu, Leonhard Staut, Sai Lekyang, Wheeler Ruml, Joerg Hoffmann and Marek Petrik, Real-time Planning as Data-driven Decision-making
22:20 | 9:20 | 0:20 | 18:20 | 15:20 | #12 Thomas Moerland, Anna Deichler, Simone Baldi, Joost Broekens and Catholijn Jonker, Think Neither Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning
22:30 | 9:30 | 0:30 | 18:30 | 15:30 | #8 Tomas Brazdil, Krishnendu Chatterjee, Petr Novotný and Jiří Vahala, Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes (Extended Abstract)
22:40 | 9:40 | 0:40 | 18:40 | 15:40 | Discussion Session (Chairs: Alan Fern and Hector Palacios): Interplay between RL and Search

October 23rd

18:00 | 5:00 | 20:00 | 14:00 | 11:00 | Invited Talk: Michael Littman
18:40 | 5:40 | 20:40 | 14:40 | 11:40 | #17 Sankalp Garg, Aniket Bajpai and Mausam Mausam, Symbolic Network: Generalized Neural Policies for Relational MDPs
18:50 | 5:50 | 20:50 | 14:50 | 11:50 | #5 Or Rivlin, Tamir Hazan and Erez Karpas, Generalized Planning With Deep Reinforcement Learning
19:00 | 6:00 | 21:00 | 15:00 | 12:00 | #1 Tom Silver and Rohan Chitnis, PDDLGym: Gym Environments from PDDL Problems
19:10 | 6:10 | 21:10 | 15:10 | 12:10 | Discussion Session (Chairs: Vicenç Gómez and Scott Sanner): Challenge Problems for RL and Planning
19:45 | 6:45 | 21:45 | 15:45 | 12:45 | Invited Talk: Will Dabney
20:25 | 7:25 | 22:25 | 16:25 | 13:25 | #13 David Speck, André Biedenkapp, Frank Hutter, Robert Mattmüller and Marius Lindauer, Learning Heuristic Selection with Dynamic Algorithm Configuration
20:35 | 7:35 | 22:35 | 16:35 | 13:35 | #9 Kevin Osanlou, Jeremy Frank, J. Benton, Andrei Bursuc, Christophe Guettier, Eric Jacopin and Tristan Cazenave, Time-based Dynamic Controllability of Disjunctive Temporal Networks with Uncertainty: A Tree Search Approach with Graph Neural Network Guidance
20:45 | 7:45 | 22:45 | 16:45 | 13:45 | #10 Andrea Micheli and Alessandro Valentini, Synthesis of Search Heuristics for Temporal Planning via Reinforcement Learning
20:55 | 7:55 | 22:55 | 16:55 | 13:55 | Discussion Session (Chairs: Hector Palacios and Michael Katz): Learning Planning Heuristics

Accepted papers

[EasyChair id] Paper title (authors). EasyChair id is used for identifying papers in the poster session

[1] PDDLGym: Gym Environments from PDDL Problems (Tom Silver and Rohan Chitnis) (pdf) (poster)
[3] Model-free Automated Planning Using Neural Networks (Michaela Urbanovská, Jan Bím, Leah Chrestien, Antonín Komenda and Tomáš Pevný) (pdf) (poster)
[5] Generalized Planning With Deep Reinforcement Learning (Or Rivlin, Tamir Hazan and Erez Karpas) (pdf) (poster)
[8] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes (Extended Abstract) (Tomas Brazdil, Krishnendu Chatterjee, Petr Novotný and Jiří Vahala) (pdf) (poster)
[9] Time-based Dynamic Controllability of Disjunctive Temporal Networks with Uncertainty: A Tree Search Approach with Graph Neural Network Guidance (Kevin Osanlou, Jeremy Frank, J. Benton, Andrei Bursuc, Christophe Guettier, Eric Jacopin and Tristan Cazenave) (pdf) (poster)
[10] Synthesis of Search Heuristics for Temporal Planning via Reinforcement Learning (Andrea Micheli and Alessandro Valentini) (pdf) (poster)
[11] A Framework for Reinforcement Learning and Planning: Extended Abstract (Thomas Moerland, Joost Broekens and Catholijn Jonker) (pdf) (poster)
[12] Think Neither Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning (Thomas Moerland, Anna Deichler, Simone Baldi, Joost Broekens and Catholijn Jonker) (pdf) (poster)
[13] Learning Heuristic Selection with Dynamic Algorithm Configuration (David Speck, André Biedenkapp, Frank Hutter, Robert Mattmüller and Marius Lindauer) (pdf) (poster)
[14] Knowing When To Look Back: Bidirectional Rollouts in Dyna-style Planning (Yat Long Lo, Jia Pan and Albert Y.S. Lam) (pdf) (poster)
[15] PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning (Guillaume Matheron, Olivier Sigaud and Nicolas Perrin) (pdf) (poster)
[16] Hierarchical Reinforcement Learning in StarCraft II with Human Expertise in Subgoals Selection (Xinyi Xu, Tiancheng Huang, Pengfei Wei, Akshay Narayan and Tze-Yun Leong) (pdf) (poster)
[17] Symbolic Network: Generalized Neural Policies for Relational MDPs (Sankalp Garg, Aniket Bajpai and Mausam Mausam) (pdf)
[18] Safe Learning of Lifted Action Models (Brendan Juba, Hai Le and Roni Stern) (pdf) (poster)
[19] Reinforcement Learning for Planning Heuristics (Patrick Ferber, Malte Helmert and Joerg Hoffmann) (pdf) (poster)
[20] Bridging the gap between Markowitz planning and deep reinforcement learning (Eric Benhamou, David Saltiel, Sandrine Ungari and Abhishek Mukhopadhyay) (pdf) (poster)
[21] Planning from Pixels in Atari with Learned Symbolic Representations (Frederik Drachmann, Andrea Dittadi and Thomas Bolander) (pdf)
[22] Offline Learning for Planning: A Summary (Giorgio Angelotti, Nicolas Drougard and Caroline Ponzoni Carvalho Chanel) (pdf) (poster)
[23] Real-time Planning as Data-driven Decision-making (Maximilian Fickert, Tianyi Gu, Leonhard Staut, Sai Lekyang, Wheeler Ruml, Joerg Hoffmann and Marek Petrik) (pdf) (poster)


The workshop solicits work at the intersection of the fields of reinforcement learning and planning. We also solicit work solely in one area that can influence advances in the other so long as the connections are clearly articulated in the submission.

Submissions are invited on topics including, but not limited to:

  • Reinforcement learning (model-based, Bayesian, deep, etc.)
  • Model representation and learning for planning
  • Planning using approximated/uncertain (learned) models
  • Monte Carlo planning
  • Learning search heuristics for planner guidance
  • Theoretical aspects of planning and reinforcement learning
  • Reinforcement Learning and planning competition(s)
  • Multi-agent planning and learning
  • Applications of both reinforcement learning and planning

Important Dates

  • Submission deadline: August 3, 2020 (UTC-12 timezone)
  • Notification date: August 26, 2020
  • Camera-ready deadline: September 25, 2020
  • Workshop date: October 22 & 23, 2020

Submission Instructions

We solicit workshop paper submissions relevant to the above call of the following types:

  • Long papers — up to 8 pages + unlimited references / appendices
  • Short papers — up to 4 pages + unlimited references / appendices
  • Extended abstracts — up to 2 pages + unlimited references / appendices

Please format submissions in AAAI style (see instructions in the Author Kit at AAAI), and keep them to at most 9 pages including references. Authors who are considering submitting papers that were rejected from other conferences should do their utmost to address the comments given by the reviewers. Please do not submit papers that are already accepted for the main ICAPS conference to the workshop.

Some accepted long papers will be selected for contributed talks. All accepted long and short papers and extended abstracts will be given a slot in the poster presentation session. Extended abstracts are intended as brief summaries of already published papers, preliminary work, position papers or challenges that might help bridge the gap.

As the main purpose of this workshop is to solicit discussion, the authors are invited to use the appendix of their submissions for that purpose.

Paper submissions should be made through EasyChair.


Please send your inquiries by email to the organizers.