For over three decades, planning has been a key area of research in artificial intelligence. Automated planning and scheduling, or simply AI planning, aims to produce strategies and construct sequences of actions to achieve predefined goals. Planning techniques have been applied in a variety of areas including robotics and spacecraft missions.
In a static environment, plans can be created prior to execution. In dynamic environments, however, strategies must adapt in response to the environment, and plans can be refined through an iterative process such as reinforcement learning to make planning more efficient. In this article, we describe automated planning as well as artificial intelligence approaches that combine automated planning and reinforcement learning.
This article describes the concept of automated planning, one of the major fields of artificial intelligence. AI planning involves the representation of world models and the realization of strategies: intelligent agents (planners) automatically produce plans to achieve a set of predefined goals. AI planning can assist humans in practical applications such as design and manufacturing, space exploration, games, and military operations.
We then describe the reinforcement learning model, which is a subcategory of machine learning that consists of learning policies for selecting a suitable action to execute in a given state.
Both automated planning and reinforcement learning involve mechanisms that guide an agent in an environment.
Finally, we discuss how AI planning can be improved with machine learning by presenting some approaches that involve the integration of planning and learning.
Planning is the process of finding a sequence of steps (a plan) that transforms a given state into one that fulfills a set of specified goals.
Automated planning emerged in the late 1950s to solve practical problems involving robotics and automatic deduction. AI planning deals with the development of representation languages and algorithms for constructing plans and solving planning problems.
AI planning is extremely complex and requires sophisticated reasoning capabilities. Research in this field therefore focuses mainly on improving the efficiency of planning systems, either by introducing new languages or by developing new algorithms that produce better results in less time.
Applications of AI planning include system-control applications, such as autonomous systems and virtual agents, and process-control applications, such as project planning, workflow management, and the design and manufacturing of physical goods.
Planning can be divided into two main categories: classical planning and neoclassical planning.
The classical approach to planning considers environments that are static, deterministic, and fully observable; it assumes that the environment responds only to the agent’s actions.
Classical planning includes three categories of approaches:
- Planning in state-spaces, where the planning problem is solved using search. The planner has a world model represented as a graph in which each node is a world state. The agent searches this graph using algorithms such as breadth-first, depth-first, and heuristic search. The search can be forward, from the initial state toward the goal state (progression planning); backward, from the goal state toward the initial state (regression planning); or bidirectional.
- Planning in plan-spaces: instead of searching through states, this approach searches through a graph of partial plans. Each node in the graph is a partial plan, and each arc is a refinement operation. Algorithms in this approach start with an empty plan and refine it until a complete plan is obtained.
- Hierarchical planning, an approach layered on top of other planning techniques to reduce the computational cost of planning. The idea behind it is to divide the problem into several levels of abstraction, so that the planner can distinguish between more and less important goals and actions and solve the most important ones first. Examples of hierarchical planners include ABSTRIPS and ABTWEAK.
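As a concrete illustration of the first category, the sketch below performs progression planning in a state-space with a breadth-first search, returning the first plan found. The four-room corridor domain and the `successors` callback are hypothetical, chosen only to keep the example small.

```python
from collections import deque

def forward_search(initial, goal, successors):
    """Breadth-first progression planning: search forward from the
    initial state until the goal state is reached, returning the plan."""
    frontier = deque([(initial, [])])       # (state, actions taken so far)
    visited = {initial}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for action, next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, plan + [action]))
    return None  # no plan exists

# Hypothetical toy domain: a robot moving between rooms A, B, C, D.
edges = {"A": [("go-B", "B")],
         "B": [("go-A", "A"), ("go-C", "C")],
         "C": [("go-D", "D")],
         "D": []}
plan = forward_search("A", "D", lambda s: edges[s])
# plan is ["go-B", "go-C", "go-D"]
```

Depth-first or heuristic (e.g. A*) search would reuse the same graph; only the frontier discipline changes.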
Classical planning consists of searching a space of nodes containing states or partial plans, and it scales poorly to the large domain representations typically encountered in real-world applications. Neoclassical planning techniques, on the other hand, have significantly sped up the planning process. In a neoclassical planning problem, each node represents a set of partial plans; furthermore, not all the actions in these partial plans have to appear in the final plan.
Techniques used in neoclassical planning include planning graphs, automatic heuristic extraction, SAT encodings, and model checking.
In this section, we will explore some of the AI planning languages. A planning language is a tool used to describe the environment and the desired goals, as well as the chain of actions to achieve these goals.
STRIPS (Stanford Research Institute Problem Solver) is a classical planning language that was part of the first major planning system. A planning problem in STRIPS is composed of an initial state, a set of goals, and a set of available actions. States are represented as sets of atomic facts containing both static and dynamic information. Actions include preconditions and postconditions: the preconditions describe the state required to perform the action, while the postconditions, expressed as an add list and a delete list of facts, describe how the state changes once the action is executed. An action cannot be executed unless the state of the world meets its preconditions.
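These mechanics reduce to simple set operations. The sketch below checks a precondition set against a state and applies an action's add and delete lists; the blocks-world `pickup`-style facts are a hypothetical example, not taken from any particular STRIPS domain.

```python
def applicable(state, preconditions):
    """A STRIPS action may fire only if all its preconditions hold."""
    return preconditions <= state  # subset test over sets of atomic facts

def apply_action(state, preconditions, add_list, delete_list):
    """Apply a STRIPS action: remove the delete list, insert the add list."""
    assert applicable(state, preconditions), "preconditions not met"
    return (state - delete_list) | add_list

# Hypothetical blocks-world action: pick up block b from the table.
state = frozenset({"ontable(b)", "clear(b)", "handempty"})
new_state = apply_action(
    state,
    preconditions=frozenset({"ontable(b)", "clear(b)", "handempty"}),
    add_list=frozenset({"holding(b)"}),
    delete_list=frozenset({"ontable(b)", "handempty"}),
)
# new_state now contains holding(b) and no longer contains ontable(b)
```

A planner then searches for a sequence of such applications leading from the initial state to a state containing all goal facts.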
STRIPS was used in the development of Shakey, the first mobile robot able to perceive and reason about its own actions. Shakey was developed at the Stanford Research Institute and could perform tasks that required planning, such as manipulation, route finding, and visual analysis. This robot is considered the ancestor of modern robots.
Further developments in representational languages led to the creation of new languages such as PDDL, which stands for Planning Domain Definition Language. PDDL was an attempt to standardize planning languages; it subsumes several representational languages, including STRIPS and ADL.
Reinforcement learning is a type of machine learning in which an agent learns new behaviors from its interactions with a dynamic environment. The model is inspired by the way biological systems learn from environmental feedback (a reward or a punishment). Simple reward feedback is therefore all the agent requires to learn its behavior; this scalar quantity that evaluates the agent’s performance is called the reinforcement signal. The agent is not told which actions to take; by trying them, it discovers which actions are the most rewarding and therefore the best to select.
Reinforcement learning does not involve complex reasoning, since it is based on continuous interactions with the environment; however, it is not purely reactive, since the interaction process is guided by the acquired experience.
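A minimal sketch of this reward-driven loop is tabular Q-learning, a standard reinforcement learning algorithm: the agent maintains a table of action values and updates them from the scalar reinforcement signal alone, with no model of the environment. The four-state corridor environment and all parameter values below are illustrative assumptions.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values from reward feedback only."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: explore occasionally, otherwise exploit
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # move Q(s, a) toward the reward plus discounted future value
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

# Hypothetical four-state corridor: action 1 moves right, action 0 left;
# reaching state 3 yields reward 1 and ends the episode.
def step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

Q = q_learning(4, 2, step)
```

After training, the greedy policy in every state prefers moving right, even though the agent was never told the environment's structure.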
COMBINING PLANNING AND REINFORCEMENT LEARNING
Both AI planning and reinforcement learning deal with the same issue of providing an agent with a set of guidelines to get from an initial state to the desired state. However, they use different techniques and approaches.
Planning can become slow and cumbersome, especially for complex real-time systems, for the following reasons:
- The size of the state-space grows exponentially with the number of variables the problem must take into account, such as the duration and time limits of actions, their concurrency, and so on.
- Planning is a hierarchical paradigm in the sense that the agent must first sense the environment, build a model of it, search for a plan, and only then start acting. In a static environment, planning can be done offline and this may not be an issue; in a dynamic and ill-defined environment, however, the agent must periodically sense the world to incorporate any changes and update the model and the plan accordingly, which can make the agent very slow.
For these reasons, some have considered Reinforcement Learning as an alternative to planning.
While it is true that reinforcement learning algorithms support a model-free control system that can be very effective in dynamic worlds, the interactions with the environment can be expensive, and “the search space may become intractable for exploration”.
Humans, and biological systems in general, use both model-based and model-free learning mechanisms, and many approaches have shown that machine learning can assist AI planning. Combining these systems may therefore provide the agent with better and faster guidance. “The key idea is to equip the agent with two modules, a planner and a learner, that cooperate either in a sequential or in a concurrent way”.
There are three different approaches to combining reinforcement learning and planning:
- Plan first, then learn: in this approach, planning is used to create micro-operators (general compositions of actions that achieve a goal) which can speed up the learning process.
- Learn first, then plan: in this approach, reinforcement learning is used for sensing the environment, while a sequence of actions (a plan) becomes useful when sensing becomes difficult.
- Concurrent planning and learning: in this approach, learning and planning are interleaved; the agent maintains a model of the world, and the value function estimates are “updated by simulating transitions on the world model”. Algorithms in this category include the Dyna family and Prioritized Sweeping.
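The concurrent approach can be sketched with Dyna-Q, the simplest member of the Dyna family: each real step both updates the value estimates directly and records the transition in a learned model, which is then replayed for extra simulated (planning) updates. The deterministic four-state toy environment and parameter values below are illustrative assumptions.

```python
import random

def dyna_q(n_states, n_actions, step, episodes=200, planning_steps=10,
           alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Dyna-Q: couple direct reinforcement learning with planning
    updates that simulate transitions on a learned world model."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (state, action) -> (next_state, reward, done)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s2, r, done = step(s, a)
            # learning: direct update from real experience
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            model[(s, a)] = (s2, r, done)
            # planning: replay remembered transitions from the model
            for _ in range(planning_steps):
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                ptarget = pr if pdone else pr + gamma * max(Q[ps2])
                Q[ps][pa] += alpha * (ptarget - Q[ps][pa])
            s = s2
    return Q

# Hypothetical four-state corridor: action 1 moves right, action 0 left;
# reaching state 3 yields reward 1 and ends the episode.
def step(s, a):
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

Q = dyna_q(4, 2, step)
```

Because planning updates reuse stored transitions, value information propagates backward through the state space far faster than real experience alone would allow.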
In this article, we have given a short overview of AI planning, one of the key topics in artificial intelligence. AI planning has proved successful in solving both toy problems and real-world applications such as planning space missions. We have also briefly discussed reinforcement learning, a subcategory of machine learning, and analyzed how automated planning and learning can be combined to make planning more efficient in dynamic environments. Humans use both reinforcement learning and model-based learning systems; however, the sophistication of a human-like control system has yet to be realized in AI.
Ghallab, M., Nau, D. S., & Traverso, P. (2016). Automated planning and acting. New York: Cambridge University Press.
Arregui, S. F., Celorrio, S. J., & Turbides, T. D. (n.d.). Improving automated planning with machine learning. Machine Learning, 1355-1373. doi:10.4018/978-1-60960-818-7.ch510
Vrakas, D., & Vlahavas, I. (2008). Artificial intelligence for advanced problem-solving techniques. Hershey, PA: Information Science Reference.
LaValle, S. M. (2014). Planning algorithms. New York, NY: Cambridge University Press.
Partalas, I., Vrakas, D., & Vlahavas, I. (n.d.). Reinforcement learning and automated planning. Artificial Intelligence for Advanced Problem Solving Techniques. doi:10.4018/9781599047058.ch006