The problem of autonomous service robot navigation can be decomposed into global navigation planning and local reactive navigation. This paper focuses on the problem of plan execution, that is, how the robot chooses appropriate local navigation actions given a global navigation plan. The policies used for plan execution can have an enormous impact on the robot’s performance.
Specifying and programming plan execution policies by hand can be tedious, even when the programmer uses a high-level programming language created for this purpose. The solution proposed in this paper is to give the robot the ability to learn its own policies autonomously. To this end, the authors formulate the plan execution problem as a Markov decision problem (MDP), an innovative application of the MDP framework. In this formulation, an action is selected by a function that maximizes the expected local performance, given the global plan and models of the actions supported by the local navigation components. Neural networks and decision/regression trees are used to learn these action models.
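The idea of selecting the action with the highest expected local performance can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the names `action_model`, `utility`, and the toy outcome distributions are all assumptions standing in for the learned models and the plan-derived performance measure.

```python
def select_action(state, actions, action_model, utility):
    """Return the action with the highest expected utility.

    action_model(state, a) -> list of (probability, next_state) pairs,
        standing in for the learned outcome models (e.g. the neural
        networks or regression trees mentioned in the paper).
    utility(next_state) -> scalar score of the outcome with respect to
        the global plan (illustrative placeholder).
    """
    def expected_utility(a):
        return sum(p * utility(s2) for p, s2 in action_model(state, a))
    return max(actions, key=expected_utility)

# Toy example: two local navigation actions with stochastic outcomes.
def toy_model(state, a):
    if a == "hallway":
        return [(0.9, state + 2), (0.1, state)]   # usually advances by 2
    return [(0.5, state + 3), (0.5, state - 1)]   # risky shortcut

progress = lambda s: s  # utility = progress along the global plan

best = select_action(0, ["hallway", "shortcut"], toy_model, progress)
print(best)  # prints "hallway" (expected utility 1.8 vs 1.0)
```

The key design point is that the selection function scores each local action by its expected outcome under a learned model, rather than following hand-coded rules.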
The authors’ experimental results show that policies learned by the robot substantially outperform manually encoded policies. They also show that the action selection function learned in simulation transfers well to a real robot, and to tasks different from the one used during the training phase.
Another advantage of this approach is that it can be extended and improved in various ways, for instance by refining the learned action models or the action selection function.