Robotics: Science and Systems VI
Closing the Learning-Planning Loop with Predictive State Representations
B. Boots, S. Siddiqi and G. Gordon
Abstract:
A central problem in artificial intelligence is to choose actions to maximize reward in a partially observable, uncertain environment. To do so, we must learn an accurate model of our environment, and then plan to maximize reward. Unfortunately, learning algorithms often recover a model which is too inaccurate to support planning or too large and complex for planning to be feasible; or, they require large amounts of prior domain knowledge or fail to provide important guarantees such as statistical consistency. To begin to fill this gap, we propose a novel algorithm which provably learns a compact, accurate model directly from sequences of action-observation pairs. To evaluate the learner, we then close the loop from observations to actions: we plan in the learned model and recover a policy which is near-optimal in the original environment (not the model). In more detail, we present a spectral algorithm for learning a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then performing approximate point-based planning in the learned model. This experiment shows that the learned PSR captures the essential features of the environment, allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Our algorithm has several benefits which have not appeared together in any previous PSR learner: it is computationally efficient and statistically consistent; it handles high-dimensional observations and long time horizons by working from real-valued features of observation sequences; and finally, our close-the-loop experiments provide an end-to-end practical test.
Bibtex:
@INPROCEEDINGS{Boots-RSS-10,
    AUTHOR    = {B. Boots AND S. Siddiqi AND G. Gordon},
    TITLE     = {Closing the Learning-Planning Loop with Predictive State Representations},
    BOOKTITLE = {Proceedings of Robotics: Science and Systems},
    YEAR      = {2010},
    ADDRESS   = {Zaragoza, Spain},
    MONTH     = {June},
    DOI       = {10.15607/RSS.2010.VI.036}
}