Feature Reinforcement Learning: Part I. Unstructured MDPs
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II (Hutter, 2009c). The role of POMDPs is also considered there.
Aarts E. H. L. and Lenstra J. K. eds. 1997. Local Search in Combinatorial Optimization. Discrete Mathematics and Optimization. Chichester England: Wiley-Interscience.
Banzhaf W.; Nordin P.; Keller E.; and Francone F. 1998. Genetic Programming. San Francisco CA U.S.A.: Morgan Kaufmann.
Barron A. R. 1985. Logically Smooth Density Estimation. Ph.D. Dissertation Stanford University.
Berry D. A. and Fristedt B. 1985. Bandit Problems: Sequential Allocation of Experiments. London: Chapman and Hall.
Brafman R. I. and Tennenholtz M. 2002. R-max - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. Journal of Machine Learning Research 3:213-231.
Cover T. M. and Thomas J. A. 2006. Elements of Information Theory. Wiley-Interscience 2nd edition.
Dearden R.; Friedman N.; and Andre D. 1999. Model based Bayesian Exploration. In Proc. 15th Conference on Uncertainty in Artificial Intelligence (UAI-99) 150-159.
Duff M. 2002. Optimal Learning: Computational procedures for Bayes-adaptive Markov decision processes. Ph.D. Dissertation Department of Computer Science University of Massachusetts Amherst.
Dzeroski S.; de Raedt L.; and Driessens K. 2001. Relational Reinforcement Learning. Machine Learning 43:7-52.
Fishman G. 2003. Monte Carlo. Springer.
Givan R.; Dean T.; and Greig M. 2003. Equivalence Notions and Model Minimization in Markov Decision Processes. Artificial Intelligence 147(1-2):163-223.
Goertzel B. and Pennachin C. eds. 2007. Artificial General Intelligence. Springer.
Gordon G. 1999. Approximate Solutions to Markov Decision Processes. Ph.D. Dissertation School of Computer Science Carnegie Mellon University Pittsburgh PA.
Grünwald P. D. 2007. The Minimum Description Length Principle. Cambridge: The MIT Press.
Guyon I. and Elisseeff A. eds. 2003. Variable and Feature Selection. JMLR Special Issue: MIT Press.
Hastie T.; Tibshirani R.; and Friedman J. H. 2001. The Elements of Statistical Learning. Springer.
Hutter M. 2005. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Berlin: Springer. 300 pages http://www.hutter1.net/ai/uaibook.htm.
Hutter M. 2007. Universal Algorithmic Intelligence: A Mathematical Top→Down Approach. In Artificial General Intelligence. Berlin: Springer. 227-290.
Hutter M. 2009a. Feature Dynamic Bayesian Networks. In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09) volume 8 67-73. Atlantis Press.
Hutter M. 2009b. Feature Markov Decision Processes. In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09) volume 8 61-66. Atlantis Press.
Hutter M. 2009c. Feature Reinforcement Learning: Part II: Structured MDPs. In progress. Will extend Hutter (2009a).
Kaelbling L. P.; Littman M. L.; and Cassandra A. R. 1998. Planning and Acting in Partially Observable Stochastic Domains. Artificial Intelligence 101:99-134.
Kearns M. J. and Singh S. 1998. Near-optimal reinforcement learning in polynomial time. In Proc. 15th International Conf. on Machine Learning 260-268. Morgan Kaufmann San Francisco CA.
Koza J. R. 1992. Genetic Programming. The MIT Press.
Kumar P. R. and Varaiya P. P. 1986. Stochastic Systems: Estimation Identification and Adaptive Control. Englewood Cliffs NJ: Prentice Hall.
Legg S. and Hutter M. 2007. Universal Intelligence: A Definition of Machine Intelligence. Minds & Machines 17(4):391-444.
Legg S. 2008. Machine Super Intelligence. Ph.D. Dissertation IDSIA Lugano.
Li M. and Vitányi P. M. B. 2008. An Introduction to Kolmogorov Complexity and its Applications. Berlin: Springer 3rd edition.
Liang P. and Jordan M. 2008. An Asymptotic Analysis of Generative Discriminative and Pseudolikelihood Estimators. In Proc. 25th International Conf. on Machine Learning (ICML'08) volume 307 584-591. ACM.
Liu J. S. 2002. Monte Carlo Strategies in Scientific Computing. Springer.
Lusena C.; Goldsmith J.; and Mundhenk M. 2001. Nonapproximability Results for Partially Observable Markov Decision Processes. Journal of Artificial Intelligence Research 14:83-103.
MacKay D. J. C. 2003. Information theory inference and learning algorithms. Cambridge MA: Cambridge University Press.
Madani O.; Hanks S.; and Condon A. 2003. On the Undecidability of Probabilistic Planning and Related Stochastic Optimization Problems. Artificial Intelligence 147:5-34.
McCallum A. K. 1996. Reinforcement Learning with Selective Perception and Hidden State. Ph.D. Dissertation Department of Computer Science University of Rochester.
Ng A. Y.; Coates A.; Diel M.; Ganapathi V.; Schulte J.; Tse B.; Berger E.; and Liang E. 2004. Autonomous Inverted Helicopter Flight via Reinforcement Learning. In ISER volume 21 of Springer Tracts in Advanced Robotics 363-372. Springer.
Pankov S. 2008. A Computational Approximation to the AIXI Model. In Proc. 1st Conference on Artificial General Intelligence volume 171 256-267.
Pearlmutter B. A. 1989. Learning State Space Trajectories in Recurrent Neural Networks. Neural Computation 1(2):263-269.
Poland J. and Hutter M. 2006. Universal Learning of Repeated Matrix Games. In Proc. 15th Annual Machine Learning Conf. of Belgium and The Netherlands (Benelearn'06) 7-14.
Poupart P.; Vlassis N. A.; Hoey J.; and Regan K. 2006. An Analytic Solution to Discrete Bayesian Reinforcement Learning. In Proc. 23rd International Conf. on Machine Learning (ICML'06) volume 148 697-704. Pittsburgh PA: ACM.
Puterman M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York NY: Wiley.
Raedt L. D.; Hammer B.; Hitzler P.; and Maass W. eds. 2008. Recurrent Neural Networks - Models Capacities and Applications volume 08041 of Dagstuhl Seminar Proceedings. IBFI Schloss Dagstuhl Germany.
Ring M. 1994. Continual Learning in Reinforcement Environments. Ph.D. Dissertation University of Texas Austin.
Ross S. and Pineau J. 2008. Model-Based Bayesian Reinforcement Learning in Large Structured Domains. In Proc. 24th Conference in Uncertainty in Artificial Intelligence (UAI'08) 476-483. Helsinki: AUAI Press.
Ross S.; Pineau J.; Paquet S.; and Chaib-draa B. 2008. Online Planning Algorithms for POMDPs. Journal of Artificial Intelligence Research 32:663-704.
Russell S. J. and Norvig P. 2003. Artificial Intelligence. A Modern Approach. Englewood Cliffs NJ: Prentice-Hall 2nd edition.
Sanner S. and Boutilier C. 2009. Practical Solution Techniques for First-Order MDPs. Artificial Intelligence 173(5-6):748-788.
Schmidhuber J. 2004. Optimal Ordered Problem Solver. Machine Learning 54(3):211-254.
Schwarz G. 1978. Estimating the Dimension of a Model. Annals of Statistics 6(2):461-464.
Singh S.; Littman M.; Jong N.; Pardoe D.; and Stone P. 2003. Learning Predictive State Representations. In Proc. 20th International Conference on Machine Learning (ICML'03) 712-719.
Strehl A. L.; Diuk C.; and Littman M. L. 2007. Efficient Structure Learning in Factored-State MDPs. In Proc. 27th AAAI Conference on Artificial Intelligence 645-650. Vancouver BC: AAAI Press.
Sutton R. S. and Barto A. G. 1998. Reinforcement Learning: An Introduction. Cambridge MA: MIT Press.
Szita I. and Lörincz A. 2008. The Many Faces of Optimism: a Unifying Approach. In Proc. 25th International Conf. on Machine Learning (ICML'08) volume 307.
Wallace C. S. 2005. Statistical and Inductive Inference by Minimum Message Length. Berlin: Springer.
Willems F. M. J.; Shtarkov Y. M.; and Tjalkens T. J. 1997. Reflections on the Prize Paper: The Context-Tree Weighting Method: Basic Properties. IEEE Information Theory Society Newsletter 20-27.
Wolpert D. H. and Macready W. G. 1997. No Free Lunch Theorems for Optimization. IEEE Transactions on Evolutionary Computation 1(1):67-82.