One of the essential aspects of biological agents is dynamic stability. This aspect, called homeostasis, has been widely discussed in ethology and neuroscience, as well as in the early stages of artificial intelligence research. Ashby’s homeostats are general-purpose learning machines that stabilize the essential variables of an agent in general environments. However, despite their generality, the original homeostats could not be scaled up because they searched their parameters randomly. In this paper, we first redefine the objective of homeostats as the maximization of a multi-step survival probability from the viewpoint of sequential decision theory and probability theory. We then show that this optimization problem can be addressed with reinforcement learning algorithms that use special agent architectures and theoretically derived intrinsic reward functions. Finally, we demonstrate empirically that agents with our architecture automatically learn to survive in a given environment, including environments with visual stimuli. Our survival agents learn to eat food, avoid poison and stabilize their essential variables through single, theoretically derived intrinsic reward formulations.
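As a rough illustration of why maximizing a multi-step survival probability can be cast as a reinforcement learning problem, the sketch below makes an assumption that is ours and not necessarily the paper's derivation: the agent survives the whole horizon only if it survives every step, with per-step survival probabilities p_t. Under that assumption the log of the multi-step survival probability equals the sum of the per-step log-probabilities, so a per-step intrinsic reward r_t = log p_t makes the undiscounted return equal to the log survival probability. The names `step_survival`, `survival_probability` and `intrinsic_rewards`, and the Gaussian-shaped model of a single essential variable, are hypothetical.

```python
import math
from typing import Sequence


def step_survival(x: float, setpoint: float = 0.0, width: float = 1.0) -> float:
    """Hypothetical per-step survival probability for one essential variable x.

    Assumption: survival is most likely at the setpoint and decays like an
    unnormalized Gaussian as x drifts away from it (value in (0, 1]).
    """
    return math.exp(-0.5 * ((x - setpoint) / width) ** 2)


def survival_probability(probs: Sequence[float]) -> float:
    """Multi-step survival probability, assuming the agent survives the
    horizon only if it survives every step (product of per-step probabilities)."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def intrinsic_rewards(probs: Sequence[float]) -> list[float]:
    """Per-step reward r_t = log p_t; their sum is the log of the multi-step
    survival probability, so maximizing the return maximizes survival."""
    return [math.log(p) for p in probs]


if __name__ == "__main__":
    # Hypothetical trajectory of one essential variable (e.g. an energy level
    # measured relative to its setpoint).
    trajectory = [0.0, 0.5, -0.3, 1.2]
    probs = [step_survival(x) for x in trajectory]
    ret = sum(intrinsic_rewards(probs))
    # Both lines print -0.8900: the return equals the log survival probability.
    print(f"return = {ret:.4f}")
    print(f"log P(survive) = {math.log(survival_probability(probs)):.4f}")
```

Because the logarithm is monotonic, any policy that maximizes this return also maximizes the product of per-step survival probabilities, which is what lets a standard return-maximizing RL algorithm stand in for the survival objective in this simplified setting.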
Ashby W. R. 1960. Design for a Brain. Springer Science & Business Media.
Barto A. G.; Singh S.; and Chentanez N. 2004. Intrinsically motivated learning of hierarchical collections of skills. In Proc. 3rd Int. Conf. Development Learn 112-119.
Cañamero D. 1997. Modeling motivations and emotions as a basis for intelligent behavior. In Proceedings of the first international conference on Autonomous agents 148-155. ACM.
Dawkins R. 1976. The Selfish Gene. Oxford University Press Oxford UK.
Dayan P. and Hinton G. E. 1996. Varieties of Helmholtz machine. Neural Networks 9(8):1385-1403.
Doya K. and Uchibe E. 2005. The cyber rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior 13(2):149-160.
Elfwing S.; Uchibe E.; Doya K.; and Christensen H. I. 2005. Biologically inspired embodied evolution of survival. In Evolutionary Computation 2005. The 2005 IEEE Congress on volume 3 2210-2216. IEEE.
Hester T. and Stone P. 2012. Learning and using models. In Reinforcement Learning. Springer. 111-141.
Jordan M. I.; Ghahramani Z.; Jaakkola T. S.; and Saul L. K. 1999. An introduction to variational methods for graphical models. Machine learning 37(2):183-233.
Kaelbling L. P.; Littman M. L.; and Moore A. W. 1996. Reinforcement learning: A survey. arXiv preprint cs/9605103.
Kappen H. J.; Gómez V.; and Opper M. 2012. Optimal control as a graphical model inference problem. Machine learning 87(2):159-182.
Keramati M. and Gutkin B. S. 2011. A reinforcement learning theory for homeostatic regulation. In Advances in Neural Information Processing Systems 82-90.
Kingma D. and Ba J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kingma D. P. and Welling M. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
Konidaris G. and Barto A. 2006. An adaptive robot motivational system. In From Animals to Animats 9. Springer. 346-356.
Lange S.; Riedmiller M.; and Voigtländer A. 2012. Autonomous reinforcement learning on raw visual input data in a real world application. In Neural Networks (IJCNN) The 2012 International Joint Conference on 1-8. IEEE.
Lin L.-J. 1992. Self-improving reactive agents based on reinforcement learning planning and teaching. Machine learning 8(3-4):293-321.
McFarland D. and Bösser T. 1993. Intelligent behavior in animals and robots. MIT Press.
McFarland D. and Houston A. 1981. Quantitative ethology. Pitman Advanced Pub. Program.
McFarland D. and Spier E. 1997. Basic cycles utility and opportunism in self-sufficient robots. Robotics and Autonomous Systems 20(2):179-190.
Meyer J.-A. and Guillot A. 1991. Simulation of adaptive behavior in animats: Review and prospect. In J.-A. Meyer and S. W. Wilson (Eds.) From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior 2-14.
Mnih A. and Gregor K. 2014. Neural variational inference and learning in belief networks. arXiv preprint arXiv:1402.0030.
Mnih V.; Kavukcuoglu K.; Silver D.; Rusu A. A.; Veness J.; Bellemare M. G.; Graves A.; Riedmiller M.; Fidjeland A. K.; Ostrovski G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529-533.
Nakamura M. and Yamakawa H. 2016. A Game-Engine-Based Learning Environment Framework for Artificial General Intelligence. In International Conference on Neural Information Processing 351-356. Springer.
Ng A. Y.; Harada D.; and Russell S. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML volume 99 278-287.
Ogata T. and Sugano S. 1997. Emergence of Robot Behavior Based on Self-Preservation. Research Methodology and Embodiment of Mechanical System. Journal of the Robotics Society of Japan 15(5):710-721.
Omohundro S. M. 2008. The Basic AI Drives. In Artificial General Intelligence 2008: Proceedings of the First AGI Conference volume 171 483. IOS Press.
Pfeifer R. and Scheier C. 1999. Understanding intelligence. MIT press.
Ranganath R.; Gerrish S.; and Blei D. M. 2013. Black box variational inference. arXiv preprint arXiv:1401.0118.
Rawlik K.; Toussaint M.; and Vijayakumar S. 2013. On stochastic optimal control and reinforcement learning by approximate inference. In Proceedings of the Twenty-Third international joint conference on Artificial Intelligence 3052-3056. AAAI Press.
Rummery G. A. and Niranjan M. 1994. On-line Q-learning using connectionist systems. University of Cambridge Department of Engineering.
Rusu A. A.; Vecerik M.; Rothörl T.; Heess N.; Pascanu R.; and Hadsell R. 2016. Sim-to-real robot learning from pixels with progressive nets. arXiv preprint arXiv:1610.04286.
Sibly R. and McFarland D. 1976. On the fitness of behavior sequences. American Naturalist 601-617.
Spier E. 1997. From reactive behaviour to adaptive behaviour: motivational models for behaviour in animals and robots. Ph.D. Dissertation University of Oxford.
Toda M. 1962. The design of a fungus-eater: A model of human behavior in an unsophisticated environment. Behavioral Science 7(2):164-183.
Toda M. 1982. Man robot and society: Models and speculations. M. Nijhoff Pub.
Todorov E. 2008. General duality between optimal control and estimation. In Decision and Control 2008. CDC 2008. 47th IEEE Conference on 4286-4292. IEEE.
Toussaint M.; Harmeling S.; and Storkey A. 2006. Probabilistic inference for solving (PO) MDPs. Informatics research report 0934 University of Edinburgh.
Toussaint M. 2009. Robot trajectory optimization using approximate inference. In Proceedings of the 26th Annual International Conference on Machine Learning 1049-1056. ACM.
Vlassis N. and Toussaint M. 2009. Model-free reinforcement learning as mixture learning. In Proceedings of the 26th Annual International Conference on Machine Learning 1081-1088. ACM.
Walter W. 1953. The living brain. Norton.
Young J. Z. 1966. The Memory System of the Brain. Oxford University Press.