One of the essential aspects of biological agents is dynamic stability. This aspect, called homeostasis, has been widely discussed in ethology, in neuroscience, and in the early stages of artificial intelligence. Ashby’s homeostats are general-purpose learning machines that stabilize the agent's essential variables in the face of general environments. However, despite their generality, the original homeostats could not be scaled up because they searched their parameters randomly. In this paper, we first redefine the objective of homeostats as the maximization of a multi-step survival probability from the viewpoint of sequential decision theory and probability theory. We then show that this optimization problem can be treated with reinforcement learning algorithms by using special agent architectures and theoretically derived intrinsic reward functions. Finally, we empirically demonstrate that agents with our architecture automatically learn to survive in a given environment, including environments with visual stimuli. Our survival agents learn to eat food, avoid poison, and stabilize their essential variables through a single, theoretically derived intrinsic reward formulation.
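Why a multi-step survival probability can be handed to a standard reinforcement learning algorithm can be illustrated with a minimal sketch. It assumes only that survival at each step t is conditioned on having survived so far, with probability q_t. The probability of surviving T steps is then the product of the q_t, and its logarithm is a sum of per-step terms that an RL algorithm can treat as an undiscounted return. The per-step reward r_t = log q_t, the box-shaped viability region in `is_alive`, the function names, and the example probabilities are all our illustrative assumptions. They are not the paper's derived reward or architecture.

```python
import math
from typing import Sequence


def survival_probability(step_probs: Sequence[float]) -> float:
    """Probability of surviving every step, where step_probs[t] is the
    (assumed) probability of surviving step t given survival up to t."""
    p = 1.0
    for q in step_probs:
        p *= q
    return p


def log_survival_return(step_probs: Sequence[float]) -> float:
    """Undiscounted return when the (assumed) per-step reward is r_t = log q_t.
    Returns -inf if any step means certain death (q_t == 0)."""
    total = 0.0
    for q in step_probs:
        if q <= 0.0:
            return -math.inf
        total += math.log(q)
    return total


def is_alive(essential: Sequence[float],
             lower: Sequence[float],
             upper: Sequence[float]) -> bool:
    """Hypothetical viability check: every essential variable
    (e.g. an energy level) must stay inside its [lower, upper] bounds."""
    return all(lo <= x <= hi for x, lo, hi in zip(essential, lower, upper))


if __name__ == "__main__":
    # Hypothetical per-step survival probabilities for two policies.
    cautious = [0.99, 0.98, 0.99]
    reckless = [0.90, 0.60, 0.70]
    for name, qs in [("cautious", cautious), ("reckless", reckless)]:
        P = survival_probability(qs)
        G = log_survival_return(qs)
        # exp(G) recovers P up to floating-point rounding, so ranking
        # policies by the return G ranks them by survival probability P.
        print(f"{name}: P={P:.4f}, exp(G)={math.exp(G):.4f}")
    print(is_alive([0.5, 0.2], lower=[0.0, 0.0], upper=[1.0, 1.0]))
```

Because the exponential is monotonic, maximizing the summed log-reward is equivalent to maximizing the survival probability. A real agent would also need some way to estimate q_t from its state and observations, which this sketch does not address.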