Learning and decision-making in artificial animals

Abstract

A computational model for artificial animals (animats) interacting with real or artificial ecosystems is presented. All animats use the same mechanisms for learning and decision-making. Each animat has its own set of needs and its own memory structure, which undergoes continuous development and constitutes the basis for decision-making. The decision-making mechanism aims at keeping the needs of the animat as satisfied as possible for as long as possible. Reward and punishment are defined in terms of changes to the level of need satisfaction. The learning mechanisms are driven by prediction error relating to reward and punishment and are of two kinds: multi-objective local Q-learning and structural learning that alters the architecture of the memory structure by adding and removing nodes. The animat model has the following key properties: (1) autonomy: it operates fully automatically, without any need for interaction with human engineers; in particular, it does not depend on human engineers to provide goals, tasks, or seed knowledge, yet it can operate either with or without human interaction; (2) generality: it uses the same learning and decision-making mechanisms in all environments (e.g., desert and forest environments) and for all animats (e.g., frog and bee animats); and (3) adequacy: it is able to learn basic animal skills such as eating, drinking, locomotion, and navigation. Eight experiments are presented. The results indicate that (i) dynamic memory structures are strictly more powerful than static ones; (ii) a fixed generic design can be used to model basic cognitive processes of a wide range of animals and environments; and (iii) the animat framework enables a uniform and gradual approach to AGI by successively taking on more challenging problems in the form of broader and more complex classes of environments.
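To make the mechanism described above more concrete, the following Python sketch illustrates per-need ("local") Q-learning in which reward and punishment for each need are defined as the change in that need's satisfaction level. It is a minimal illustration, not the paper's implementation: the class and method names, the tabular state representation, the parameter values, and in particular the aggregation rule (acting greedily on behalf of the currently least satisfied need) are assumptions made for the example.

import random
from collections import defaultdict

class Animat:
    """Sketch of an animat with one Q-table per need (multi-objective, local updates).

    Illustrative assumptions: tabular states, epsilon-greedy selection, and
    acting on behalf of the most urgent (least satisfied) need.
    """

    def __init__(self, needs, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.needs = needs                      # e.g. ["water", "energy"]
        self.actions = actions                  # e.g. ["eat", "drink", "move"]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # One Q-table per need: multi-objective, updated locally per need.
        self.q = {n: defaultdict(float) for n in needs}
        # Need-satisfaction levels in [0, 1]; 1 means fully satisfied.
        self.satisfaction = {n: 1.0 for n in needs}

    def reward(self, need, new_level):
        # Reward/punishment = change in the level of need satisfaction.
        return new_level - self.satisfaction[need]

    def choose_action(self, state):
        # Explore occasionally; otherwise act greedily for the most urgent need.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        urgent = min(self.needs, key=lambda n: self.satisfaction[n])
        return max(self.actions, key=lambda a: self.q[urgent][(state, a)])

    def update(self, state, action, next_state, new_levels):
        # Standard Q-learning update, applied separately for each need.
        for need in self.needs:
            r = self.reward(need, new_levels[need])
            best_next = max(self.q[need][(next_state, a)] for a in self.actions)
            td_error = r + self.gamma * best_next - self.q[need][(state, action)]
            self.q[need][(state, action)] += self.alpha * td_error
            self.satisfaction[need] = new_levels[need]

In use, an environment would supply the next state and the new satisfaction levels after each action; the "most urgent need" rule is only one of several plausible ways to realize the stated aim of keeping all needs as satisfied as possible for as long as possible.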
