Accelerating progress in Artificial General Intelligence: Choosing a benchmark for natural world interaction

Open access

Measuring progress in Artificial General Intelligence (AGI) can be difficult without commonly accepted methods of evaluation. An AGI benchmark would allow the many computational intelligence algorithms that have been developed to be evaluated and compared. In this paper I propose that a benchmark for natural world interaction would possess seven key characteristics: fitness, breadth, specificity, low cost, simplicity, range, and task focus. I also outline two example benchmarks that meet most of these criteria. In the first, the direction task, a human coach directs a machine to perform a novel task in an unfamiliar environment. The direction task is extremely broad but may be idealistic. In the second, the AGI battery, AGI candidates are evaluated on their performance across a collection of more specific tasks. The AGI battery is designed to suit the capabilities of currently existing systems. Both the direction task and the AGI battery would require further definition before they could be implemented. The paper concludes with a description of one task that might be included in the AGI battery: the search and retrieve task.
