Towards General Evaluation of Intelligent Systems: Lessons Learned from Reproducing AIQ Test Results

Open access


This paper attempts to replicate the results of evaluating several artificial agents using the Algorithmic Intelligence Quotient (AIQ) test originally reported by Legg and Veness. Three experiments were conducted: one using default settings, one in which the action space was varied, and one in which the observation space was varied. While the performance of freq, Q0, Qλ, and HLQλ corresponded well with the original results, the values obtained for MC-AIXI differed. Varying the observation space seems to have no qualitative impact on the results, as originally reported, while (contrary to the original results) varying the action space seems to have some impact. The impact of modifying the parameters of MC-AIXI on its performance in the default settings was analyzed with the help of data mining techniques, which were used to identify highly performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable; however, as a general artificial intelligence evaluation method it has several limits. The test is dependent on the chosen reference machine and is also sensitive to changes to its settings. It brings out some differences among agents; however, since these differences are limited in size, the test setting may not yet be sufficiently complex. A demanding parameter sweep is needed to thoroughly evaluate configurable agents, which, together with the test format, further highlights the computational requirements of an agent. These and other issues are discussed in the paper along with proposals suggesting how to alleviate them. An implementation of some of the proposals is also demonstrated.
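As a rough illustration of why a parameter sweep over a configurable agent is computationally demanding, the following minimal sketch evaluates every configuration in a small grid by averaging a score over a fixed sample of environments. Everything in it is an assumption made for illustration: `toy_score` is a made-up stand-in for one agent-environment interaction, and the parameter names `horizon` and `simulations` are hypothetical. It is not the AIQ test and does not reflect the actual parameters of MC-AIXI.

```python
import itertools
import random
import statistics


def toy_score(config, env_seed):
    """Hypothetical stand-in for one agent-environment interaction.

    Returns a reward clipped to [-100, 100]. This is NOT the AIQ test;
    the response surface below is invented purely for illustration.
    """
    rng = random.Random(env_seed)
    base = 10.0 * config["horizon"] - 2.0 * abs(config["simulations"] - 50) / 10.0
    return max(-100.0, min(100.0, base + rng.gauss(0.0, 5.0)))


def estimate(config, env_seeds):
    """Average the score of one configuration over the sampled environments."""
    return statistics.fmean(toy_score(config, seed) for seed in env_seeds)


def sweep(grid, env_seeds):
    """Evaluate every combination in the grid and return results, best first."""
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        results.append((estimate(config, env_seeds), config))
    results.sort(key=lambda item: item[0], reverse=True)
    return results


# Hypothetical grid: 3 x 3 = 9 configurations.
grid = {"horizon": [1, 2, 4], "simulations": [10, 50, 100]}
# The same environment sample is reused for every configuration,
# so configurations are compared on identical conditions.
env_seeds = list(range(200))
results = sweep(grid, env_seeds)
best_score, best_config = results[0]
```

Even this toy grid requires 9 × 200 = 1800 evaluations, and the count grows multiplicatively with each additional parameter, which is the cost that a thorough sweep of a configurable agent incurs.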


  • Besold, T.; Hernández-Orallo, J.; and Schmid, U. 2015. Can Machine Intelligence be Measured in the Same Way as Human Intelligence? KI - Künstliche Intelligenz 29(3):291-297.

  • Breiman, L.; Friedman, J. H.; Olshen, R. A.; and Stone, C. J. 1984. Classification and Regression Trees. Belmont: Thomson Wadsworth.

  • Bringsjord, S., and Schimanski, B. 2003. What Is Artificial Intelligence? Psychometric AI as an Answer. In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI’03), 887-893.

  • de Mey, M. 1992. The Cognitive Paradigm. Chicago and London: University of Chicago Press.

  • Dennett, D. C. 1991. Consciousness Explained. London: Penguin Books.

  • Descartes, R. 1637. A Discourse on Method. Oxford: Oxford University Press.

  • Dowe, D. L., and Hájek, A. R. 1998. A Non-Behavioural, Computational Extension to the Turing Test. In Proceedings of International Conference on Computational Intelligence & Multimedia Applications (ICCIMA’98), Gippsland, Australia, 101-106.

  • Goertzel, B. 2010. Toward a Formal Characterization of Real-World General Intelligence. In Baum, E.; Hutter, M.; and Kitzelmann, E., eds., Proceedings of the 3rd Conference on Artificial General Intelligence, AGI 2010, 19-24. Amsterdam-Beijing-Paris: Atlantis Press.

  • Goertzel, B. 2014. Artificial General Intelligence: Concept, State of the Art, and Future Prospects. Journal of Artificial General Intelligence 5(1):1-48.

  • Harnad, S. 1991. Other Bodies, Other Minds: A Machine Incarnation of an Old Philosophical Problem. Minds and Machines 1(1):43-54.

  • Hernández-Orallo, J., and Dowe, D. L. 2010. Measuring Universal Intelligence: Towards an Anytime Intelligence Test. Artificial Intelligence 174(18):1508-1539.

  • Hernández-Orallo, J. 2000. Beyond the Turing Test. Journal of Logic, Language and Information 9(4):447-466.

  • Hernández-Orallo, J. 2010. A (hopefully) Unbiased Universal Environment Class for Measuring Intelligence of Biological and Artificial Systems. In Baum, E.; Hutter, M.; and Kitzelmann, E., eds., Proceedings of the 3rd Conference on Artificial General Intelligence, AGI 2010, 182-183. Amsterdam-Beijing-Paris: Atlantis Press.

  • Hernández-Orallo, J. 2015. C-Tests Revisited: Back and Forth with Complexity. In Bieger, J.; Goertzel, B.; and Potapov, A., eds., Proceedings of the 8th Conference on Artificial General Intelligence, AGI 2015, volume 9205 of Lecture Notes in Artificial Intelligence, 272-282. Berlin: Springer.

  • Hernández-Orallo, J. 2017. The Measure of All Minds. Cambridge: Cambridge University Press.

  • Hibbard, B. 2009. Bias and No Free Lunch in Formal Measures of Intelligence. Journal of Artificial General Intelligence 1(1):54-61.

  • Hothorn, T.; Hornik, K.; and Zeileis, A. 2006. Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics 15(3):651-674.

  • Hutter, M., and Legg, S. 2007. Temporal Difference Updating without a Learning Rate. In Platt, J. C.; Koller, D.; Singer, Y.; and Roweis, S. T., eds., Advances in Neural Information Processing Systems 20, 705-712. Curran Associates, Inc.

  • Insa-Cabrera, J.; Dowe, D. L.; España-Cubillo, S.; Hernández-Lloreda, M. V.; and Hernández-Orallo, J. 2011. Comparing Humans and AI Agents. In Schmidhuber, J.; Thórisson, K. R.; and Looks, M., eds., Proceedings of the 4th Conference on Artificial General Intelligence, AGI 2011, volume 6830 of Lecture Notes in Artificial Intelligence, 122-132. Berlin: Springer.

  • Legg, S., and Hutter, M. 2007a. A Collection of Definitions of Intelligence. In Goertzel, B., and Wang, P., eds., Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms, volume 157 of Frontiers in Artificial Intelligence and Applications. Amsterdam: IOS Press. 17-24.

  • Legg, S., and Hutter, M. 2007b. Universal Intelligence: A Definition of Machine Intelligence. Minds and Machines 17(4):391-444.

  • Legg, S., and Veness, J. 2011. AIQ: Algorithmic Intelligence Quotient [source codes]. https: // Accessed: 2017-06-26.

  • Legg, S., and Veness, J. 2013. An Approximation of the Universal Intelligence Measure. In Dowe, D. L., ed., Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence, volume 7070 of Lecture Notes in Computer Science. Berlin: Springer. 236-249.

  • Müller, U. 1993. dev/lang/brainfuck-2.lha in Aminet. Accessed: 2017-06-26.

  • Schweizer, P. 2012. The Externalist Foundations of a Truly Total Turing Test. Minds and Machines 22(3):191-212.

  • Searle, J. R. 1980. Minds, Brains, and Programs. Behavioral and Brain Sciences 3(3):417-457.

  • Sun, R. 2007. The Importance of Cognitive Architectures: An Analysis Based on CLARION. Journal of Experimental & Theoretical Artificial Intelligence 19(2):159-193.

  • Turing, A. M. 1950. Computing Machinery and Intelligence. Mind 59(236):433-460.

  • Vadinský, O. 2015. Towards an Artificially Intelligent System: Possibilities of General Evaluation of Hybrid Paradigm. In Besold, T. R.; Lamb, L. C.; Icard, T.; and Miikkulainen, R., eds., Proceedings of the 10th International Workshop on Neural-Symbolic Learning and Reasoning NeSy’15, 23-29. Buenos Aires: IJCAI.

  • Veness, J.; Ng, K. S.; Hutter, M.; Uther, W.; and Silver, D. 2011. A Monte Carlo AIXI Approximation. Journal of Artificial Intelligence Research 40(1):95-142.

  • Watkins, C. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, King's College, Cambridge, England.