Towards General Evaluation of Intelligent Systems: Lessons Learned from Reproducing AIQ Test Results

Ondřej Vadinský

Open Access

Published Online: Mar 07, 2018

Journal of Artificial General Intelligence

Volume 9 (2018): Issue 1 (March 2018)

Page range: 1 - 54

Received: Feb 17, 2017

Accepted: Feb 06, 2018

DOI: https://doi.org/10.2478/jagi-2018-0001

Keywords
artificial general intelligence, evaluating intelligence of artificial systems, Universal Intelligence definition, Algorithmic Intelligence Quotient test

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

This paper attempts to replicate the results of evaluating several artificial agents using the Algorithmic Intelligence Quotient test originally reported by Legg and Veness. Three experiments were conducted: one using default settings, one in which the action space was varied, and one in which the observation space was varied. While the performance of freq, Q₀, Q_λ, and HLQ_λ corresponded well with the original results, the resulting values differed when using MC-AIXI. As originally reported, varying the observation space seems to have no qualitative impact on the results, whereas (contrary to the original results) varying the action space seems to have some impact. The impact of modifying the parameters of MC-AIXI on its performance in the default settings was analyzed with the help of data mining techniques, which were used to identify highly performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable; however, as a general artificial intelligence evaluation method, it has several limitations. The test depends on the chosen reference machine and is also sensitive to changes in its settings. It brings out some differences among agents; however, since these differences are small, the test setting may not yet be sufficiently complex. A demanding parameter sweep is needed to thoroughly evaluate configurable agents, which, together with the test format, further highlights the computational requirements of an agent. These and other issues are discussed in the paper, along with proposals for alleviating them. An implementation of some of the proposals is also demonstrated.

eISSN: 1946-0163
Language: English

Publication timeframe: 2 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence

© by Ondřej Vadinský
