Supposed Maximum Mutual Information for Improving Generalization and Interpretation of Multi-Layered Neural Networks

Ryotaro Kamimura
  • IT Education Center, Tokai University, Hiratsuka, Japan


The present paper proposes a new information-theoretic method to maximize mutual information between inputs and outputs. The importance of mutual information in neural networks is well known, but its maximization has been quite difficult to implement in practice. In addition, mutual information has not been used extensively in neural networks, so its applicability has remained limited. To overcome this shortcoming, we simplify the maximization considerably by supposing that mutual information is already maximized before learning, or at least at the beginning of learning. The method was applied to three data sets (the crab, wholesale, and human resources data sets) and examined in terms of generalization performance and connection weights. The results showed that, by disentangling connection weights, maximizing mutual information made it possible to interpret the relations between inputs and outputs explicitly.
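To make the maximized quantity concrete, the following is a minimal sketch (not the paper's exact formulation) of mutual information between input patterns and output neurons, using the common decomposition I(S;J) = H(J) − H(J|S) with a uniform prior over input patterns. The function name `mutual_information` and the firing-probability matrix are illustrative assumptions.

```python
import numpy as np

def mutual_information(p_j_given_s):
    """Mutual information between input patterns s and output neurons j.

    p_j_given_s: array of shape (S, J); row s holds the firing
    probabilities of the J output neurons for input pattern s.
    Assumes a uniform prior p(s) = 1/S, a common simplification.
    """
    S, _ = p_j_given_s.shape
    eps = 1e-12  # guard against log(0)
    p_j = p_j_given_s.mean(axis=0)                                  # marginal p(j)
    h_out = -np.sum(p_j * np.log(p_j + eps))                        # H(J)
    h_cond = -np.sum(p_j_given_s * np.log(p_j_given_s + eps)) / S   # H(J|S)
    return h_out - h_cond                                           # I(S;J)

# The "supposed" maximum corresponds to each input pattern driving a
# distinct output neuron: H(J|S) = 0, so I(S;J) reaches log J.
winner_take_all = np.eye(3)                 # 3 patterns, one-hot firing
print(mutual_information(winner_take_all))  # close to log 3
```

When every pattern produces the same uniform firing, H(J) equals H(J|S) and the mutual information collapses to zero, which is the opposite extreme the method tries to avoid.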



