Naive Bayes is among the simplest probabilistic classifiers, and it often performs surprisingly well in real-world applications despite the strong assumption that all features are conditionally independent given the class. Because the structure of this classifier is fixed, learning reduces to estimating class probabilities and conditional probabilities from the training data; these estimates are then used to classify new observations. In this paper, we introduce three novel optimization models for the naive Bayes classifier in which both the class probabilities and the conditional probabilities are treated as variables, whose values are found by solving the corresponding optimization problems. Numerical experiments are conducted on several real-world binary classification data sets, with continuous features discretized by three different methods. The performance of these models is compared with that of the naive Bayes classifier, tree-augmented naive Bayes, the SVM, C4.5 and the nearest-neighbor classifier. The results demonstrate that the proposed models can significantly improve the performance of the naive Bayes classifier while preserving its simple structure.
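The classical learning step described above — estimating class priors and per-feature conditional probabilities from training data, then assigning a new observation to the class with the highest posterior — can be sketched as follows for discretized features. This is a minimal illustration, not the paper's optimization models; the function names and the Laplace-smoothing parameter `alpha` are assumptions of the sketch.

```python
import math
from collections import defaultdict

def train_naive_bayes(X, y, alpha=1.0):
    """X: list of discrete feature vectors, y: list of class labels.
    Returns a predict(x) function. alpha is Laplace smoothing, added
    here so unseen feature values do not produce zero probabilities."""
    classes = sorted(set(y))
    n_features = len(X[0])
    class_count = {c: 0 for c in classes}
    # feat_count[c][j][v]: number of class-c examples with feature j == v
    feat_count = {c: [defaultdict(int) for _ in range(n_features)]
                  for c in classes}
    feat_values = [set() for _ in range(n_features)]
    for xi, yi in zip(X, y):
        class_count[yi] += 1
        for j, v in enumerate(xi):
            feat_count[yi][j][v] += 1
            feat_values[j].add(v)
    n = len(y)

    def predict(x):
        best_class, best_logp = None, -math.inf
        for c in classes:
            logp = math.log(class_count[c] / n)  # log class prior
            for j, v in enumerate(x):
                # smoothed conditional probability P(x_j = v | c)
                num = feat_count[c][j][v] + alpha
                den = class_count[c] + alpha * len(feat_values[j])
                logp += math.log(num / den)
            if logp > best_logp:
                best_class, best_logp = c, logp
        return best_class

    return predict

# Toy binary example: feature 0 tracks the class, feature 1 is noise.
pred = train_naive_bayes([(0, 1), (0, 0), (1, 1), (1, 0)], [0, 0, 1, 1])
```

Working in log-probabilities avoids numerical underflow when many features are multiplied, which is standard practice for this classifier.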