Order Estimation of Japanese Paragraphs by Supervised Machine Learning and Various Textual Features

Open access


In this paper, we propose a method for estimating the order of paragraphs using supervised machine learning, with a support vector machine (SVM) as the learner. Estimating paragraph order is useful for sentence generation and sentence correction. In experiments on ordering the first two paragraphs of an article, the proposed method achieved a high accuracy of 0.84. It also achieved higher accuracy than the baseline method in experiments using two paragraphs of an article. Feature analysis showed that adnominals, conjunctions, and dates were effective for ordering the first two paragraphs, whereas the ratio of new words and the similarity between the preceding paragraphs and the paragraph being estimated were effective for ordering all pairs of paragraphs.
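As a rough illustration of two of the features named in the abstract, the following sketch computes a new-word ratio and a similarity score between the preceding paragraphs and a candidate paragraph. The exact feature definitions are not given here, so everything in the sketch is an assumption: the function names are hypothetical, the similarity is taken to be a bag-of-words cosine, and whitespace tokenization stands in for the morphological analysis that Japanese text would require. The resulting values would be passed, together with other features, to an SVM classifier.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization for illustration only.
    # Japanese text would need a morphological analyzer instead.
    return text.lower().split()


def new_word_ratio(preceding_tokens, candidate_tokens):
    """Fraction of distinct candidate words absent from the preceding text."""
    candidate_vocab = set(candidate_tokens)
    if not candidate_vocab:
        return 0.0
    seen = set(preceding_tokens)
    return len(candidate_vocab - seen) / len(candidate_vocab)


def cosine_similarity(tokens_a, tokens_b):
    """Bag-of-words cosine similarity (an assumed similarity measure)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_features(preceding_paragraphs, candidate):
    """Hypothetical feature dictionary for one candidate paragraph."""
    preceding_tokens = [t for p in preceding_paragraphs for t in tokenize(p)]
    candidate_tokens = tokenize(candidate)
    return {
        "new_word_ratio": new_word_ratio(preceding_tokens, candidate_tokens),
        "similarity_to_preceding": cosine_similarity(
            preceding_tokens, candidate_tokens
        ),
    }
```

One way to use such features for a pair of paragraphs is to compute them for each candidate ordering and let the classifier decide which ordering is more plausible; how the features are actually combined in the proposed method is not specified in this abstract.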


Journal of Artificial Intelligence and Soft Computing Research

The Journal of Polish Neural Network Society, the University of Social Sciences in Lodz & Czestochowa University of Technology


