Annotation of the Evaluative Language in a Dependency Treebank

Jana Šindlerová
Faculty of Mathematics and Physics, Charles University, Prague

Abstract

In this paper, we present our efforts to annotate evaluative language in the Prague Dependency Treebank 2.0. The project follows up on a series of annotations of small plain-text corpora. Potentially evaluative nodes are identified automatically by mapping a Czech subjectivity lexicon onto the syntactically annotated data. An annotator then checks each node manually and either dismisses it as standing in a non-evaluative context or confirms it as evaluative. In the latter case, the annotator adds information about the polarity orientation and about the source and target of the evaluation. The annotation revealed several advantages and disadvantages of the chosen framework. The advantages include a more structured and easier-to-handle environment for the annotator, visibility of the syntactic patterning of the evaluative state, effective handling of discontinuous structures, and a new perspective on the influence of good/bad news. The disadvantages include a limited capability to treat cases in which the evaluation is spread over several syntactically connected nodes at once, a limited capability to treat metaphorical expressions, and the disregard of negation and intensification effects in the current scheme.
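The two-stage workflow described in the abstract (automatic flagging by lexicon lookup, then manual confirmation or dismissal) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Node` class, the function names, the polarity labels `POS`/`NEG`, and the toy lexicon entries are hypothetical and are not the treebank's actual data format or the project's tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a dependency-tree node (not the real PDT format)."""
    node_id: str
    lemma: str
    parent_id: Optional[str] = None
    prior_polarity: Optional[str] = None  # set by the automatic lexicon lookup
    evaluative: Optional[bool] = None     # set by the annotator
    polarity: Optional[str] = None        # set by the annotator
    source: Optional[str] = None          # node id of the evaluator
    target: Optional[str] = None          # node id of the evaluated entity


def flag_candidates(nodes: List[Node], lexicon: Dict[str, str]) -> List[Node]:
    """Automatic step: flag nodes whose lemma appears in the subjectivity lexicon."""
    flagged = []
    for node in nodes:
        prior = lexicon.get(node.lemma)
        if prior is not None:
            node.prior_polarity = prior
            flagged.append(node)
    return flagged


def confirm(node: Node, polarity: str, source: str, target: str) -> None:
    """Manual step: the annotator confirms the node as evaluative and adds details."""
    node.evaluative = True
    node.polarity = polarity
    node.source = source
    node.target = target


def dismiss(node: Node) -> None:
    """Manual step: the annotator rejects the node as non-evaluative in this context."""
    node.evaluative = False


# Toy lexicon and a three-node tree for "kritik chválit film" (lemmas only).
lexicon = {"chválit": "POS", "špatný": "NEG"}
nodes = [
    Node("n1", "kritik", parent_id="n2"),
    Node("n2", "chválit"),
    Node("n3", "film", parent_id="n2"),
]

flagged = flag_candidates(nodes, lexicon)
# The annotator's decision is simulated here by a direct call.
confirm(flagged[0], polarity="POS", source="n1", target="n3")
```

In this sketch only the lexicon lookup is automatic; the polarity, source, and target are supplied by the caller, mirroring the manual decision made by the annotator.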



OPEN ACCESS
