Resources for Indonesian Sentiment Analysis

Abstract

In this work, we present subjectivity lexicons of positive and negative expressions for the Indonesian language, created by automatically translating English lexicons. Further variants are created by taking intersections or unions of the translated lexicons. We compare the lexicons on the task of predicting sentence polarity on a set of 446 manually annotated sentences, and we contrast the generic lexicons with a small lexicon extracted directly from the annotated sentences (in a cross-validation setting). We seek further improvements by assigning weights to lexicon entries and by casting the prediction as a machine learning task with a small number of additional features. We observe that the lexicons reach high recall but suffer from low precision when predicting whether a sentence is evaluative (positive or negative) or not (neutral). Weighting the lexicons can improve either recall or precision, but at the cost of a comparable decrease in the other measure.
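The trade-off between union and intersection lexicons can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the setup used in the paper: the toy lexicons `lex_a` and `lex_b`, the three example sentences, whitespace tokenization, the count-based decision rule with ties resolved as positive, and the helper names `combine`, `predict`, and `evaluative_pr`.

```python
from typing import Dict, List, Set, Tuple

# A lexicon maps a polarity ("pos" or "neg") to a set of word forms.
Lexicon = Dict[str, Set[str]]


def combine(a: Lexicon, b: Lexicon, mode: str) -> Lexicon:
    """Build a union or intersection variant of two lexicons."""
    if mode == "union":
        return {pol: a[pol] | b[pol] for pol in ("pos", "neg")}
    if mode == "intersection":
        return {pol: a[pol] & b[pol] for pol in ("pos", "neg")}
    raise ValueError(f"unknown mode: {mode}")


def predict(sentence: str, lexicon: Lexicon) -> str:
    """Label a sentence by counting lexicon hits (hypothetical rule)."""
    tokens = sentence.lower().split()
    pos = sum(t in lexicon["pos"] for t in tokens)
    neg = sum(t in lexicon["neg"] for t in tokens)
    if pos == 0 and neg == 0:
        return "neutral"
    # Ties go to "positive"; this is an arbitrary choice for the sketch.
    return "positive" if pos >= neg else "negative"


def evaluative_pr(gold: List[str], predicted: List[str]) -> Tuple[float, float]:
    """Precision and recall for evaluative (non-neutral) vs. neutral."""
    pairs = list(zip(gold, predicted))
    tp = sum(g != "neutral" and p != "neutral" for g, p in pairs)
    predicted_eval = sum(p != "neutral" for _, p in pairs)
    gold_eval = sum(g != "neutral" for g, _ in pairs)
    precision = tp / predicted_eval if predicted_eval else 0.0
    recall = tp / gold_eval if gold_eval else 0.0
    return precision, recall


# Toy lexicons standing in for two translated English lexicons.
# "baru" (new) plays the role of a noisy entry introduced by translation.
lex_a: Lexicon = {"pos": {"bagus", "senang"}, "neg": {"buruk"}}
lex_b: Lexicon = {"pos": {"bagus", "baru"}, "neg": {"buruk", "jelek"}}

# Toy annotated sentences (made up for this sketch).
data = [
    ("filmnya bagus", "positive"),
    ("pelayanan jelek", "negative"),
    ("saya beli buku baru", "neutral"),
]
gold = [label for _, label in data]

results = {}
for mode in ("union", "intersection"):
    lexicon = combine(lex_a, lex_b, mode)
    predicted = [predict(sentence, lexicon) for sentence, _ in data]
    results[mode] = evaluative_pr(gold, predicted)
    print(mode, results[mode])
```

On this toy data the union lexicon marks every evaluative sentence but also flags the neutral one through the noisy entry, while the intersection lexicon avoids that false alarm and misses one evaluative sentence. This mirrors, in miniature, the recall and precision behaviour described above.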

The Prague Bulletin of Mathematical Linguistics

The Journal of Charles University
