Literal Occurrences of Multiword Expressions: Rare Birds That Cause a Stir

Open access

Abstract

Multiword expressions (MWEs) can have both idiomatic and literal occurrences. For instance, pulling strings can be understood either idiomatically, as making use of one’s influence, or literally. Distinguishing these two cases has been addressed in linguistic and psycholinguistic studies, and is also considered one of the major challenges in MWE processing. We suggest that literal occurrences should be considered in both semantic and syntactic terms, which motivates their study in a treebank. We propose heuristics to automatically pre-identify candidate sentences that might contain literal occurrences of verbal MWEs (VMWEs), and we apply them to existing treebanks in five typologically different languages: Basque, German, Greek, Polish and Portuguese. We also perform a linguistic study of the literal occurrences extracted by the different heuristics. The results suggest that literal occurrences constitute a rare phenomenon. We also identify some properties that may distinguish them from their idiomatic counterparts. This article is a substantially extended version of Savary and Cordeiro (2018).

  • Abeillé Anne and Yves Schabes. Parsing Idioms in Lexicalized TAGs. In Somers Harold L. and Mary McGee Wood editors Proceedings of the 4th Conference of the European Chapter of the ACL EACL’89 Manchester pages 1–9. The Association for Computer Linguistics 1989. URL http://dblp.uni-trier.de/db/conf/eacl/eacl1989.html#AbeilleS89.

  • Baldwin Timothy and Su Nam Kim. Multiword Expressions. In Indurkhya Nitin and Fred J. Damerau editors Handbook of Natural Language Processing pages 267–292. CRC Press Taylor and Francis Group Boca Raton FL USA 2 edition 2010. ISBN 978-1-4200-8592-1.

  • Bott Stefan Nana Khvtisavrishvili Max Kisselew and Sabine Schulte im Walde. Ghost-PV: A Representative Gold Standard of German Particle Verbs. In Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon Osaka Japan 2016.

  • Cacciari Cristina and Paola Corradini. Literal analysis and idiom retrieval in ambiguous idioms processing: A reading-time study. Journal of Cognitive Psychology 27(7):797–811 2015. doi: 10.1080/20445911.2015.1049178. URL http://dx.doi.org/10.1080/20445911.2015.1049178.

  • Constant Mathieu Gülşen Eryiğit Johanna Monti Lonneke van der Plas Carlos Ramisch Michael Rosner and Amalia Todirascu. Multiword Expression Processing: A Survey. Computational Linguistics to appear 2017.

  • Cook Paul Afsaneh Fazly and Suzanne Stevenson. The VNC-Tokens Dataset. In Proceedings of the Workshop on Multiword Expressions 2008.

  • Cordeiro Silvio Aline Villavicencio Marco Idiart and Carlos Ramisch. Unsupervised Compositionality Prediction of Nominal Compounds. Computational Linguistics 2019. doi: 10.1162/COLI_a_00341. (to appear).

  • Dryer Matthew S. and Martin Haspelmath editors. WALS Online. Max Planck Institute for Evolutionary Anthropology Leipzig 2013. URL https://wals.info/.

  • El Maarouf Ismail and Michael Oakes. Statistical Measures for Characterising MWEs. In IC1207 COST PARSEME 5th general meeting 2015. URL http://typo.uni-konstanz.de/parseme/index.php/2-general/138-admitted-posters-iasi-23-24-september-2015.

  • Fazly Afsaneh Paul Cook and Suzanne Stevenson. Unsupervised Type and Token Identification of Idiomatic Expressions. Computational Linguistics 35(1):61–103 2009. doi: 10.1162/coli.08-010-R1-07-048. URL https://doi.org/10.1162/coli.08-010-R1-07-048.

  • Geeraert Kristina R. Harald Baayen and John Newman. “Spilling the bag” on idiomatic variation. In Markantonatou Stella Carlos Ramisch Agata Savary and Veronika Vincze editors Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop pages 1–33. Language Science Press. Berlin 2018. doi: 10.5281/zenodo.1469551.

  • Grice Herbert Paul. Studies in the Way of Words. Harvard University Press Cambridge Mass. 1989.

  • Hashimoto Chikara and Daisuke Kawahara. Construction of an Idiom Corpus and its Application to Idiom Identification based on WSD Incorporating Idiom-Specific Features. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing pages 992–1001. Association for Computational Linguistics 2008. URL http://aclweb.org/anthology/D08-1104.

  • Inurrieta Uxoa Itziar Aduriz Ainara Estarrona Itziar Gonzalez-Dios Antton Gurrutxaga Ruben Urizar and Inaki Alegria. Verbal Multiword Expressions in Basque Corpora. In Proceedings of the Joint Workshop on Linguistic Annotation Multiword Expressions and Constructions (LAW-MWE-CxG-2018) pages 86–95 2018.

  • Katz Graham and Eugenie Giesbrecht. Automatic Identification of Non-Compositional Multi-Word Expressions using Latent Semantic Analysis. In Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties pages 12–19 Sydney Australia July 2006. URL http://www.aclweb.org/anthology/W/W06/W06-1203.

  • Köper Maximilian and Sabine Schulte im Walde. Distinguishing Literal and Non-Literal Usage of German Particle Verbs. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies pages 353–362 San Diego California 2016. URL http://www.aclweb.org/anthology/N16-1039.

  • Lichte Timm and Laura Kallmeyer. Same syntax different semantics: A compositional approach to idiomaticity in multi-word expressions. In Piñón Christopher editor Empirical Issues in Syntax and Semantics 11 pages 111–140 2016. URL http://www.cssp.cnrs.fr/eiss11/.

  • Markantonatou Stella Carlos Ramisch Agata Savary and Veronika Vincze. Preface. In Markantonatou Stella Carlos Ramisch Agata Savary and Veronika Vincze editors Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop pages 87–147. Language Science Press Berlin 2018. ISBN 978-3-96110-123-8. doi: 10.5281/zenodo.1469527.

  • Nivre Joakim Marie-Catherine de Marneffe Filip Ginter Yoav Goldberg Jan Hajic Christopher D. Manning Ryan McDonald Slav Petrov Sampo Pyysalo Natalia Silveira Reut Tsarfaty and Daniel Zeman. Universal Dependencies v1: A Multilingual Treebank Collection. In Calzolari Nicoletta Khalid Choukri Thierry Declerck Sara Goggi Marko Grobelnik Bente Maegaard Joseph Mariani Helene Mazo Asuncion Moreno Jan Odijk and Stelios Piperidis editors Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016 pages 1659–1666. European Language Resources Association (ELRA) 2016. ISBN 978-2-9517408-9-1. 23-28 May 2016.

  • Patejuk Agnieszka and Adam Przepiórkowski. From Lexical Functional Grammar to Enhanced Universal Dependencies: Linguistically informed treebanks of Polish. Institute of Computer Science Polish Academy of Sciences Warsaw 2018. (263 pages).

  • Pausé Marie-Sophie. Structure lexico-syntaxique des locutions du français et incidence sur leur combinatoire. PhD thesis Université de Lorraine Nancy France 2017.

  • Peng Jing and Anna Feldman. Automatic Idiom Recognition with Word Embeddings. In SIMBig (Revised Selected Papers) volume 656 of Communications in Computer and Information Science pages 17–29. Springer 2016.

  • Peng Jing Anna Feldman and Ekaterina Vylomova. Classifying Idiomatic and Literal Expressions Using Topic Models and Intensity of Emotions. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) pages 2019–2027 Doha Qatar October 2014. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/D14-1216.

  • Popiel Stephen J. and Ken McRae. The figurative and literal senses of idioms or all idioms are not used equally. Journal of Psycholinguistic Research 17(6):475–487 Nov 1988. ISSN 1573-6555. doi: 10.1007/BF01067912. URL https://doi.org/10.1007/BF01067912.

  • Przepiórkowski Adam Jan Hajič Elżbieta Hajnicz and Zdeňka Urešová. Phraseology in two Slavic Valency Dictionaries: Limitations and Perspectives. International Journal of Lexicography 30(1):1–38 2017.

  • Ramisch Carlos Silvio Cordeiro Leonardo Zilio Marco Idiart Aline Villavicencio and Rodrigo Wilkens. How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) pages 156–161 Berlin Germany 2016. ACL. doi: 10.18653/v1/P16-2026. URL https://aclweb.org/anthology/P16-2026.

  • Ramisch Carlos Silvio Ricardo Cordeiro Agata Savary Veronika Vincze Verginica Barbu Mititelu Archna Bhatia Maja Buljan Marie Candito Polona Gantar Voula Giouli Tunga Güngör Abdelati Hawwari Uxoa Iñurrieta Jolanta Kovalevskaitė Simon Krek Timm Lichte Chaya Liebeskind Johanna Monti Carla Parra Escartín Behrang QasemiZadeh Renata Ramisch Nathan Schneider Ivelina Stoyanova Ashwini Vaidya and Abigail Walsh. Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions. In Proceedings of the Joint Workshop on Linguistic Annotation Multiword Expressions and Constructions (LAW-MWE-CxG-2018) pages 222–240. Association for Computational Linguistics 2018. URL http://aclweb.org/anthology/W18-4925.

  • Recanati François. The alleged priority of literal interpretation. Cognitive Science 19:207–232 1995. URL https://jeannicod.ccsd.cnrs.fr/ijn_00000181.

  • Savary Agata and Silvio Cordeiro. Literal readings of multiword expressions: as scarce as hen’s teeth. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT 16) pages 64–72 Prague Czech Republic Jan. 2018.

  • Savary Agata Carlos Ramisch Silvio Cordeiro Federico Sangati Veronika Vincze Behrang QasemiZadeh Marie Candito Fabienne Cap Voula Giouli Ivelina Stoyanova and Antoine Doucet. The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions. In Proceedings of the EACL’17 Workshop on Multiword Expressions 2017.

  • Savary Agata Marie Candito Verginica Barbu Mititelu Eduard Bejček Fabienne Cap Slavomír Čéplö Silvio Ricardo Cordeiro Gülşen Eryiğit Voula Giouli Maarten van Gompel Yaakov HaCohen-Kerner Jolanta Kovalevskaitė Simon Krek Chaya Liebeskind Johanna Monti Carla Parra Escartín Lonneke van der Plas Behrang QasemiZadeh Carlos Ramisch Federico Sangati Ivelina Stoyanova and Veronika Vincze. PARSEME multilingual corpus of verbal multiword expressions. In Markantonatou Stella Carlos Ramisch Agata Savary and Veronika Vincze editors Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop pages 87–147. Language Science Press Berlin 2018. ISBN 978-3-96110-123-8. doi: 10.5281/zenodo.1469527.

  • Sheinfux Livnat Herzig Tali Arad Greshler Nurit Melnik and Shuly Wintner. Verbal MWEs: Idiomaticity and flexibility. In Parmentier Yannick and Jakub Waszczuk editors Representation and Parsing of Multiword Expressions pages 5–38. Language Science Press Berlin 2019.

  • Tu Yuancheng and Dan Roth. Learning English Light Verb Constructions: Contextual or Statistical. In Proceedings of the Workshop on Multiword Expressions: From Parsing and Generation to the Real World MWE ’11 pages 31–39. Association for Computational Linguistics June 2011. URL http://www.aclweb.org/anthology/W11-0807.

  • Tu Yuancheng and Dan Roth. Sorting out the Most Confusing English Phrasal Verbs. In Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the Main Conference and the Shared Task and Volume 2: Proceedings of the 6th International Workshop on Semantic Evaluation SemEval ’12 pages 65–69. Association for Computational Linguistics 2012. URL http://dl.acm.org/citation.cfm?id=2387636.2387648.

  • Waszczuk Jakub Agata Savary and Yannick Parmentier. Promoting multiword expressions in A* TAG parsing. In COLING 2016 26th International Conference on Computational Linguistics Proceedings of the Conference: Technical Papers December 11-16 2016 Osaka Japan pages 429–439 2016. URL http://aclweb.org/anthology/C/C16/C16-1042.pdf.
