Open access

Berta González Saavedra and Marco Passarotti

Abstract

In the context of the Index Thomisticus Treebank project, we have enhanced the full text of Sallust's Bellum Catilinae with semantic annotation. The annotation style resembles that of the so-called “tectogrammatical” layer of the Prague Dependency Treebank. Exploiting the results of semantic role labeling, ellipsis resolution, and coreference analysis, this paper presents a network-based study of the main Actors and Actions (and their relations) in Bellum Catilinae.

Open access

Tim vor der Brück

Abstract

Rule-based natural language generation denotes the process of converting a semantic input structure into a surface representation by means of a grammar. In the following, we assume that this grammar is handcrafted rather than created automatically, for instance by a deep neural network. Such a grammar may comprise a large set of rules. A single error in these rules can have a large impact on the quality of the generated sentences, potentially even causing the entire generation process to fail. Searching for errors in these rules can be tedious and time-consuming because of potentially complex and recursive dependencies. This work proposes a statistical approach to recognizing errors, and to suggesting corrections for certain kinds of errors, by cross-checking the grammar against the semantic input structure. The approach assumes that the input structure is correct, which is usually a valid hypothesis because these input structures are often created automatically.

Our evaluation shows that in many cases automatic error detection and correction is indeed possible.

Open access

Lauriane Aufrant and Guillaume Wisniewski

Abstract

We present PanParser, a Python framework dedicated to transition-based structured prediction and particularly suitable for dependency parsing. Besides providing an easy way to train state-of-the-art parsers, as empirically validated on UD 2.0, PanParser is especially useful for research purposes: its modular architecture makes it possible to implement most state-of-the-art transition-based methods within the same unified framework (several of which are already built in), which facilitates fair benchmarking and allows for exhaustive exploration of slight variants of those methods. PanParser additionally includes a number of fine-grained evaluation utilities, already leveraged successfully in several past studies, for extensive error analysis of monolingual as well as cross-lingual parsing.
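
PanParser's own interface is not described in this abstract, so the following is only a generic, minimal sketch of one classic transition system that frameworks of this kind typically implement: the arc-standard system driven by a static oracle that replays a projective gold tree. All names here (static_oracle, parse, the transition strings) are hypothetical and are not PanParser's API.

```python
from typing import Callable, List, Tuple

Arc = Tuple[int, int]  # (head, dependent); position 0 is the artificial ROOT
Policy = Callable[[List[int], List[int], List[Arc]], str]

def static_oracle(gold_heads: List[int]) -> Policy:
    """Return a policy that replays a projective gold tree (gold_heads[d] = head of token d)."""
    def next_transition(stack: List[int], buffer: List[int], arcs: List[Arc]) -> str:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            # LEFT_ARC: the second stack item depends on the top (ROOT never becomes a dependent).
            if s1 != 0 and gold_heads[s1] == s0:
                return "LEFT_ARC"
            # RIGHT_ARC: the top depends on the second item and has collected all its own dependents.
            if gold_heads[s0] == s1:
                attached = sum(1 for h, _ in arcs if h == s0)
                expected = sum(1 for d in range(1, len(gold_heads)) if gold_heads[d] == s0)
                if attached == expected:
                    return "RIGHT_ARC"
        return "SHIFT"
    return next_transition

def parse(n_tokens: int, policy: Policy) -> Tuple[List[int], List[str]]:
    """Run the arc-standard system; return (heads, transitions) with heads[0] = -1 for ROOT."""
    stack, buffer, arcs, transitions = [0], list(range(1, n_tokens + 1)), [], []
    while buffer or len(stack) > 1:
        action = policy(stack, buffer, arcs)
        if action == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT on empty buffer (non-projective or invalid tree?)")
            stack.append(buffer.pop(0))
        elif action == "LEFT_ARC":
            dependent = stack.pop(-2)  # remove the second item; the top becomes its head
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT_ARC":
            dependent = stack.pop()    # remove the top; the new top becomes its head
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition {action!r}")
        transitions.append(action)
    heads = [-1] * (n_tokens + 1)
    for head, dependent in arcs:
        heads[dependent] = head
    return heads, transitions
```

For the toy sentence "She reads books" (She and books depend on reads, which depends on ROOT), `parse(3, static_oracle([-1, 2, 0, 2]))` recovers the heads `[-1, 2, 0, 2]` via SHIFT, SHIFT, LEFT_ARC, SHIFT, RIGHT_ARC, RIGHT_ARC. A trained parser would replace the oracle with a learned classifier over the same stack/buffer configuration, which is why separating the transition system from the policy keeps variants easy to compare.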

Open access

Thomas Zenkel, Matthias Sperber, Jan Niehues, Markus Müller, Ngoc-Quan Pham, Sebastian Stüker and Alex Waibel

Abstract

In this paper, we introduce an open-source toolkit for speech translation. While a wide variety of open-source tools already exists for the essential tasks of a speech translation system, our goal is to provide an easy-to-use recipe for the complete speech translation pipeline. We provide a Docker container with a ready-to-use pipeline of the following components: a neural speech recognition system, a sentence segmentation system, and an attention-based translation system. We provide recipes for training and evaluating models for the task of translating English lectures and TED talks into German. Additionally, we provide pre-trained models for this task. With this toolkit, we hope to facilitate the development of speech translation systems and to encourage researchers to improve their overall performance.

Open access

Álvaro Peris and Francisco Casacuberta

Abstract

We present NMT-Keras, a flexible toolkit for training deep learning models that puts particular emphasis on the development of advanced applications of neural machine translation systems, such as interactive-predictive translation protocols and long-term adaptation of the translation system via continuous learning. NMT-Keras is based on an extended version of the popular Keras library, and it runs on Theano and TensorFlow. State-of-the-art neural machine translation models are deployed and used following the high-level framework provided by Keras. Given its high modularity and flexibility, it has also been extended to tackle other problems, such as image and video captioning, sentence classification, and visual question answering.

Open access

Václava Kettnerová, Markéta Lopatková, Eduard Bejček and Petra Barančíková

Abstract

This paper summarizes the results of a theoretical analysis of the syntactic behavior of Czech light verb constructions (LVCs) and their verification in the linguistic annotation of a large number of these constructions. The concept of LVCs is based on the observation that nouns denoting actions, states, or properties have a strong tendency to select semantically underspecified verbs, which leads to a specific rearrangement of the valency complementations of both nouns and verbs in the syntactic structure. On the basis of a description of the deep and surface syntactic properties of LVCs, a formal model of their lexicographic representation is proposed. In addition, the resulting data annotation, capturing almost 1,500 LVCs, is described in detail. This annotation has been integrated into a new version of the VALLEX lexicon, release 3.5.

Open access

Jetic Gū, Anahita Mansouri Bigvand and Anoop Sarkar

Abstract

In this paper, we present a new word aligner with built-in support for alignment types, together with comparisons between various models and existing aligner systems. It is open-source software that can easily be extended with models of the user's own design. We expect it to serve academic researchers as well as scientists working in industry, both for performing word alignment and for experimenting with new models of their own. This paper introduces the basic design and structure of the system. Examples and demos of the system are also provided.
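
The abstract does not specify which alignment models the aligner ships with, so the following is a hedged, generic illustration of the kind of baseline model such a tool is typically compared against: IBM Model 1 trained with EM, followed by a best-link alignment. The function names, the toy German-English corpus, and the omission of the NULL word are all assumptions made for brevity; none of this is the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sentence = List[str]
TransTable = Dict[Tuple[str, str], float]  # t(src_word | tgt_word)

def train_ibm1(corpus: List[Tuple[Sentence, Sentence]], iterations: int = 10) -> TransTable:
    """Estimate lexical translation probabilities t(f | e) with EM (IBM Model 1, no NULL word)."""
    src_vocab = {f for src, _ in corpus for f in src}
    # Uniform initialization over the source vocabulary.
    t: TransTable = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: spread each source word's unit count over the target words of its sentence.
        for src, tgt in corpus:
            for f in src:
                norm = sum(t[(f, e)] for e in tgt)
                for e in tgt:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts per target word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def align(src: Sentence, tgt: Sentence, t: TransTable) -> List[int]:
    """Link each source position to the target position with the highest t(f | e)."""
    return [max(range(len(tgt)), key=lambda i: t[(f, tgt[i])]) for f in src]

# Hypothetical toy corpus, for illustration only.
corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = train_ibm1(corpus)
print(align("das Haus".split(), "the house".split(), t))  # [0, 1]
```

Because Model 1 treats every alignment position as equally likely, the best link for each source word depends only on t(f | e); richer models add distortion or fertility terms on top of the same EM loop, which is one natural extension point for a pluggable aligner.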

Open access

Milan Smutný

Abstract

This paper deals with terminology as a characteristic feature of the language used in science and technology. The lexical units in question serve the communication needs and demands of particular discourse communities, i.e., experts in different branches of science and specializations. Terminology describes reality precisely, carries specific information on phenomena and the relationships between them, and helps to avoid shifts in meaning during communication. Unlike other spheres of life, where shifts in meaning are common, science and technology cannot accept changes in the information transferred, as such changes may lead to serious consequences. This paper focuses on various aspects of, and approaches to, this part of the lexical system. Examples from the English of Electrical Engineering and Communication Technologies provide insight into different criteria for classifying units as terms, lexical patterns, and semantic relationships between the individual constituents. Other features, qualities, and functions of terminology, such as stabilizing reality, the interconnection between explicitness and implicitness, and the description of progress reflecting a unique attitude to reality, are also discussed.

Open access

Lahoucine Aammari

Abstract

Arthur Leared’s Morocco and the Moors (1876) and Budgett Meakin’s Life in Morocco and Glimpses Beyond (1905) are two less-examined imperial travel texts on precolonial Morocco. Both travelogues are British (Irish and English, respectively), a fact that from the outset gives them the distinctive flavor of a genre that is a British specialty par excellence. Coming from the same political and cultural backgrounds, Leared and Meakin peregrinated into Morocco in a precolonial period when it was still perceived as the “Lands of the Moors”. The two travellers responded to moments of interaction with the Moors as a culturally, socially, and religiously different other. Both Victorian travellers were aware of the fact of empire, and their travelogues serve as fodder to energize the discursive grandiloquence of empire. They stress an ethnocentric view in depicting Moroccans and their culture, and they communicate their observations through an interpretative framework or, in Foucauldian terminology, through the “discourses” provided by their culture. This paper examines these two travellers’ perception of otherness; its approach is to question and bring to the fore the rhetorical and discursive strategies, as well as the modes of representation, that Leared and Meakin deploy in their encounters with the Moors in Pre-Protectorate Morocco.

Open access

Simona Klimková

Abstract

The implications of the colonialist discourse, which suggested that the colonized is a person “whose historical, physical, and metaphysical geography begins with European memory” (Thiong’o, 2009), urged postcolonial writers to correct these views by addressing the issues from their own perspectives. The themes of history and the communal/national past thus play a prominent role in postcolonial literature, as they are inevitably interwoven with the concept of communal identity. In Petals of Blood (1977), the Kenyan writer Ngũgĩ wa Thiong’o explores the implications of social change brought about by political and economic development during the post-independence period. This paper examines the crucial relation between personal and communal/national history and relates it to the writer’s views of the principal legacies of colonialism. As Thiong’o states: “My interest in the past is because of the present and there is no way to discuss the future or present separate from the past” (Thiong’o, 1975). Grasping the past and identifying with it thus seem fundamental to discussing national development. As Ngũgĩ wa Thiong’o’s narratives are always situated in a political and historical context, blending fiction with fact, this paper also aims to elaborate on the implications of his vision.