Mohammed Mediani, Jan Niehues and Alex Waibel
Fast and Extensible Phrase Scoring for Statistical Machine Translation
Existing tools for generating phrase tables for phrase-based Statistical Machine Translation (SMT) are generally optimised towards low memory use to allow processing of large corpora with limited memory. Whilst this is a reasonable design choice, it does not make optimal use of resources when sufficient memory is available. We present memscore, a new open-source tool to score phrases in memory. Besides acting as a faster drop-in replacement for existing software, it implements a number of standard smoothing techniques and provides a platform for easy experimentation with new scoring methods.
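As a rough illustration of what scoring phrases in memory involves, the sketch below computes unsmoothed relative-frequency estimates from a list of extracted phrase pairs. It is not memscore's implementation: the function name `score_phrases`, the input format and the toy data are assumptions made for this example, and no smoothing is applied.

```python
from collections import Counter


def score_phrases(pairs):
    """Return {(src, tgt): (p(tgt|src), p(src|tgt))} by relative frequency.

    `pairs` is an iterable of (source_phrase, target_phrase) tuples, one per
    extracted occurrence. Everything is held in memory (hypothetical helper).
    """
    pairs = list(pairs)
    joint = Counter(pairs)                      # count(src, tgt)
    src_counts = Counter(s for s, _ in pairs)   # count(src)
    tgt_counts = Counter(t for _, t in pairs)   # count(tgt)
    return {
        (s, t): (c / src_counts[s], c / tgt_counts[t])
        for (s, t), c in joint.items()
    }


# Toy data, invented for illustration only.
pairs = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the home"),
    ("haus", "house"),
]
table = score_phrases(pairs)
```

Holding all three count tables in memory lets both conditional probabilities be read off in a single pass over the joint counts, which is the trade-off the abstract describes: more memory in exchange for speed.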
This study is a cross-sectional analysis of the relationship between productive fluency and the use of formulaic sequences in the speech of highly proficient L2 learners. Two samples of learner speech were randomly drawn and analysed. Formulaic sequences were identified on the basis of two distinct procedures: a frequency-based, distributional approach which returned a set of recurrent sequences (n-grams) and an intuition- and criterion-based, linguistic procedure which returned a set of phrasemes. Formulaic material was then removed from the data. Breakdown and speed fluency measures were obtained for the following types of speech: baseline (pre-removal), formulaic, and non-formulaic (post-removal). The results show significant differences between baseline and post-removal fluency scores for both learners. Also, formulaic speech is produced more fluently than non-formulaic speech. However, the comparison of the fluency scores of n-grams and phrasemes returned inconsistent results, with significant differences reported for only one of the samples.
Arab EFL Learners' Acquisition of Modals
This paper investigates Arab EFL learners' acquisition of modal verbs. The study used a questionnaire, which comprises two versions, testing students' mastery of modals at the levels of both recognition and production. The questionnaire was distributed to 50 English major university students who had studied English for 12-14 years and who had scored 500 or more on the TOEFL. The findings of the study show that the overall performance of the subjects in the study was quite low. The study established a hierarchy of difficulty and identified the major causes of difficulty in the use of modals.
This paper presents a set of simple statistical measures that illustrate the difference between native English speakers and Polish learners of English in varying the length of vocalic segments in read speech. Relative vowel duration and vowel length variation are widely used as basic criteria for establishing rhythmic differences between languages and dialects of a language. The parameter of vocalic duration is employed in popular measures such as ΔV (Ramus et al. 1999), VarcoV (Dellwo 2006, White and Mattys 2007), and PVI (Low et al. 2000, Grabe and Low 2002). Apart from rhythm studies, the processing of data concerning vowel duration can be used to establish the level of discrepancy between native speech and learner speech in investigating other temporal aspects of FL pronunciation, such as tense-lax vowel distinction, accentual lengthening or the degree of unstressed vowel reduction, which are often pointed out as serious problems in the acquisition of English pronunciation by Polish learners. Using descriptive statistics (relations between personal mean vowel duration and standard deviation), the author calculates several indices that demonstrate individual learners' (13 subjects) scores in relation to the native speakers' (12 subjects) score ranges. In some tested aspects, the results of the two groups of speakers are almost cleanly separated, which suggests not only the existence of specific didactic problems but also their actual scale.
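The duration-based measures named above can be sketched in a few lines. The example assumes the commonly cited definitions: ΔV as the standard deviation of vocalic interval durations, VarcoV as that standard deviation normalised by the mean (times 100), and the normalised PVI as the mean normalised difference between successive intervals (times 100). The use of the population standard deviation and the sample durations are assumptions for illustration, not values or choices taken from the study.

```python
from statistics import mean, pstdev


def delta_v(durations):
    """Standard deviation of vocalic interval durations (population SD assumed)."""
    return pstdev(durations)


def varco_v(durations):
    """Rate-normalised variability: 100 * SD / mean."""
    return 100 * pstdev(durations) / mean(durations)


def npvi(durations):
    """Normalised Pairwise Variability Index over successive intervals."""
    diffs = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * mean(diffs)


# Invented vowel durations in milliseconds, alternating short and long.
vowels_ms = [100, 200, 100, 200]
dv = delta_v(vowels_ms)
vv = varco_v(vowels_ms)
pv = npvi(vowels_ms)
```

Because VarcoV and nPVI divide by a local or global mean, they are less sensitive to speaking rate than ΔV, which is one reason studies often report them side by side.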
Šárka Šimáčková and Václav Jonáš Podlipský
Recent studies of short-term phonetic interference suggest that code-switching can lead to momentary increases in L1 influence on L2. In an earlier study using a single acoustic measure (VOT), we found that Czech EFL learners’ pronunciation of English voiceless stops had shorter, i.e. more L1-Czech-like, VOTs in code-switched compared to L2-only sentences. The first aim of the current study was to test the prediction that native listeners would judge the code-switched English productions as more foreign-accented than the L2-only productions. The results provide only weak support for this prediction. The second aim was to test whether more native-like VOT values would correlate with improved accentedness scores. This was confirmed for sentence-initial stops.
Tatiana Litvinova, Pavel Seredin, Olga Litvinova and Olga Zagorovskaya
Authorship profiling, i.e. revealing information about an unknown author by analyzing their text, is a task of growing importance. One of the most urgent problems of authorship profiling (AP) is selecting text parameters which may correlate with an author’s personality. Most researchers’ selection of these is not underpinned by any theory. This article proposes an approach to AP which applies neuroscience data. The aim of the study is to assess the probability of self-destructive behaviour of an individual via formal parameters of their texts. Here we have used the “Personality Corpus”, which consists of Russian-language texts. A set of correlations has been calculated between text variables (average sentence length, lexical diversity etc.) and scores on the Freiburg Personality Inventory scales that are known to be indicative of self-destructive behaviour (“Spontaneous Aggressiveness”, “Depressiveness”, “Emotional Lability”, and “Composedness”). Further, a mathematical model which predicts the probability of self-destructive behaviour has been obtained.
Pilar Avello, Joan Carles Mora and Carmen Pérez-Vidal
The present study aims at exploring the under-investigated interface between study abroad (SA) and L2 phonological development by assessing the impact of a 3-month SA programme on the pronunciation of a group of 23 Catalan/Spanish learners of English (NNSs) by means of phonetic measures and perceived foreign accent (FA) measures. Six native speakers (NSs) in an exchange programme in Spain provided baseline data for comparison purposes. The participants were recorded performing a reading aloud task before (pre-test) and immediately after (post-test) the SA. Another group of 37 proficient non-native listeners, also bilingual in Catalan/Spanish and trained in English phonetics, assessed the NNSs' speech samples for degree of FA. Phonetic measures consisted of pronunciation accuracy scores computed by counting pronunciation errors (phonemic deletions, insertions and substitutions, and stress misplacement). Measures of perceived FA were obtained with two experiments. In experiment 1, the listeners heard a random presentation of the sentences produced by the NSs and by the NNSs at pre-test and post-test and rated them on a 7-point Likert scale for degree of FA (1 = “native”, 7 = “heavy foreign accent”). In experiment 2, they heard paired pre-test/post-test sentences (i.e. produced by the same NNS at pre-test and post-test) and indicated which of the two sounded more native-like. Then, they stated their judgment confidence level on a 7-point scale (1 = “unsure”, 7 = “sure”). Results indicated a slight, non-significant improvement in perceived FA after SA. However, a significant decrease was found in pronunciation accuracy scores after SA. Measures of pronunciation accuracy and FA ratings were also found to be strongly correlated. These findings are discussed in light of the often reported mixed results as regards pronunciation improvement during short-term immersion.
Arkadiusz Rojczyk, Geoffrey Schwartz and Anna Balas
The study investigates the perception of devoicing of English /w, r, j, l/ after /p, t, k/ as a word-boundary cue by Polish listeners. Polish does not devoice sonorants following voiceless stops in word-initial positions. As a result, Polish learners are not made sensitive to sonorant devoicing as a segmentation cue. Higher-proficiency and lower-proficiency Polish learners of English participated in a task in which they recognised phrases such as buy train vs. bite rain or pie plot vs. pipe lot. The analysis of accuracy scores revealed that successful segmentation was only above chance level, indicating that the sonorant voicing/devoicing cue was largely unattended to in identifying the boundary location. Moreover, higher proficiency did not lead to more successful segmentation. The analysis of reaction times showed an unclear pattern in which higher-proficiency listeners segmented the test phrases faster but not more accurately than lower-proficiency listeners. Finally, #CS sequences were recognised more accurately than C#S sequences, which was taken to suggest that the listeners may have had some limited knowledge that devoiced sonorants appear only in word-initial positions, but they treated voiced sonorants as equal candidates for word-final and word-initial positions.
The paper investigates the dynamics of speech rhythm in Polish learners of English and, specifically, how rhythm measurements revealing durational characteristics of vocalic and consonantal intervals (%V, ΔV, ΔC, VarcoV, VarcoC and nPVI) change over the course of second language acquisition as a result of language experience and phonetic training, and influence rhythmic characteristics of L2 English. The data used for the analysis come from 30 Polish first-year students of the University of Łódź recorded reading two texts (English and Polish) during two recording sessions separated by a 7-month period of language studies, and are compared to data obtained from recordings of native speakers of English. The experiment aims at verifying whether the participants make progress in the rhythm measure scores under the influence of language experience and phonetic training, as it has already been confirmed that general proficiency of non-native speakers of English is a key factor contributing to the successful production of rhythmic patterns in English (Waniek-Klimczak 2009, Roach 2002). The results have shown no substantial and consistent progress for the whole group and across all the measures. Statistical tests, however, have revealed significant changes in the subjects' performance with respect to the vocalic measures ΔV and VarcoV. This may reflect the effect of the type of phonetic training the students are offered, which is segment-based with particular emphasis on vowels.