Kateřina Rysová, Magdaléna Rysová, Michal Novák, Jiří Mírovský and Eva Hajičová
Anita Dobek, Krzysztof Moliński and Ewa Skotarczak
Jacek Błażewicz, Piotr Formanowicz and Paweł Wojciechowski
Some remarks on evaluating the quality of the multiple sequence alignment based on the BAliBASE benchmark
BAliBASE is one of the most widely used benchmarks for multiple sequence alignment programs. The accuracy of alignment methods is measured by bali_score, an application distributed with the database. The standard accuracy measures are the Sum of Pairs (SP) score and the Total Column (TC) score. We have found that, for columns outside the core blocks, the results calculated by bali_score differ from those obtained from the formal definitions of these measures. We do not claim that one of these measures is better than the other, but they are definitely different. This discrepancy can cause confusion when alignments produced by different methods are compared. Therefore, we propose a new nomenclature for multiple sequence alignment quality measures that makes explicit which variant was actually calculated. Moreover, we have found that a gap in any column of the first sequence of the reference alignment causes that column to be discarded.
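To make the two measures concrete, the sketch below computes SP and TC directly from textbook-style definitions: SP is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC is the fraction of reference columns reproduced exactly. It is a minimal illustration under our own assumptions: alignments are lists of equal-length strings with "-" as the gap symbol, all columns are scored (core-block annotations are ignored), and columns containing gaps are kept. It does not reproduce bali_score's behavior, and the function names are hypothetical.

```python
from itertools import combinations

def residue_columns(alignment):
    """Map each column of a gapped alignment to a tuple giving, for every
    sequence, the index of its residue in that column (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        entry = []
        for s, row in enumerate(alignment):
            if row[col] == "-":
                entry.append(None)
            else:
                entry.append(counters[s])
                counters[s] += 1
        if any(e is not None for e in entry):  # skip all-gap columns
            columns.append(tuple(entry))
    return columns

def aligned_pairs(columns):
    """Collect every pair of residues placed in the same column."""
    pairs = set()
    for entry in columns:
        residues = [(s, r) for s, r in enumerate(entry) if r is not None]
        for (s, r), (t, q) in combinations(residues, 2):
            pairs.add((s, r, t, q))
    return pairs

def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(residue_columns(reference))
    got = aligned_pairs(residue_columns(test))
    return len(ref & got) / len(ref) if ref else 0.0

def tc_score(test, reference):
    """Fraction of reference columns that appear unchanged in the test alignment."""
    ref_cols = residue_columns(reference)
    test_cols = set(residue_columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols) if ref_cols else 0.0

reference = ["ACGT", "AC-T"]
test = ["ACGT", "A-CT"]
print(sp_score(test, reference), tc_score(test, reference))
```

Here two of the three reference residue pairs survive (SP = 2/3) and two of the four reference columns are reproduced exactly (TC = 0.5). Restricting the column loop to annotated core blocks, or dropping columns with a gap in the first sequence, would give the other variants the abstract distinguishes.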
The Robinson-Foulds (RF) distance is the most popular measure of dissimilarity between phylogenetic trees. In this paper, we define the Matching Cluster (MC) distance, which can be regarded as a refinement of the RF metric for rooted trees, and explore its properties in detail. Like RF, MC operates on the clusters of the compared trees, but the distance evaluation is more complex: the similarity values between clusters are transformed into the final MC score of tree dissimilarity by computing a minimum-weight perfect matching in a bipartite graph. The analyzed properties give insight into the structure of the metric space generated by MC, its relation to the Matching Split (MS) distance for unrooted trees, and the asymptotic behavior of the expected distance between binary n-leaf trees selected uniformly at random, which is Θ(n^{3/2}) for both MC and MS.
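The difference between the two metrics can be sketched on small rooted trees written as nested tuples. This is a minimal illustration under stated assumptions: only non-trivial clusters (leaf sets of internal nodes other than the root) are used, the cost of matching two clusters is the size of their symmetric difference, and the smaller cluster family is padded with empty clusters so that a perfect matching exists. These choices reflect our reading of the MC construction and should be checked against the paper. The matching is found by brute force over permutations, which is exponential and only suitable for toy inputs; a Hungarian-algorithm solver such as scipy.optimize.linear_sum_assignment would be used in practice.

```python
from itertools import permutations

def clusters(tree):
    """Return the non-trivial clusters of a rooted tree given as nested tuples
    (leaf sets of internal nodes, excluding the root)."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        s = frozenset().union(*(leaves(child) for child in node))
        found.add(s)
        return s

    root = leaves(tree)
    found.discard(root)
    return found

def rf_distance(t1, t2):
    """Robinson-Foulds distance: clusters present in exactly one of the trees."""
    return len(clusters(t1) ^ clusters(t2))

def mc_distance(t1, t2):
    """Matching-cluster style distance (assumed definition): minimum total
    symmetric-difference cost of a perfect matching between cluster families,
    padding the smaller family with empty clusters. Brute force, toy sizes only."""
    a = list(clusters(t1))
    b = list(clusters(t2))
    n = max(len(a), len(b))
    a += [frozenset()] * (n - len(a))
    b += [frozenset()] * (n - len(b))
    return min(
        sum(len(a[i] ^ b[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )

t1 = (((("a", "b"), "c"), "d"), "e")
t2 = (("a", ("b", ("c", "d"))), "e")
print(rf_distance(t1, t2), mc_distance(t1, t2))
```

RF only counts that two clusters on each side differ (distance 4), whereas the matching charges for how much each cluster must change (distance 6 here), which is the kind of finer-grained information the abstract attributes to MC.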
Many handwritten signature verification algorithms have been developed to distinguish genuine signatures from forgeries. An important group of these methods is based on dynamic time warping (DTW). Traditionally, DTW-based signature verification computes a misalignment score between the verified signature and a set of template signatures, so the choice of template signatures strongly affects the verification result. In this article, we propose replacing the template signatures with the hidden signature, an artificial signature created by minimizing the mean misalignment between itself and the signatures from the enrollment set. We present several hidden signature estimation methods together with a comprehensive comparison. The hidden signature opens a number of new possibilities for signature analysis. We use statistical properties of the hidden signature to normalize the error signal of the verified signature and use the misalignment of the normalized errors as the basis for verification. As a result, we achieve satisfactory error rates that make it possible to build an on-line system ready for operation in a real-world environment.
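The idea of a signature that minimizes the mean DTW misalignment to the enrollment set can be illustrated with DTW barycenter averaging (DBA), a known averaging technique that we swap in here; it is not necessarily one of the estimation methods compared in the article. The sketch assumes one-dimensional signals (for example a single pen coordinate over time) and squared differences as the local DTW cost. Each DBA iteration aligns every enrollment signal to the current estimate and replaces each point by the mean of the values aligned to it, which cannot increase the total misalignment. The function names are hypothetical.

```python
def dtw(x, y):
    """Return (cost, path) for the DTW alignment of 1-D sequences x and y,
    using squared differences as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end to recover the optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )
    path.reverse()
    return D[n][m], path

def mean_misalignment(candidate, signatures):
    """Mean DTW cost between a candidate signal and the enrollment signals."""
    return sum(dtw(candidate, s)[0] for s in signatures) / len(signatures)

def dba(signatures, init, iterations=10):
    """DTW barycenter averaging: iteratively re-estimate an average signal."""
    avg = list(init)
    for _ in range(iterations):
        sums = [0.0] * len(avg)
        counts = [0] * len(avg)
        for seq in signatures:
            _, path = dtw(avg, seq)
            for i, j in path:  # every index of avg appears on the path
                sums[i] += seq[j]
                counts[i] += 1
        avg = [s / c for s, c in zip(sums, counts)]
    return avg

enrollment = [[0, 1, 2, 3, 2, 1, 0], [0, 0, 1, 2, 3, 2, 1, 0], [0, 1, 3, 2, 1, 0]]
hidden = dba(enrollment, init=enrollment[0], iterations=5)
print(mean_misalignment(enrollment[0], enrollment), mean_misalignment(hidden, enrollment))
```

A candidate signature could then be scored by `dtw(hidden, candidate)[0]` and accepted below some threshold; that decision rule is our simplification and omits the error-signal normalization the abstract describes.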
A method for learning scenario determination and modification in intelligent tutoring systems
Computers have been used in education for years. They help provide educational aids in multimedia form, such as films, pictures and interactive tasks, as well as automated testing. In this paper, we propose a concept of an intelligent e-learning system. The main purpose of this system is to teach effectively by providing an optimal learning path at each step of the educational process. The choice of a suitable learning path depends on the student's preferences, learning styles, personal features, interests and knowledge state. Therefore, the system has to collect information about the student, which is done during registration. The user is then classified into a group of similar students. Using information about the final successful scenarios of students who belong to the same class as the new student, the system determines an opening learning scenario, that is, the first learning scenario proposed to a student after registering in the intelligent e-learning system. After each lesson, the system evaluates the student's knowledge. If the student fails to achieve the required score on a test, the opening learning scenario is considered inadequate for this user. In that case, the system modifies the opening learning scenario using data gathered during its operation and a Bayesian network. In this paper, an algorithm for determining the opening learning scenario (ADOLS) and a procedure for modifying the learning scenario (AMLS) are presented together with auxiliary definitions. Preliminary results of an experiment conducted with a prototype of the described system are also reported.
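The determination step described above (classify the new student, then reuse the scenarios that succeeded for that class) can be sketched as follows. This is a hypothetical simplification, not ADOLS or AMLS: we assume students are described by numeric feature vectors, that classification assigns the class with the nearest centroid, that the opening scenario is the most frequent final successful scenario in that class, and that modification is triggered when a test score falls below a required score. The Bayesian-network modification itself is not modeled. All names and data are illustrative.

```python
from collections import Counter
from math import dist

def classify(student, centroids):
    """Assign the student to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: dist(student, centroids[label]))

def opening_scenario(student, centroids, history):
    """history: list of (class_label, final_successful_scenario) pairs.
    Returns the most frequent successful scenario of the student's class."""
    label = classify(student, centroids)
    scenarios = [s for c, s in history if c == label]
    return Counter(scenarios).most_common(1)[0][0] if scenarios else None

def needs_modification(test_score, required_score):
    """The opening scenario is treated as inadequate below the required score."""
    return test_score < required_score

# Hypothetical learning-style features, normalized to [0, 1].
centroids = {"visual": (0.9, 0.2), "verbal": (0.1, 0.8)}
history = [
    ("visual", ("L1", "L2a", "L3")),
    ("visual", ("L1", "L2a", "L3")),
    ("visual", ("L1", "L2b", "L3")),
    ("verbal", ("L1", "L2c", "L3")),
]
new_student = (0.8, 0.3)
print(opening_scenario(new_student, centroids, history))
```

Keeping the history as (class, scenario) pairs means the same data gathered during the system's operation could later feed the modification step.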
Modern cancer diagnostics relies heavily on cytological examinations. Unfortunately, visual inspection of cytological preparations under a microscope is tedious and time-consuming. Moreover, intra- and inter-observer variability in cytological diagnosis is substantial. Cytological diagnostics can be facilitated and made more objective by using automatic image analysis and machine learning methods. Computerized systems usually preprocess cytological images, segment and detect nuclei, extract and select features, and finally classify the sample. Although many computerized methods and systems have already been proposed for cytology, they are still not routinely used because their accuracy needs to be improved. This contribution focuses on computerized breast cancer classification. The task at hand is to classify cellular samples obtained by fine-needle biopsy as either benign or malignant. For this purpose, we compare 5 methods of nuclei segmentation and detection, 4 methods of feature selection and 4 methods of classification. Nuclei detection and segmentation methods are compared with respect to recall and the F1 score based on the Jaccard index. Feature selection and classification methods are compared with respect to classification accuracy. However, the main contribution of our study is to determine which nuclear features reliably indicate the type of cancer. We also check whether the quality of nuclei segmentation/detection significantly affects the accuracy of cancer classification. On the test set, the average accuracy of cancer classification is around 76%. Feature selection based on Spearman's correlation and the chi-square test identifies significantly better features than the forward feature selection method.
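One common way to build detection recall and an F1 score on top of the Jaccard index is to count a detected nucleus as a true positive when its overlap with a not-yet-matched ground-truth nucleus reaches a threshold. The sketch below follows that reading under our own assumptions: nuclei are sets of pixel coordinates, matching is greedy and one-to-one, and the threshold is 0.5. The exact protocol used in the study may differ, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def detection_scores(detected, ground_truth, threshold=0.5):
    """Greedy one-to-one matching of detected to ground-truth nuclei.
    Returns (recall, precision, f1)."""
    unmatched = list(ground_truth)
    tp = 0
    for d in detected:
        best = max(unmatched, key=lambda g: jaccard(d, g), default=None)
        if best is not None and jaccard(d, best) >= threshold:
            tp += 1
            unmatched.remove(best)  # each ground-truth nucleus is matched once
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

ground_truth = [frozenset({(0, 0), (0, 1), (1, 0), (1, 1)}), frozenset({(5, 5), (5, 6)})]
detected = [frozenset({(0, 0), (0, 1), (1, 0)}), frozenset({(9, 9)})]
print(detection_scores(detected, ground_truth))
```

The first detection overlaps its ground-truth nucleus with a Jaccard index of 0.75 and counts as a hit; the second overlaps nothing, so recall, precision and F1 are all 0.5.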
Fast and Extensible Phrase Scoring for Statistical Machine Translation
Existing tools for generating phrase tables for phrase-based Statistical Machine Translation (SMT) are generally optimised for low memory use so that large corpora can be processed with limited memory. Whilst this is a reasonable design choice, it does not make optimal use of resources when sufficient memory is available. We present memscore, a new open-source tool that scores phrases in memory. Besides acting as a faster drop-in replacement for existing software, it implements a number of standard smoothing techniques and provides a platform for easy experimentation with new scoring methods.
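The core of in-memory phrase scoring is keeping all extracted phrase-pair counts in memory at once, so that both translation directions can be estimated in a single pass instead of sorting intermediate files on disk. The sketch below shows only the unsmoothed relative-frequency estimates p(target | source) and p(source | target); it is a minimal illustration, not memscore's code or interface, and it assumes the input is a list of (source phrase, target phrase) extraction events. The smoothing techniques mentioned above are not shown.

```python
from collections import Counter

def score_phrases(pairs):
    """pairs: iterable of (source_phrase, target_phrase) extraction events.
    Returns {(src, tgt): (p(tgt | src), p(src | tgt))} by relative frequency."""
    joint = Counter(pairs)
    src_counts = Counter()
    tgt_counts = Counter()
    for (s, t), c in joint.items():
        src_counts[s] += c
        tgt_counts[t] += c
    return {
        (s, t): (c / src_counts[s], c / tgt_counts[t])
        for (s, t), c in joint.items()
    }

pairs = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the home"),
    ("haus", "house"),
]
table = score_phrases(pairs)
print(table[("das haus", "the house")])
```

Because the marginal counts are available in memory, a smoothing method that needs them (or other statistics of the joint counts) can be added as another function over `joint`, which is the kind of extensibility the abstract describes.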