Detection of local text reuse is central to a variety of applications, including plagiarism detection, origin detection, and information flow analysis. This paper evaluates and compares the effectiveness of fingerprint selection algorithms for the source retrieval stage of local text reuse detection. In total, six algorithms are compared: Every p-th, 0 mod p, Winnowing, Hailstorm, Frequency-biased Winnowing (FBW), and the proposed modified version of FBW (MFBW).
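The first three selection schemes named above are simple enough to sketch. The Python code below is an illustration under stated assumptions, not the implementation evaluated in the paper: documents are normalized by lower-casing and collapsing whitespace, fingerprints are CRC-32 hashes of character k-grams, and `k`, the window size `w`, and the parameter `p` are hypothetical defaults. Winnowing follows the common formulation of selecting the rightmost minimal hash in each window of `w` consecutive hashes. Hailstorm, FBW, and MFBW are not shown.

```python
import zlib


def kgram_hashes(text: str, k: int = 5) -> list[int]:
    """Hash every character k-gram of a lower-cased, whitespace-normalized text."""
    norm = " ".join(text.lower().split())
    # CRC-32 is an assumed choice, used here only because it is deterministic across runs
    return [zlib.crc32(norm[i:i + k].encode("utf-8")) for i in range(len(norm) - k + 1)]


def every_pth(hashes: list[int], p: int = 4) -> list[tuple[int, int]]:
    """Every p-th: keep the hashes at positions 0, p, 2p, ..."""
    return [(i, h) for i, h in enumerate(hashes) if i % p == 0]


def zero_mod_p(hashes: list[int], p: int = 4) -> list[tuple[int, int]]:
    """0 mod p: keep the hashes whose value is divisible by p."""
    return [(i, h) for i, h in enumerate(hashes) if h % p == 0]


def winnowing(hashes: list[int], w: int = 4) -> list[tuple[int, int]]:
    """Winnowing: select the minimal hash in every window of w consecutive
    hashes (the rightmost one on ties) and record it only when the selected
    position changes."""
    selected: list[tuple[int, int]] = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        m = min(window)
        pos = start + (w - 1 - window[::-1].index(m))  # rightmost minimum
        if pos != last_pos:
            selected.append((pos, m))
            last_pos = pos
    return selected


if __name__ == "__main__":
    hs = kgram_hashes("Detection of local text reuse is central to plagiarism detection.")
    print(len(hs), len(every_pth(hs)), len(zero_mod_p(hs)), len(winnowing(hs)))
```

Each function returns (position, hash) pairs, so the fingerprints selected from a suspicious document can be looked up in an index built from candidate source documents. Every p-th depends on absolute positions and is therefore sensitive to insertions, whereas 0 mod p and Winnowing select by hash value and are robust to shifts; Winnowing additionally guarantees that any shared run of at least w + k - 1 normalized characters yields at least one common fingerprint.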
Most previously published studies on local text reuse detection rely on datasets containing artificially generated, long, or unobfuscated text reuse. To evaluate the performance of the algorithms, this study builds a new dataset containing real text reuse cases from Bachelor's and Master's theses written in English in the field of computer science. In about half of the cases, the reused text accounts for less than 1% of the document text, and about two-thirds of the cases involve paraphrasing.
In the performed experiments, the best overall detection quality is achieved by Winnowing, 0 mod p, and MFBW. The proposed MFBW algorithm is a considerable improvement over FBW and ranks among the best-performing algorithms.
In the insurance industry, predictive modeling by means of regression and classification techniques is becoming increasingly important and popular. The success of an insurance company largely depends on its ability to perform tasks such as credibility estimation, insurance premium determination, claim probability estimation, insurance fraud detection, and insurance risk management. This paper discusses regression and classification modeling for such prediction problems using the method of Adaptive Basis Function Construction.
A Comparison of Subset Selection and Adaptive Basis Function Construction for Polynomial Regression Model Building
The subset selection approach to polynomial regression model building assumes that the chosen fixed full set of predefined basis functions contains a subset that describes the target relation sufficiently well. In most cases, however, the necessary set of basis functions is not known and must be guessed, which can be a non-trivial and lengthy trial-and-error process. In our previous research, we considered a different approach to polynomial regression model building: letting the model building method itself construct the basis functions needed to create a model of arbitrary complexity, without restricting it to the basis functions of a predefined full model. This approach is called Adaptive Basis Function Construction (ABFC). In the present paper, we compare the two approaches, subset selection and ABFC, both theoretically and empirically in terms of their underlying principles, computational complexity, and predictive performance. Additionally, in the empirical evaluations, ABFC is also compared with two other well-known regression modeling methods: Locally Weighted Polynomials and Multivariate Adaptive Regression Splines.
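The size of the predefined full set is what makes the choice of basis functions in subset selection so consequential. As an illustration, assuming the full model consists of all monomials of total degree at most d in n input variables (a common choice, not necessarily the one used in the paper), the Python sketch below enumerates that set and counts the candidate subsets an exhaustive search would face. The function names are hypothetical, and the code implements neither ABFC nor any particular selection criterion.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def full_basis(n_vars: int, degree: int) -> list[tuple[int, ...]]:
    """All monomials of total degree <= degree in n_vars variables.

    A monomial is a tuple of variable indices: () is the intercept,
    (0,) is x0, and (0, 0, 2) is x0**2 * x2.
    """
    basis: list[tuple[int, ...]] = []
    for d in range(degree + 1):
        basis.extend(combinations_with_replacement(range(n_vars), d))
    return basis


def evaluate(monomial: tuple[int, ...], x: list[float]) -> float:
    """Value of one basis function at the input point x (1.0 for the intercept)."""
    return prod(x[i] for i in monomial)


def n_basis(n_vars: int, degree: int) -> int:
    """Closed form for len(full_basis(n_vars, degree)): C(n + d, d)."""
    return comb(n_vars + degree, degree)


def n_subsets(n_vars: int, degree: int) -> int:
    """Non-empty subsets an exhaustive subset search would have to evaluate."""
    return 2 ** n_basis(n_vars, degree) - 1


if __name__ == "__main__":
    for n, d in [(2, 2), (5, 3), (10, 3)]:
        print(n, d, n_basis(n, d), n_subsets(n, d))
```

For two variables and degree 2, the full set has 6 basis functions (63 non-empty subsets); for 10 variables and degree 3, it already has 286, so exhaustive search over 2^286 - 1 subsets is infeasible and practical subset selection typically falls back on stepwise heuristics.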
An Analysis of Wi-Fi Based Indoor Positioning Accuracy
The increasing demand for location-based services inside buildings has made indoor positioning a significant research topic. This study deals with indoor positioning using the IEEE 802.11 wireless networking (Wi-Fi) standard, which has a distinct cost advantage over other indoor wireless technologies. The aim of this study is to examine several aspects of location-fingerprinting-based indoor positioning that affect positioning accuracy. Overall, the positioning accuracy achieved in the experiments is 2.0 to 2.5 meters.
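Location fingerprinting is commonly implemented as a two-phase process: an offline phase records received signal strength (RSSI) vectors at reference points with known coordinates (the radio map), and an online phase matches a new measurement against that map. The Python sketch below uses k-nearest-neighbor matching in signal space. The Euclidean distance, the -100 dBm fill value for unheard access points, the unweighted averaging of the k nearest reference points, and the toy access point names and RSSI values are all assumptions for illustration, not the configuration examined in the study.

```python
import math

MISSING_DBM = -100.0  # assumed RSSI for an access point that was not heard

RadioMap = list[tuple[tuple[float, float], dict[str, float]]]


def signal_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Euclidean distance between two RSSI vectors keyed by access point ID."""
    aps = set(a) | set(b)
    return math.sqrt(sum((a.get(ap, MISSING_DBM) - b.get(ap, MISSING_DBM)) ** 2 for ap in aps))


def knn_position(radio_map: RadioMap, observed: dict[str, float], k: int = 3) -> tuple[float, float]:
    """Average the coordinates of the k reference points closest in signal space."""
    nearest = sorted(radio_map, key=lambda entry: signal_distance(entry[1], observed))[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return x, y


def positioning_error(estimate: tuple[float, float], truth: tuple[float, float]) -> float:
    """Euclidean error in meters between estimated and true position."""
    return math.dist(estimate, truth)


if __name__ == "__main__":
    # Toy radio map: reference points on a 5 m grid with invented RSSI values in dBm
    radio_map: RadioMap = [
        ((0.0, 0.0), {"ap1": -40.0, "ap2": -70.0}),
        ((0.0, 5.0), {"ap1": -55.0, "ap2": -60.0}),
        ((5.0, 0.0), {"ap1": -60.0, "ap2": -50.0}),
        ((5.0, 5.0), {"ap1": -70.0, "ap2": -45.0}),
    ]
    est = knn_position(radio_map, {"ap1": -42.0, "ap2": -68.0}, k=2)
    print(est, positioning_error(est, (0.0, 1.0)))  # prints (0.0, 2.5) 1.5
```

Averaging several neighbors smooths out the coarse grid of reference points, but it also pulls the estimate toward the middle of the neighbors, which is one reason the density of the radio map and the choice of k are among the factors that influence accuracy.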
Social networking sites such as Facebook, Twitter, and VKontakte, online stores such as eBay, Amazon, and Alibaba, and many other websites allow users to share their thoughts with their peers. These thoughts often contain not only factual information but also users’ opinions and feelings. This subjective information can be extracted using sentiment analysis methods, which are currently a topic of active research. Most studies are based on texts written in English, while other languages have received less attention. The present survey focuses on sentiment analysis research for the Latvian and Russian languages.