Search Results

Showing items 1 to 8 of 8
Filter: Information Management

Open access

Neil R. Smalheiser and Aaron M. Cohen

-dimensional space. The distance between any two PubMed articles can be calculated as a weighted sum of the pairwise similarity scores of their underlying features. The overall distance between a PubMed article and a training set is then some function of the weighted pairwise similarity scores between that article and each of the articles that make up the training set. Finally, articles can be classified as belonging to one or more categories (depending on the relative distance of an article to the positive vs. negative training sets), or similar articles can be
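A minimal sketch of the scoring scheme described above, under several assumptions not stated in the excerpt: articles are represented as sets of feature values (the feature names `mesh` and `title_words` are hypothetical), per-feature similarity is Jaccard overlap, the weighted sum is treated as a similarity (higher means closer), and the "some function" that aggregates scores over a training set is taken to be the mean.

```python
from statistics import mean
from typing import Dict, Sequence, Set

# Hypothetical representation: each article maps a feature name to a set of values.
Article = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Per-feature similarity in [0, 1]; Jaccard is an assumed choice."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_score(x: Article, y: Article, weights: Dict[str, float]) -> float:
    """Weighted sum of the per-feature similarity scores of two articles."""
    return sum(w * jaccard(x.get(f, set()), y.get(f, set())) for f, w in weights.items())


def set_score(x: Article, training: Sequence[Article], weights: Dict[str, float]) -> float:
    """Aggregate x's pairwise scores against a training set.
    ASSUMPTION: the mean is used; the source only says 'some function'."""
    return mean(pair_score(x, t, weights) for t in training)


def classify(x: Article, positives: Sequence[Article], negatives: Sequence[Article],
             weights: Dict[str, float]) -> bool:
    """True when x scores closer to the positive than to the negative training set."""
    return set_score(x, positives, weights) > set_score(x, negatives, weights)


weights = {"mesh": 0.7, "title_words": 0.3}  # hypothetical feature weights
positives = [{"mesh": {"Neoplasms", "Genes"}, "title_words": {"tumor", "gene"}}]
negatives = [{"mesh": {"Heart"}, "title_words": {"cardiac"}}]
query = {"mesh": {"Neoplasms"}, "title_words": {"tumor"}}
print(classify(query, positives, negatives, weights))  # True
```

Swapping the mean for a maximum (nearest-neighbour style) or a top-k average changes only `set_score`; the classification rule stays the same.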

Open access

Christina Lioma, Birger Larsen and Peter Ingwersen

between quotes. Each of these 52 queries was assessed by 101 users. The scores in brackets in Table 1 show the average user agreement on the most popular user choice for each query, which we computed as the percentage of users (out of all 101 users) who agreed on the most popular term dependence option for that query. For instance, the average agreement of 69% for “rain man” means that 70 out of 101 users (≈69%) selected the option “rain man”. The 52 train queries are sorted in Table 1 by decreasing user agreement.
Table 1 Train queries used on the CrowdFlower
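The agreement figure is simple arithmetic: the vote count of the most popular option divided by all voters. A small sketch reproducing the “rain man” example; the label `other` is a hypothetical placeholder for whatever the remaining 31 users chose.

```python
from collections import Counter
from typing import Iterable, Tuple


def top_choice_agreement(votes: Iterable[str]) -> Tuple[str, float]:
    """Return the most popular option and the % of all voters who chose it."""
    votes = list(votes)
    option, count = Counter(votes).most_common(1)[0]
    return option, 100.0 * count / len(votes)


# 70 of 101 users pick "rain man"; "other" stands in for the remaining choices.
votes = ["rain man"] * 70 + ["other"] * 31
option, pct = top_choice_agreement(votes)
print(option, round(pct))  # rain man 69
```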

Open access

Toine Bogers and Vivien Petras

bibliographical metadata change the search performance? Answer: There is no significant difference when combining Core bibliographical metadata with CVs. In general, including Core bibliographical metadata achieves better performance, and any real-world book search engine would always include the core bibliographic data in its documents. The NDCG@10 scores seem to benefit from adding the Core elements to other metadata elements. These differences are significant according to a two-tailed paired t-test (t(1307) = 4.799, p < .0005, ES = 0.13, 95% CI [0.0083, 0
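A paired t-test compares per-query scores of two runs over the same queries, so t(1307) implies 1,308 query pairs. The sketch below computes the t statistic, degrees of freedom, and Cohen's d_z from per-query differences using only the standard library; the NDCG@10 values are made up for illustration, and the p-value (which needs the t distribution, e.g. via `scipy.stats.ttest_rel`) is left out.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Paired t statistic, degrees of freedom, and Cohen's d_z for per-query scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)                      # sample SD (n - 1 denominator)
    t = mean(diffs) / (sd / math.sqrt(n))  # mean difference over its standard error
    d_z = mean(diffs) / sd                 # standardized mean of the paired differences
    return t, n - 1, d_z


with_core = [0.5, 0.6, 0.7, 0.4]     # hypothetical per-query NDCG@10
without_core = [0.4, 0.5, 0.5, 0.4]
t, df, d_z = paired_t(with_core, without_core)
print(f"t({df}) = {t:.3f}, d_z = {d_z:.3f}")
```

Because t = d_z · √n, one can sanity-check reported figures: if the reported ES is d_z (the excerpt does not say which effect size it is), then 4.799 / √1308 ≈ 0.133, in line with ES = 0.13.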

Open access

Xiao Hu

control of the mouse: “I don’t know what’s happening, why it spins so fast, I just use my mouse to drag left and right because I want to check out the paintings around, why it is so hard to control!” (participant 4). Other participants also complained about the automatic spinning feature of the panoramic function: “I feel quite dizzy that it spins all the time!” (participant 1) and “Why can’t I stop if from spinning?” (participant 8).

4.3.2 Organization of information

Another criterion with lower scores was “organization of information” where

Open access

Liang Hong, Mengqi Luo, Ruixue Wang, Peixin Lu, Wei Lu and Long Lu

characteristics. Christy et al. (2015) proposed two outlier detection algorithms: distance-based outlier detection and cluster-based outlier detection. The main purpose of the algorithms was to remove outliers that are irrelevant or only weakly relevant to the analysis of health care data. Experimental evaluation based on the F-score and likelihood ratio showed that the cluster-based method outperforms the distance-based method. Huang and Yao (2016) proposed a novel clustering approach for multidimensional
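The excerpt only names the two families of methods. To make the distinction concrete, here are generic textbook variants, not Christy et al.'s algorithms (whose details are not given): a distance-based detector flags points with too few neighbours within a radius, and a cluster-based detector runs k-means and flags points that land in very small clusters. The farthest-point initialisation and all thresholds are assumptions chosen to keep the sketch deterministic.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_outliers(points: Sequence[Point], radius: float, min_neighbors: int) -> List[int]:
    """Distance-based: flag points with fewer than `min_neighbors` other points within `radius`."""
    flagged = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        if neighbors < min_neighbors:
            flagged.append(i)
    return flagged


def kmeans_labels(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain k-means returning one cluster label per point.
    Farthest-point initialisation (an assumption) keeps results reproducible."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def cluster_outliers(points: Sequence[Point], k: int, min_size: int) -> List[int]:
    """Cluster-based: flag every point that ends up in a cluster smaller than `min_size`."""
    labels = kmeans_labels(points, k)
    sizes = Counter(labels)
    return [i for i, label in enumerate(labels) if sizes[label] < min_size]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(distance_outliers(points, radius=2.0, min_neighbors=2))  # [8]
print(cluster_outliers(points, k=3, min_size=2))               # [8]
```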

Open access

Eric Zheng, Yong Tan, Paulo Goes, Ramnath Chellappa, D.J. Wu, Michael Shaw, Olivia Sheng and Alok Gupta

mining, as well as a variety of econometric models to discover valuable information, which we have been doing during the past few decades. Take my research with a commercial bank as an example. The decision tree we generated to explain the commercial lending process for medium-sized companies showed that loan decisions are based on the applicant's financial attributes and risk level. The numbers one to five represent the perceived risk level of the small- and medium-sized businesses, with a score of one meaning the most secure and five meaning the riskiest

Open access

Weijia Xu, Amit Gupta, Pankaj Jaiswal, Crispin Taylor, Patti Lockhart and Jennifer Regala

entities of importance to authors. Since the ABNER program does not offer a way to sort entities, we used the total number of entities found by ABNER, which resulted in a very low precision score. This also reinforces our motivation: existing entity recognition tools cannot be used to solve this problem directly, and additional features and functionality must be developed before they can be used in practice.
Table 2 Results of Entity Recognition Against Author Curation as Ground Truth
Columns: Total Number of Entities | Total Entities in Ground Truth | Total Recall | Total Precision
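Why an unranked tool scores poorly on precision follows from the definitions: every returned entity counts against precision, while recall only depends on how many curated entities were found. A small sketch with hypothetical entity IDs (`e1` to `e8`); `precision_at_k` illustrates what a ranking would enable and assumes the list is already sorted by importance.

```python
from typing import Iterable, List, Set, Tuple


def precision_recall(found: Iterable[str], truth: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall of predicted entities against a curated ground truth."""
    found_set = set(found)
    hits = len(found_set & truth)
    precision = hits / len(found_set) if found_set else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def precision_at_k(ranked: List[str], truth: Set[str], k: int) -> float:
    """Precision over only the k highest-ranked entities (requires a ranking)."""
    top = ranked[:k]
    return len(set(top) & truth) / len(top) if top else 0.0


# Hypothetical data: a tool returns 8 entities, the author curated 2.
found = ["e1", "e2", "e3", "e4", "e5", "e6", "e7", "e8"]
truth = {"e1", "e2"}
print(precision_recall(found, truth))   # (0.25, 1.0)
print(precision_at_k(found, truth, 2))  # 1.0
```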

Open access

Danchen Zhang and Daqing He

We assume that a disease is highly probable to be correct if it is predicted as the true diagnosis by both SemMedDB and Wikipedia. From the results of the experiments in the next section, we found that Wikipedia gives better and more robust predictions across three datasets; hence, we use the following rules to combine the prediction outputs from Wiki-DP and SMDB-DP:
– Only the top 10 diseases are considered in both ranking lists.
– If the two lists share the same diseases, the shared diseases are kept and ranked with the Wikipedia ranking score.
– If the
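A sketch of only the first two combination rules listed above; the remaining rule(s) are cut off in the excerpt and are deliberately not implemented. Disease names and scores are hypothetical, and ties in score keep whatever order `sorted` produces.

```python
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Diseases sorted by descending score, truncated to the first n."""
    return sorted(scores, key=scores.__getitem__, reverse=True)[:n]


def combine(wiki: Dict[str, float], smdb: Dict[str, float], n: int = 10) -> List[str]:
    """Rule 1: keep each list's top-n. Rule 2: keep the shared diseases,
    ranked by their Wikipedia score. Later rules are truncated in the source
    and are NOT handled here."""
    shared = set(top_n(wiki, n)) & set(top_n(smdb, n))
    return sorted(shared, key=wiki.__getitem__, reverse=True)


wiki = {"influenza": 0.9, "common cold": 0.7, "measles": 0.4}  # hypothetical scores
smdb = {"common cold": 0.8, "influenza": 0.5, "mumps": 0.3}
print(combine(wiki, smdb))  # ['influenza', 'common cold']
```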