Search Results

You are looking at 1 - 10 of 54 items for :

  • Databases and Data Mining

Open access

Jie Wang, Chengzhi Zhang, Mengying Zhang and Sanhong Deng

article and paragraph. Since the final survey is based on users’ search terms in CitationAS, we chose 20 high-frequency phrases from the dataset as search terms and used them in the experiments. The phrases are shown in Table 2. Here, frequency refers to the number of times a phrase appears in the citation content dataset. We divided them into ten high-frequency 2-grams and ten high-frequency 3-grams. At retrieval time, our system fetches relevant citation sentences using an inverted index and the search terms. The results are then ranked by a TF-IDF scoring mechanism. The higher score
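The retrieval step in this excerpt (an inverted index lookup followed by TF-IDF ranking) can be sketched roughly as follows. This is only an illustrative sketch, not the CitationAS implementation: the whitespace tokenizer, the particular TF and IDF formulas, the function names `build_inverted_index` and `rank`, and the three sample sentences are all assumptions made for the example.

```python
import math
from collections import defaultdict

def build_inverted_index(sentences):
    """Map each lowercase term to the set of sentence ids that contain it."""
    index = defaultdict(set)
    for sid, text in enumerate(sentences):
        for term in text.lower().split():
            index[term].add(sid)
    return index

def rank(query, sentences, index):
    """Return (sentence id, score) pairs, highest TF-IDF score first.

    Assumed weighting: tf = term count / sentence length,
    idf = log(N / document frequency).
    """
    n = len(sentences)
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, set())
        if not postings:
            continue  # term does not occur anywhere in the collection
        idf = math.log(n / len(postings))
        for sid in postings:
            tokens = sentences[sid].lower().split()
            tf = tokens.count(term) / len(tokens)
            scores[sid] += tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical citation sentences, invented for illustration only.
sentences = [
    "topic model for citation analysis",
    "citation content analysis of topic model topic model",
    "support vector machine classification",
]
index = build_inverted_index(sentences)
ranked = rank("topic model", sentences, index)
for sid, score in ranked:
    print(f"{score:.3f}  {sentences[sid]}")
```

Only sentences that contain at least one query term receive a score, so the third sentence is never returned for the query "topic model".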

Open access

Chaocheng He, Panhao Ma, Lusha Zhou and Jiang Wu

tests and multiple linear regressions that are used to find the relation between forum participation activities and course scores. In Section 4, from the perspective of the supernetwork, super-node degree and super-edge degree are defined and applied to real course data for forum analysis. Section 5 draws conclusions and outlines directions for future research emerging from this work. 2 Related Work Along with the two MOOC networks, the people network and the knowledge network, there are two basic MOOC curriculum modes used by students and teachers

Open access

Xiaoling Liu, Mihai Păunescu, Viorel Proteasa and Jinshan Wu

, or some other kind of average score, be used to represent the set, and under which conditions can a comparison based on such measures of central tendency be reliable? When does the comparison of arithmetic means indicate that a grouping of academics, such as a department or a university, performs better than the one it is compared to? In order to answer this question, we proposed a definition of the minimum representative size κ as a parameter that characterizes a pair of data sets which are to be compared. The method is described in detail, including the analytical

Open access

Qiuzi Zhang, Qikai Cheng, Yong Huang and Wei Lu

result lists, it can be identified as a DUS. 3.3 Procedures 3.3.1 Bootstrapping Process In each iteration, a number of data_clue words and data_patterns will be, respectively, added to their corresponding final list if their own score has exceeded the current threshold. The score can be interpreted as the relative probability of a data_clue word or a data_pattern being regarded as valid, based on currently available evidence. As illustrated in Figure 1 , the bootstrapping process is triggered by adding the original seed words to the seed pool, and the

Open access

Jon Garner, Alan L. Porter, Andreas Leidolf and Michelle Baker

gauge interdisciplinary research knowledge interchange (Zhang, Rousseau, & Glänzel, 2016), including “Integration scores” (Porter et al., 2007; 2008), Rao-Stirling diversity (Rafols & Meyer, 2010; Stirling, 2007), and Diffusion scores (Carley & Porter, 2012; Garner, Porter, & Newman, 2014); cross-research domain knowledge interchange (Kwon et al., under review; Porter et al., 2013); science overlay maps to visually represent the diversity of publication, citation, or citing sub-disciplinary involvement (Carley et al., under review; Leydesdorff et al

Open access

Loet Leydesdorff, Wouter de Nooy and Lutz Bornmann

1 Introduction Ramanujacharyulu (1964) provided a graph-theoretical algorithm to select the winner of a tournament on the basis of the total scores of all the matches, whereby both gains and losses are taken into consideration. Prathap & Nishy (under review) proposed to use this power-weakness ratio (PWR) for citation analysis and journal ranking. PWR has been proposed for measuring journal impact with the arguments that it handles the rows and columns in the asymmetrical citation matrix symmetrically, its recursive algorithm (which it shares with other

Open access

Raf Guns

linkpred software (Guns, 2014). We use the following notation. Each predictor determines a likelihood score s(u, v) that specifies the likelihood of a link occurring between nodes u and v in the test network. The set of neighbors that a node is connected to is called its neighborhood. The neighborhood of v is denoted by Γ(v). We use | · | to denote set cardinality; hence, |Γ(v)| is the degree of v, the number of adjacent nodes. Finally, w(x, y) denotes the weight of the link between x and y. 3.1 Unweighted Predictors The first
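The notation in this excerpt can be made concrete with a small sketch. It does not use the linkpred API: the toy network, the node names, and the helper functions `neighborhood`, `degree`, and `w` are hypothetical and only mirror the symbols Γ(v), |Γ(v)|, and w(x, y) defined in the text.

```python
# Hypothetical toy network: each undirected link maps to its weight.
weights = {
    ("a", "b"): 2.0,
    ("a", "c"): 1.0,
    ("b", "c"): 3.0,
    ("c", "d"): 1.0,
}

def neighborhood(v):
    """Gamma(v): the set of nodes that share a link with v."""
    return {x if y == v else y for (x, y) in weights if v in (x, y)}

def degree(v):
    """|Gamma(v)|: the number of nodes adjacent to v."""
    return len(neighborhood(v))

def w(x, y):
    """Weight of the link between x and y; 0.0 if no such link exists."""
    return weights.get((x, y), weights.get((y, x), 0.0))

print(sorted(neighborhood("c")))  # neighbors of c
print(degree("c"))                # degree of c
print(w("c", "b"))                # weight is the same in either direction
```

Storing each undirected link once and checking both orderings in `w` keeps the weights symmetric without duplicating entries.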

Open access

Yuqing Mao and Zhiyong Lu

). The citation information provides an understanding of the interaction among various scientific disciplines. Therefore, two journals are likely to be related if articles published in these two journals often cite one another. For example, in the study carried out by Pudovkin and Garfield (2002), the related journal list was produced using the “relatedness factor (RF)” based on citation data in JCR. RF was calculated from the citation scores of journals that give citations to or receive citations from one journal (in their paper, Genetics, a core journal in the field of genetics

Open access

Esteban Fernández Tuesta, Carlos Garcia-Zorita, Rosario Romera Ayllon and Elías Sanz-Casado

) propose an alternative approach to the QS score, which they call the Composite I-distance Indicator (CIDI) and which could be applicable to other global rankings. In their statistical analysis of known rankings, Bornmann, Mutz, and Daniel (2013) explored one of the main bibliometric indicators used by the Leiden ranking, namely the proportion of papers published by a university that lie within the 10% most cited (PP top10%). These same authors pointed out that a more sophisticated statistical model than that deployed by the editors of the Leiden ranking http

Open access

Xiang Zhou, Pengyi Zhang and Jun Wang

comparisons of (1) similarities of two search queries, (2) URLs that the Web search engine returns (Glance, 2000), and (3) documents that the Web search engine returns (Raghavan & Sever, 1995). Similarity scores are calculated based on these three indexes to decide whether two queries belong to the same search task. The two major methods used herein for comparing the relevance of these two search queries are (1) identifying word similarities in the queries and extracting the sets of the search terms from these two queries. Some useful indexes for this task include the