important efforts have been made to streamline text mining workflows by providing libraries of natural language processing (NLP) tools (e.g., stemmers, parsers, and named entity recognizers) that can be connected together in a pipeline (Manning, Surdeanu, Bauer, Finkel, Bethard, McClosky, 2014; Savova, Masanz, Ogren, Zheng, Sohn, Kipper-Schuler, Chute, 2010; Batista-Navarro, Carter, Ananiadou, 2016; Clarke, Srikumar, Sammons, Roth, 2012). In addition, there are valuable packages that provide machine learning algorithms in a user-friendly manner
that have been included in at least 3,000 publications. The keywords are sorted in descending order. “Embryonic stem-cells” has 2.52 YK and “innate human immunity” has 1.57 YK.
Best 30 performers in terms of YK.
embryonic stem-cells; carbon nanotubes; field-effect transistors; graphite; genome-wide association; caenorhabditis-elegans; DNA methylation; living cells; regulatory t-cells; gold nanoparticles; tgf-beta; one-pot synthesis; quantum dots; functionalization; electrodes; acute myeloid-leukemia; long-term potentiation; activated
Minghong Chen, Jingye Qu, Yuan Xu and Jiangping Chen
, information extraction, and summarization requires understanding of the meaning of the texts, and has been challenging.
This study applies three types of text analysis/processing: (1) low-level natural language processing, such as stop-word identification and filtering and stemming, whose output helps create a high-quality word cloud that reveals the most frequent content words in the project abstracts; (2) descriptive or bibliometric analysis, which is possible because the records of these NSF projects are well-organized datasets, as described in Table 1
://dumps.wikimedia.org/enwiki/20160701/ was used for Wiki-DP. It was downloaded on March 5, 2016, and contains 5.79 million articles. Only the title and the content of each article were kept; tags, references, external links, and "See also" sections were all removed. The Wikipedia collection first underwent stop-word removal and stemming with the Porter stemmer, and was then indexed with Indri.
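The preprocessing order described above (stop-word removal, then stemming, before indexing) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop list, the tokenization rule, and the function name `preprocess` are assumptions, and the stemmer is passed in as a pluggable function (defaulting to no stemming) rather than being a real Porter implementation.

```python
import re
from typing import Callable, Iterable, List

# Tiny illustrative stop list (an assumption; not the list used in the study).
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "is", "was", "to", "by"})


def preprocess(text: str,
               stem: Callable[[str], str] = lambda w: w,
               stop_words: Iterable[str] = STOP_WORDS) -> List[str]:
    """Lowercase, tokenize on runs of letters, drop stop words, then stem."""
    stop = set(stop_words)
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stop words are removed before stemming, matching the order in the text.
    return [stem(tok) for tok in tokens if tok not in stop]


if __name__ == "__main__":
    print(preprocess("The collection was indexed by Indri."))
```

Any stemmer with a `str -> str` signature can be supplied through the `stem` argument; the resulting token lists would then be handed to whatever indexer is in use.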
Summary of experiment topics and collections.
2014 CDS Track
Ahmed AlKalbani, Hepu Deng, Booi Kam and Xiaojuan Zhang
evaluated in a specific situation.
There are three types of external pressures that an organization has to consider: coercive pressures, normative pressures, and mimetic pressures ( Davidsson et al., 2006 ; Cavusoglu et al., 2015 ). Coercive pressures force organizations to adopt certain institutionalized regulations and practices with respect to the security of organizational information in managing the organization ( Hu et al., 2007 ). Such pressures stem from government laws and regulations that force organizations to act in compliance with certain rules and
determine optimal parameter settings on our training topics. These optimal settings were then used on the 334 test topics to produce the results presented in the remainder of this paper.
We optimized three different parameters:
Degree of smoothing. The λ parameter controls the influence of the collection language model, with higher values giving more influence to the collection language model. We varied λ in steps of 0.1, from 0.0 to 1.0.
Stopword filtering. Either no filtering or using the SMART stop word list.
Stemming. Either no
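To make the role of λ concrete, here is a minimal sketch of query-likelihood scoring with linear interpolation between a document model and the collection model. The excerpt does not give the exact formula, so the interpolation form below (often called Jelinek-Mercer smoothing) is an assumption chosen to match the stated behavior that higher λ gives the collection model more weight; the names `jm_log_likelihood` and `LAMBDA_GRID` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def jm_log_likelihood(query: List[str], doc: List[str],
                      collection_tf: Dict[str, int], collection_len: int,
                      lam: float) -> float:
    """Log P(query | doc) with P(w|d) = (1 - lam) * P_ml(w|d) + lam * P(w|C)."""
    doc_tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / doc_len if doc_len else 0.0
        p_col = collection_tf.get(term, 0) / collection_len
        p = (1.0 - lam) * p_doc + lam * p_col
        if p == 0.0:
            # Term unseen under the mixed model: the query has zero likelihood.
            return float("-inf")
        score += math.log(p)
    return score


# The sweep described in the text: lambda from 0.0 to 1.0 in steps of 0.1.
LAMBDA_GRID = [round(0.1 * i, 1) for i in range(11)]
```

In a tuning run, each value in `LAMBDA_GRID` would be combined with each stop-word and stemming setting, and the combination scoring best on the training topics would be kept for the test topics.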
, length, weight, and multiple measures of the same tree… whether the branch, or the wood, or just the stem of the tree. Those are some of the attributes that we can query by and use to tie together studies from different sources.”
Often, standardized attributes help access and manipulate data and thus make processing, joining, and extracting subsets of data easier.
Burgeoning Data Repositories and Archival Systems
Speaking of the emergence and abundance of different data repositories and archival systems, the scientists, on the one hand, applaud the
( Gravetter & Wallnau, 2013 ). Statistics is one of the most useful and powerful tools in data analysis for both academics and practitioners ( Vaughan, 2001 ). The essence of statistics lies in the idea of inference ( Johnson, 2009 ). As researchers collect sample data to answer specific quantitative research questions, statistical methods enable them to draw conclusions about a broader base of people, events, or objects than the samples actually included in the study ( Munro, 2005 ). Generalization and interpretation of the statistical results stemming from limited
Zhao, D., & Strotmann, A. (2011). Counting first, last, or all authors in citation analysis: A comprehensive comparison in the highly collaborative stem cell research field. Journal of the American Society for Information Science and Technology, 62(4), 654–676. https://doi.org/10.1002/asi.21495
Yiming Zhao, Baitong Chen, Jin Zhang, Ying Ding, Jin Mao and Lihong Zhou
contains prepositions, conjunctions, auxiliaries, articles, numerals, interjections, and other function words. The Porter stemming algorithm was used to remove common morphological and inflectional endings from words and to bring variant forms of a word together ( Porter, 1980 ).
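As a small illustration of how suffix stripping brings variant forms together, the sketch below implements only Step 1a of the Porter (1980) algorithm, which handles plural endings. It is deliberately partial: the full algorithm has several further steps and measure-based conditions that are omitted here, and the function name `porter_step_1a` is our own.

```python
def porter_step_1a(word: str) -> str:
    """Apply Step 1a of the Porter stemmer (plural endings) to a lowercase word."""
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # IES -> I, e.g. ponies -> poni
    if word.endswith("ss"):
        return word        # SS -> SS, e.g. caress -> caress
    if word.endswith("s"):
        return word[:-1]   # S -> (empty), e.g. cats -> cat
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "caress", "cats"]:
        print(w, "->", porter_step_1a(w))
```

The rules are tried in order from longest suffix to shortest, so "caress" is left intact rather than losing its final "s".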
The log data in Yahoo! Answers from April 1, 2013 to March 31, 2014 were collected, comprising 8,570 Q&A records, wherein every record contains a question and its corresponding best answer. The total number of words in the log data was 1,486,696, and the average number of words per record was