Open access

Tolga Yuret

Abstract

A prevalent belief is that it is advantageous to have surname initials that are placed early in the alphabet (early surname initials) in academic fields in which authors are ordered alphabetically (alphabetic academic fields), because first authors are more visible. However, it is not certain that the advantage is strong enough to affect academic careers. In this paper, the advantage of having such early surname initials is analyzed by using data from 1,345 course catalogs that span 100 years. We obtained academic titles and surname initials of 19,353 faculty members who appeared 211,816 times in these course catalogs. Two alphabetic academic fields – economics and mathematics – and four other academic fields that are not alphabetic were analyzed. We found that there are some years when faculty members who have early surname initials are more likely to be full professors. However, there are many other years when faculty members who have early surname initials are less likely to be full professors. We also analyzed the career path of each faculty member. Economists who have early surname initials are found to be more likely to become full professors. However, this result is not significant and does not extend to mathematicians.

Open access

Shaobo Liang and Dan Wu

Abstract

As more users adopt different devices, such as personal computers, iPads, and smartphones, they can access OPAC (online public access catalog) services and other digital library services in different contexts. As a result, users’ behavior can carry over across devices, which makes user behavior data in digital libraries richer and more diverse. This large volume of user data makes it challenging for digital libraries to analyze user behavior, such as search preferences and borrowing habits. In this study, we examine users’ cross-device transition behavior when using OPAC. Based on a large-scale OPAC transaction log, we study the online activities between device transitions in the process of using OPAC. In order to predict the follow-up activities that users may take, and the next device that users may use, we detect features from several perspectives and analyze the feature importance. We find that the activity and time interval on the first device are more important for predicting the user’s next activity and the next device. In addition, operating system features help to better predict the next device. The next device used is more likely to predict the next activity after the device transition. This study examines cross-device transition prediction in library OPAC, which can help libraries provide smart services for users when accessing OPAC on different devices.

Open access

Weijia Xu, Amit Gupta, Pankaj Jaiswal, Crispin Taylor, Patti Lockhart and Jennifer Regala

Abstract

With the increasing number of digital journal submissions, there is a need to deploy new scalable computational methods to improve information accessibility. One common task is to identify useful information and named entities in text documents such as journal article submissions. However, many technical challenges limit the applicability of general methods, and general tools are lacking. In this paper, we present the domain informational vocabulary extraction (DIVE) project, which aims to enrich digital publications by detecting entities and key informational words and by adding annotations. To our knowledge, our system is the first of its kind to engage authors of peer-reviewed articles and journal publishers by integrating the DIVE implementation into the manuscript proofing and publication process. The system implements multiple strategies for biological entity detection, including regular expression rules, an ontology, and a keyword dictionary. The extracted entities are then stored in a database and made accessible through an interactive web application for curation and evaluation by authors. Through the web interface, the authors can make additional annotations and corrections to the current results. The updates can then be used to improve entity detection in subsequently processed articles. We describe our framework and deployment in detail. In a pilot program, we have deployed the first phase of development as a service integrated with the journals Plant Physiology and The Plant Cell, published by the American Society of Plant Biologists (ASPB). We present usage statistics to date since the service entered production in April 2018. We compare automated recognition results from DIVE with results from author curation and show that the service achieved on average 80% recall and 70% precision per article. In contrast, an existing biological entity extraction tool, a biomedical named entity recognizer (ABNER), achieves only 47% recall and returns a much larger candidate set.
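
The per-article recall and precision figures reported above compare automatically detected entities against the entities confirmed by author curation. A minimal sketch of that comparison for a single article follows; the function name and the entity strings are hypothetical and chosen only for illustration, and the abstract does not state how DIVE matches entities, so exact string matching is an assumption here.

```python
def precision_recall(predicted, curated):
    """Compare detected entities with author-curated entities for one article.

    Assumes exact string matching between the two sets (an illustrative choice).
    """
    predicted, curated = set(predicted), set(curated)
    true_positives = len(predicted & curated)
    # Precision: share of detected entities that the author confirmed.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of author-confirmed entities that were detected.
    recall = true_positives / len(curated) if curated else 0.0
    return precision, recall


# Hypothetical data: five detected entities, four curated ones, three in common.
detected = ["AT1G01010", "FLC", "PHYB", "auxin", "leaf"]
confirmed = ["AT1G01010", "FLC", "PHYB", "CO"]
precision, recall = precision_recall(detected, confirmed)
```

Averaging these two values over all curated articles would give per-article means of the kind quoted in the abstract.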

Open access

Maria Esteva, Ramona L. Walls, Andrew B. Magill, Weijia Xu, Ruizhu Huang, James Carson and Jawon Song

Abstract

The Identifier Services (IDS) project conducted research into and built a prototype to manage distributed genomics datasets remotely and over time. Inspired by archival concepts, IDS allows researchers to track dataset evolution through multiple copies, modifications, and derivatives, independent of where data are located – both symbolically, in the research lifecycle, and physically, in a repository or storage facility. The prototype implementation is based on a three-step data modeling process involving: a) understanding and recording of different researcher workflows, b) mapping the workflows and data to a generic data model and identifying functions, and c) integrating the data model as architecture and interactive functions into cyberinfrastructure (CI). Identity functions are operationalized as continuous tracking of authenticity attributes including data location, differences between seemingly identical datasets, metadata, data integrity, and the roles of different types of local and global identifiers used during the research lifecycle. CI resources were used to conduct identity functions at scale, including scheduling content comparison tasks on high-performance computing resources. The prototype was developed and evaluated considering six data test cases, and feedback was received through a focus-group activity. While there are some technical roadblocks to overcome, our project demonstrates that identity functions are innovative solutions to manage large distributed genomic datasets.

Open access

Lu An, Xingyue Yi, Yuxin Han and Gang Li

Abstract

This study aims to construct a microblog influence prediction model and to reveal how the user, time, and content features of microblog entries about public health emergencies affect the influence of those entries. Microblog entries about the Ebola outbreak are selected as data sets. The BM25 latent Dirichlet allocation model (LDA-BM25) is used to extract topics from the microblog entries. A microblog influence prediction model is proposed by using the random forest method. Results reveal that the proposed model can predict the influence of microblog entries about public health emergencies with a precision rate reaching 88.8%. The individual features that play a role in the influence of microblog entries, as well as their influence tendencies, are also analyzed. The proposed microblog influence prediction model consists of user, time, and content features. It addresses the deficiency that content features are often ignored by other microblog influence prediction models. The roles of the three features in the influence of microblog entries are also discussed.

Open access

Yuan Zhang and Hsia-Ching Chang

Abstract

Healthcare communication on Twitter is challenging because the space for a tweet is limited, but the topic is too sophisticated to be concise. Comparing medical-terminology hashtags with lay-language hashtags, this paper explores the characteristics of healthcare hashtags using an entropy matrix derived from information theory. In this paper, the entropy matrix comprises six different components used for constructing a tweet and serves as a framework for structural analysis at the granularity of tweet composition. These granular components include image(s), text with semantic meanings, hashtag(s), @ username(s), hyperlink, and unused space. The entropy matrix proposed in this paper contributes a new approach to visualizing the complexity level of hashtag collections. In addition, the calculated entropy could be an indicator of the diversity of a user’s choice across those tweet components. Furthermore, the visualizations (radar graph and scatterplot) illustrate statistical structures and the dynamics of the hashtag collections measured by entropy. The results from this study demonstrate a manifest relationship between tweet composition and the number of retweets.
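
As an illustration of how entropy can indicate diversity across the six tweet components, the sketch below computes plain Shannon entropy over the share of a tweet taken up by each component. The abstract does not define the entropy matrix itself, so the use of character counts as weights, the function name, and the sample values are all assumptions made for this example.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (in bits) of a distribution given by non-negative weights."""
    weights = list(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Zero-weight components contribute nothing to the entropy.
    probabilities = [w / total for w in weights if w > 0]
    return abs(-sum(p * math.log2(p) for p in probabilities))


# Hypothetical character counts for one tweet, one entry per component.
tweet_components = {
    "image": 0,
    "text": 70,
    "hashtag": 20,
    "username": 10,
    "hyperlink": 23,
    "unused_space": 17,
}
tweet_entropy = shannon_entropy(tweet_components.values())
```

Under this reading, a tweet that spreads its space evenly over all six components reaches the maximum of log2(6) bits, while a tweet built from a single component has an entropy of zero.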

Open access

Minghong Chen, Jingye Qu, Yuan Xu and Jiangping Chen

Abstract

Following an integrated data analytics framework that includes descriptive analysis and multiple automatic content analyses, we examined 265 projects that have been funded by the National Science Foundation (NSF) under the Smart and Connected Health (SCH) program. Our analysis discovered certain characteristics of these projects, including the distribution of the funds over years, the leading organizations in SCH, and the multidisciplinary nature of these projects. We also conducted content analysis on project titles and automatic analysis on the abstracts of the projects, including term frequency/word cloud analysis, clustering analysis, and topic modeling using the Biterm method. Our analysis found that five main research areas were explored in these projects: system or platform development, modeling or algorithmic development for various purposes, designing smart health devices, clinical data collection and application, and education and academic activities of SCH. Together, these analyses gave us a comparatively fair understanding of these projects and demonstrated how different analytic approaches could complement each other. Future research will focus on the impact of these projects through an analysis of their publications and citations.

Open access

Tingting Jiang, Jiaqi Yang, Cong Yu and Yunxin Sang

Abstract

Mobile devices are gaining popularity among online shoppers whose behavior has been reshaped by the changes in screen size, interface, functionality, and context of use. This study, based on a log file from a cross-border E-commerce platform, conducted a clickstream data analysis to compare desktop and mobile users’ visiting behavior. The original 2,827,449 clickstream records generated over a 4-day period were cleaned and analyzed according to an established analysis framework at the footprint level. Differences are found between desktop and mobile users in the distribution of footprints, core footprints, and footprint depth. As the results show, online shoppers preferred to explore various products on mobile devices and read product details on desktops. The E-commerce mobile application (app) presented higher interactivity than the desktop and mobile websites, thus increasing both user involvement and product visibility. It enabled users to engage in the intended activities more effectively on the corresponding pages. Mobile users were further divided into iOS and Android users whose visiting behaviors were basically similar to each other, though the latter might experience slower response speed.

Open access

John Zhang, Ming Fan, Bin Gu, Vijay Mookerjee, Bin Zhang and J. Leon Zhao

Open access

Liang Hong, Mengqi Luo, Ruixue Wang, Peixin Lu, Wei Lu and Long Lu

Abstract

The concept of Big Data is popular in a variety of domains. The purpose of this review was to summarize the features, applications, analysis approaches, and challenges of Big Data in health care. Big Data in health care has its own features, such as heterogeneity, incompleteness, timeliness and longevity, privacy, and ownership. These features bring a series of challenges for data storage, mining, and sharing to promote health-related research. To deal with these challenges, analysis approaches focusing on Big Data in health care need to be developed and laws and regulations for making use of Big Data in health care need to be enacted. From a patient perspective, application of Big Data analysis could bring about improved treatment and lower costs. In addition to patients, government, hospitals, and research institutions could also benefit from the Big Data in health care.