Will R. Thomas, Benjamin Galewsky, Sandeep Puthanveetil Satheesan, Gregory Jansen, Richard Marciano, Shannon Bradley, Jong Lee, Luigi Marini and Kenton McHenry
The emerging transdiscipline of Computational Archival Science (CAS) links frameworks such as Brown Dog and repository software such as Digital Repository At Scale To Invite Computation (DRAS-TIC) to yield an understanding of working with digital collections at scale for cultural data. The DRAS-TIC and Brown Dog projects here serve as the basis for an expandable distributed storage/service architecture with on-demand, horizontally scalable integrated digital preservation and analysis services.
As more users turn to different devices, such as personal computers, iPads, and smartphones, they can access OPAC (online public access catalog) services and other digital library services in different contexts. As a result, users’ behavior carries over from one device to another, which makes user behavior data in digital libraries richer and more diverse. The large volume of user data makes it challenging for digital libraries to analyze users’ behavior, such as search preferences and borrowing habits. In this study, we examine users’ cross-device transition behavior when using OPAC. Based on a large-scale OPAC transaction log, we study the online activities that occur between device transitions. To predict the follow-up activities that users may take and the next device that users may use, we extract features from several perspectives and analyze feature importance. We find that the activity and time interval on the first device are the more important features for predicting the user’s next activity and next device. In addition, operating system features help to better predict the next device. The next device used is, in turn, a useful predictor of the next activity after the device transition. This study examines cross-device transition prediction in library OPAC, which can help libraries provide smart services for users accessing OPAC on different devices.
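As a rough illustration of the prediction task described in the abstract above, the following minimal sketch predicts the next device from the first device and the activity performed on it, using simple transition counts. It is a frequency baseline under stated assumptions, not the authors' model: the record layout `(first_device, first_activity, next_device)`, the function names, and the sample values are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(records):
    """Count next-device outcomes for each (first_device, first_activity) pair.

    `records` is an iterable of (first_device, first_activity, next_device)
    tuples; this layout is an assumption made for illustration.
    """
    table = defaultdict(Counter)
    for first_device, first_activity, next_device in records:
        table[(first_device, first_activity)][next_device] += 1
    return table


def predict_next_device(table, first_device, first_activity):
    """Return the most frequently observed next device, or None if unseen."""
    counts = table.get((first_device, first_activity))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical log entries, not data from the study.
sample_log = [
    ("pc", "search", "phone"),
    ("pc", "search", "phone"),
    ("pc", "search", "tablet"),
    ("phone", "renew", "pc"),
]
transitions = fit_transitions(sample_log)
```

A feature-based classifier, as the abstract describes, would add further inputs such as the time interval on the first device and the operating system; the count table only shows the shape of the problem.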
Beth A. Plale, Eleanor Dickson, Inna Kouper, Samitha Harshani Liyanage, Yu Ma, Robert H. McDonald, John A. Walsh and Sachith Withana
Open science is prompting wide efforts to make data from research available for broader use. However, sharing data is complicated by important protections on the data (e.g., protections of privacy and intellectual property). The spectrum of options existing between data needing to be fully open access and data that simply cannot be shared at all is quite limited. This paper puts forth a generalized remote secure enclave as a socio-technical framework consisting of policies, human processes, and technologies that work hand in hand to enable controlled access and use of restricted data. Based on experience in implementing the enclave for computational, analytical access to a massive collection of in-copyright texts, we discuss the synergies and trade-offs that exist between software components and policy and process components in striking the right balance between safety for the data, ease of use, and efficiency.
Liang Hong, Mengqi Luo, Ruixue Wang, Peixin Lu, Wei Lu and Long Lu
The concept of Big Data is popular in a variety of domains. The purpose of this review was to summarize the features, applications, analysis approaches, and challenges of Big Data in health care. Big Data in health care has its own features, such as heterogeneity, incompleteness, timeliness and longevity, privacy, and ownership. These features bring a series of challenges for data storage, mining, and sharing to promote health-related research. To deal with these challenges, analysis approaches focusing on Big Data in health care need to be developed, and laws and regulations for making use of Big Data in health care need to be enacted. From a patient perspective, application of Big Data analysis could bring about improved treatment and lower costs. In addition to patients, government, hospitals, and research institutions could also benefit from Big Data in health care.
Tingting Jiang, Jiaqi Yang, Cong Yu and Yunxin Sang
Mobile devices are gaining popularity among online shoppers whose behavior has been reshaped by the changes in screen size, interface, functionality, and context of use. This study, based on a log file from a cross-border E-commerce platform, conducted a clickstream data analysis to compare desktop and mobile users’ visiting behavior. The original 2,827,449 clickstream records generated over a 4-day period were cleaned and analyzed according to an established analysis framework at the footprint level. Differences are found between desktop and mobile users in the distribution of footprints, core footprints, and footprint depth. As the results show, online shoppers preferred to explore various products on mobile devices and read product details on desktops. The E-commerce mobile application (app) presented higher interactivity than the desktop and mobile websites, thus increasing both user involvement and product visibility. It enabled users to engage in the intended activities more effectively on the corresponding pages. Mobile users were further divided into iOS and Android users whose visiting behaviors were basically similar to each other, though the latter might experience slower response speed.
Minghong Chen, Jingye Qu, Yuan Xu and Jiangping Chen
Following an integrated data analytics framework that includes descriptive analysis and multiple automatic content analyses, we examined 265 projects that have been funded by the National Science Foundation (NSF) under the Smart and Connected Health (SCH) program. Our analysis revealed certain characteristics of these projects, including the distribution of the funds over the years, the leading organizations in SCH, and the multidisciplinary nature of these projects. We also conducted content analysis on project titles and automatic analysis on the abstracts of the projects, including term frequency/word cloud analysis, clustering analysis, and topic modeling using the Biterm method. Our analysis found that five main research areas were explored in these projects: system or platform development, modeling or algorithmic development for various purposes, designing smart health devices, clinical data collection and application, and education and academic activities of SCH. Together, these analyses gave us a comparatively fair understanding of these projects and demonstrated how different analytic approaches could complement each other. Future research will focus on the impact of these projects through an analysis of their publications and citations.
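The term frequency step mentioned in the abstract above can be sketched as follows. This is only a minimal illustration: the tokenizer, the small stopword list, the function name, and the sample titles are assumptions, not the authors' pipeline or data.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, chosen here for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def term_frequencies(texts):
    """Count lowercase alphabetic tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return counts


# Hypothetical project titles, not drawn from the NSF data.
sample_titles = ["Smart health devices", "Smart sensing for health"]
frequencies = term_frequencies(sample_titles)
```

The resulting counts are the kind of input a word cloud is drawn from; clustering and Biterm topic modeling would require additional steps not shown here.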
Healthcare communication on Twitter is challenging because the space for a tweet is limited, while the topic is often too complex to convey concisely. Comparing medical-terminology hashtags with lay-language hashtags, this paper explores the characteristics of healthcare hashtags using an entropy matrix derived from information theory. In this paper, the entropy matrix comprises six different components used for constructing a tweet and serves as a framework for structural analysis at the granularity of tweet composition. These granular components include image(s), text with semantic meanings, hashtag(s), @ username(s), hyperlink, and unused space. The entropy matrix proposed in this paper contributes a new approach to visualizing the complexity level of hashtag collections. In addition, the calculated entropy could be an indicator of the diversity of a user’s choices across those tweet components. Furthermore, the visualizations (radar graph and scatterplot) illustrate the statistical structures and dynamics of the hashtag collections as measured by entropy. The results from this study demonstrate a manifest relationship between the composition of a tweet and the number of times it is retweeted.
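To make the entropy idea in the abstract above concrete, the sketch below computes the Shannon entropy of how a tweet's space is divided among the six listed components. How the paper actually builds its entropy matrix is not specified here, so measuring each component by a share of the tweet (for example, a character count) is an assumption, as are the component keys and the function name.

```python
import math

# The six components named in the abstract; the key spellings are assumptions.
COMPONENTS = ("image", "text", "hashtag", "mention", "hyperlink", "unused")


def component_entropy(amounts):
    """Shannon entropy (in bits) of the distribution over tweet components.

    `amounts` maps component names to non-negative quantities, such as the
    number of characters each component occupies (an assumed measure).
    """
    total = sum(amounts.get(name, 0) for name in COMPONENTS)
    if total <= 0:
        raise ValueError("at least one component must be non-zero")
    entropy = 0.0
    for name in COMPONENTS:
        share = amounts.get(name, 0) / total
        if share > 0:  # zero-share components contribute nothing
            entropy -= share * math.log2(share)
    return entropy


even_split = component_entropy({name: 1 for name in COMPONENTS})
text_only = component_entropy({"text": 280})
```

Under this definition, a tweet that uses only one component has entropy 0, and one that splits its space evenly across all six reaches the maximum of log2(6) bits, so higher values indicate a more varied composition.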
Research methods play an extremely important role in studies, and statistical methods are fundamental to quantitative research. The authors of this paper investigated research papers in library and information science that used statistical methods, including parametric inferential statistical methods, nonparametric inferential statistical methods, predictive statistical correlation methods, and predictive statistical regression methods. They examined the connections and interactions between these statistical methods and their application areas, including information creation, information selection and control, information organization, information retrieval, information dissemination, and information use. Both an inferential statistical method and a graphic clustering visualization method were employed to explore the relationships between statistical methods and application areas and to reveal the hidden interaction patterns. In total, 1,821 research papers employing statistical methods were identified among the papers published in six major library and information science journals from 1999 to 2017. The findings showed that application areas affected the types of statistical methods utilized. Studies in information organization and information retrieval tended to employ parametric and nonparametric inferential methods, while correlation and regression methods were applied more in studies of information use, information dissemination, information creation, and information selection and control. These findings help researchers better understand the statistical method orientation of library and information science studies and assist educators in the field in developing applicable quantitative research methodology courses.