In the current data-intensive era, the traditional hands-on method of conducting scientific research, in which researchers explore related publications to generate a testable hypothesis, is well on its way to becoming obsolete within just a year or two. Automatically generating hypotheses by analyzing the literature and data might become the de facto approach to inform the core research efforts of those trying to keep pace with the exponential growth of publications and datasets. Here, viewpoints are presented and discussed to help readers understand the challenges of data-driven discovery.
The Panama Canal, the 77-kilometer waterway connecting the Atlantic and Pacific oceans, has played a crucial role in international trade for more than a century. However, digging the Panama Canal was an exceedingly challenging process. A French effort in the late 19th century was abandoned because of equipment problems and a heavy loss of labor to mosquito-borne tropical diseases. The United States formally took control of the project in 1904. It replaced the unusable French equipment with new construction equipment designed for a much larger and faster scale of work, and Colonel William C. Gorgas was appointed chief sanitation officer and charged with eliminating mosquito-spread illnesses. After overcoming these and other trials and tribulations, the Canal opened on August 15, 1914. The triumphant completion of the Panama Canal demonstrates that using the right tools and eliminating significant threats are critical steps in any project.
More than 100 years later, a paradigm shift is under way as we move into a data-centered era. Today, data are extremely rich but overwhelming, and extracting information from data requires not only the right tools and methods but also awareness of the major threats. In this data-intensive era, the traditional method of exploring related publications and available datasets from previous experiments to arrive at a testable hypothesis is becoming obsolete. Consider that a new article is published every 30 seconds (Jinha, 2010). For diabetes alone, a common disease, roughly 500,000 articles have been published to date; even a scientist who reads 20 papers per day would need 68 years to work through all of them. The standard method simply cannot keep pace with this volume of documents or with the exponential growth of datasets. A major threat is that the canon of domain knowledge cannot be absorbed and held in human memory. Without efficient methods to process information, and without a way to overcome the fundamental limits of human memory and time in the face of the data deluge, we may fail as the French did on the Isthmus of Panama more than a century ago.
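The 68-year figure for diabetes follows from simple arithmetic, assuming a 365-day year with no days off:

```latex
\[
\frac{500{,}000\ \text{articles}}{20\ \text{articles/day}} = 25{,}000\ \text{days},
\qquad
\frac{25{,}000\ \text{days}}{365\ \text{days/year}} \approx 68.5\ \text{years}.
\]
```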
Scouring the literature and data to generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to keep pace with the exponential growth of publications and datasets (Evans & Foster, 2011). In reality, most scholars have never been able to stay fully up to date with publications and datasets, given the unending growth in the quantity and diversity of research within their own areas of focus, let alone in related conceptual areas where knowledge may be segregated by syntactically impenetrable keyword barriers or an entirely different research corpus.
Research communities in many disciplines are finally recognizing that, with advances in information technology, new ways are needed to extract entities from increasingly data-intensive publications and to integrate and analyze large-scale datasets. This provides a compelling opportunity to improve knowledge discovery from the literature and datasets through knowledge graphs and an associated framework that integrates scholars, domain knowledge, datasets, workflows, and machines on a scale previously beyond our reach (Ding et al., 2013).
Zhi Ying Ren, ChengHui Gao, GuoQiang Han, Shen Ding and JianXing Lin
The dual-tree complex wavelet transform (DT-CWT) offers shift invariance, directional selectivity, perfect reconstruction (PR), and limited redundancy, and it can effectively separate the various components of a surface. At the nanoscale, however, surface morphology contains pits and protrusions and is more complex to characterize. This paper presents an improved approach that simultaneously separates the reference and waviness components and remains robust against abnormal signals. We incorporated a bilateral filtering (BF) stage into the DT-CWT to address these problems. To verify the feasibility of the new method and test its performance, we ran computer simulations comparing three generations of wavelets with the improved DT-CWT, and we conducted two case studies. Our results show that the improved DT-CWT not only enhances filtering robustness under abnormal interference but also separates the reference and waviness of 3-D nanoscale surfaces accurately and reliably.
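The abstract does not specify the form of its BF stage or where it sits inside the DT-CWT. With that caveat, the sketch below shows only a standard bilateral filter applied to a hypothetical 1-D surface height profile: each output point is a weighted average of its neighbours, where the weight decays with distance (spatial kernel, `sigma_s`) and with height difference (range kernel, `sigma_r`). The function name and parameter values are illustrative assumptions, and no wavelet decomposition is shown.

```python
import math


def bilateral_filter_1d(signal, radius=2, sigma_s=1.0, sigma_r=0.5):
    """Edge-preserving smoothing of a 1-D height profile (hypothetical helper)."""
    n = len(signal)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: nearby samples count more
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            # Range weight: samples with similar height count more
            w_range = math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2))
            w = w_space * w_range
            num += w * signal[j]
            den += w  # never zero: j == i contributes weight 1
        out.append(num / den)
    return out


# Hypothetical profile: small noise on either side of a 5-unit step
profile = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]
smoothed = bilateral_filter_1d(profile, radius=2, sigma_s=1.0, sigma_r=0.5)
print([round(v, 2) for v in smoothed])
```

Because the range kernel gives almost no weight to neighbours across a large height jump, the step survives nearly unchanged while the small fluctuations on each side are averaged; this edge-preserving property is the usual motivation for bilateral filtering of surfaces with pits and protrusions.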
Wen-mao Ding, Lan Li, Rui-ying Wang and Zhu-ling Cao
Background: Nicotine can affect the development of atherosclerosis (AS). Monocytes/macrophages are important cells in AS lesions.
Objective: To study the mechanisms by which smoking affects AS, we investigated the effects of nicotine on macrophages.
Methods: Macrophages were exposed to different concentrations of nicotine (6 × 10⁻⁹ to 6 × 10⁻⁵ mol/L), for different incubation times (3, 6, 9, 12, 18, and 24 hours), and to 7β-hydroxycholesterol (50 μg/mL). After exposure to these conditions, lactate dehydrogenase (LDH) activity and tumor necrosis factor-α (TNF-α) content in the supernatant were assayed.
Results: Nicotine treatment (6 × 10⁻⁹ to 6 × 10⁻⁵ mol/L) markedly reduced LDH in the supernatant (131.0±9.6 U/L, 129.7±6.2 U/L, 129.4±5.3 U/L, 134.2±8.4 U/L, and 138.3±9.7 U/L vs. 151.3±8.1 U/L, p < 0.05 respectively, q-test). The same trend was seen with co-treatment with 7β-hydroxycholesterol and nicotine (135.7±7.6 U/L, 135.6±6.6 U/L, 136.1±6.7 U/L, 142.9±4.5 U/L, and 146.4±4.4 U/L vs. 152.4±6.2 U/L, p < 0.05 respectively, q-test). The peak effects occurred at a nicotine concentration of 6 × 10⁻⁷ mol/L and within the first 18 hours of incubation. Nicotine treatment (6 × 10⁻⁹ to 6 × 10⁻⁶ mol/L) increased TNF-α in the supernatant (0.28±0.06 ng/mL, 0.32±0.05 ng/mL, 0.40±0.07 ng/mL, and 0.30±0.08 ng/mL vs. 0.17±0.05 ng/mL, p < 0.05 respectively, q-test). Nicotine at 6 × 10⁻⁵ mol/L produced no significant increase compared with the control group (0.21±0.08 ng/mL vs. 0.17±0.05 ng/mL, p > 0.05, q-test). The peak effect occurred at a nicotine concentration of 6 × 10⁻⁷ mol/L.
Conclusions: Nicotine can have a beneficial effect on macrophages. Nicotine treatment can also activate macrophages to produce TNF-α, which may be one mechanism by which nicotine contributes to the development of atherosclerosis.
This paper aims to better understand a large number of papers in the medical domain of Alzheimer’s disease (AD) and related diseases using the machine reading approach.
The study uses topic modeling to obtain an overview of the field and employs open information extraction to examine the field further at the level of specific facts.
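The abstract does not name the topic model it used. As an assumption, the sketch below uses latent Dirichlet allocation (LDA) fitted with a collapsed Gibbs sampler, one common way to obtain a topic overview of a corpus. The function names, hyperparameters (`alpha`, `beta`), and toy documents are hypothetical; a real study would rely on a tested library, proper preprocessing (stop-word removal, stemming), and far more documents.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] counts tokens of
    document d assigned to topic k; topic_word[k][v] counts assignments of
    vocabulary word v to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]
    topic_word = [[0] * V for _ in range(K)]
    topic_total = [0] * K
    assign = []  # current topic of every token

    # Random initial topic for each token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assign[d][i]
                # Remove the token's current assignment from the counts
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Return the n most frequently assigned words of each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Hypothetical toy corpus with two clearly separated themes
docs = [["amyloid", "tau", "memory"] * 3, ["hiv", "aids", "antiretroviral"] * 3]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2, iterations=100)
print(top_words(topic_word, vocab))
```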
Several topics within the AD research field are identified, including human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS); this topic can help answer the question of how HIV/AIDS and AD are very different yet related diseases.
Some manual data cleaning could improve the study, such as removing incorrect facts found by open information extraction.
This study uses the literature to answer specific questions about a scientific domain, and the same approach can help domain experts find interesting and meaningful relations among entities, such as relations between AD and HIV/AIDS.
Both an overview and specific information are obtained from the literature by using two distinct methods in a complementary manner. This combination is novel, as previous work has focused on only one of them, and it provides a better way to understand an important scientific field with data-driven methods.
Yiming Zhao, Baitong Chen, Jin Zhang, Ying Ding, Jin Mao and Lihong Zhou
This study investigates the evolution of diabetics’ concerns by analyzing terms in the Diabetes category logs of the Yahoo! Answers website. Two sets of question-and-answer (Q&A) log data were collected: one from December 2, 2005 to December 1, 2006, and the other from April 1, 2013 to March 31, 2014. Network analysis and a t-test were performed to analyze the differences in diabetics’ concerns between the two data sets. Community detection and topic evolution analysis were used to reveal detailed changes in diabetics’ concerns over the examined period. Increases in average node degree and graph density imply that diabetics now use a smaller vocabulary to post questions and that the scope of their questions has become more focused. According to t-tests on degree centrality and betweenness centrality, the networks of key terms in the 2005–2006 and 2013–2014 Q&A log data differ significantly. Specifically, diabetics’ focus has shifted toward daily life and other nonmedical issues, including diet, food, and nutrients. The recent changes and evolution paths of diabetics’ concerns were visualized with an alluvial diagram, which shows that food- and diet-related terms have become prominent.
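The abstract reports average node degree and graph density but not exactly how the term network was built. Assuming an undirected co-occurrence network in which two terms are linked when they appear in the same question, both measures follow directly from the node count N and edge count E: average degree is 2E/N and density is 2E/(N(N − 1)). The helper names and toy questions below are hypothetical.

```python
from itertools import combinations


def cooccurrence_graph(questions):
    """Undirected graph: terms are nodes; an edge joins terms in the same question."""
    nodes, edges = set(), set()
    for terms in questions:
        unique = sorted(set(terms))  # sorting gives each edge one canonical form
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


def average_degree(nodes, edges):
    # Each undirected edge contributes to the degree of two nodes
    return 2 * len(edges) / len(nodes) if nodes else 0.0


def density(nodes, edges):
    # Fraction of all possible node pairs that are actually connected
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


questions = [["insulin", "diet"], ["diet", "food", "sugar"], ["insulin", "sugar"]]
nodes, edges = cooccurrence_graph(questions)
print(average_degree(nodes, edges), round(density(nodes, edges), 3))  # 2.5 0.833
```

With a fixed number of questions, fewer distinct terms that co-occur more often raise both measures, which is the reasoning behind reading the increases as a smaller, more focused vocabulary.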
Rong Tan, De-Ying Hu, Yan-Hong Han, Yi-Lan Liu, Xiao-Ping Ding, Shu-Jie Wang and Ke Xu
The aim of this study was to explore the characteristics of and preventive management strategies for suicidal inpatients in a general hospital.
A total of 54 inpatients who died by suicide during hospitalization in a general hospital between November 2008 and January 2017 were identified from a patient safety adverse event network reporting system.
Suicides in the general hospital occurred predominantly among women and among patients being treated for malignant neoplasms. Furthermore, most of these patients used violent suicide methods. The most common and most lethal means was jumping from a height through a window.
Management strategies for suicide prevention can address three aspects: patients, medical staff, and the hospital environment. Reducing the inpatient suicide rate and further improving hospital safety management is both urgent and feasible.
Xianlei Dong, Jian Xu, Ying Ding, Chenwei Zhang, Kunpeng Zhang and Min Song
We propose and apply a simplified nowcasting model to understand the correlations between social attention and topic trends of scientific publications.
First, topics are generated from the obesity corpus with the latent Dirichlet allocation (LDA) algorithm, and time series of keyword search trends are obtained from Google Trends. We then build a structural time series model using data from January 2004 to December 2012 and evaluate the model using data from January 2013. We employ a state-space model to separate the non-regression components of an observed time series (i.e., the tendency and the seasonality), and we apply a “spike and slab” prior and stepwise regression to analyze the correlations between the regression component and social media attention. The two parts are combined using Markov chain Monte Carlo sampling to obtain our results.
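The full model above (state-space components, spike-and-slab regression, MCMC) is too large for a short example. As a much simpler stand-in, and not the authors' model, the sketch below runs a Kalman filter on a local-level state-space model, which shows the core idea of separating a latent level (tendency) from observation noise. Seasonality, the Google Trends regression component, and MCMC are all omitted; the function name, variances, and monthly counts are hypothetical.

```python
def local_level_filter(y, obs_var, level_var, m0=0.0, p0=1e6):
    """Kalman filter for the local-level model

        y_t  = mu_t + eps_t,      eps_t ~ N(0, obs_var)
        mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, level_var)

    Returns the filtered level estimates E[mu_t | y_1..y_t].
    """
    m, p = m0, p0  # large p0 makes the prior on the initial level nearly flat
    levels = []
    for obs in y:
        p = p + level_var          # predict: the random-walk level adds variance
        gain = p / (p + obs_var)   # Kalman gain: trust in the new observation
        m = m + gain * (obs - m)   # update the level estimate
        p = (1 - gain) * p         # update its variance
        levels.append(m)
    return levels


# Hypothetical monthly publication counts on one topic
counts = [10, 12, 11, 14, 13, 16, 15, 18]
print([round(v, 1) for v in local_level_filter(counts, obs_var=2.0, level_var=0.5)])
```

A larger `level_var` relative to `obs_var` lets the estimated tendency follow the data more closely; in a Bayesian structural time series these variances would be estimated rather than fixed by hand.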
The results of our study show that (1) the number of publications on child obesity increases at a lower rate than that of diabetes publications; (2) the number of publications on a given topic may be related to the season or time of year; and (3) the number of publications on a given topic is correlated with its social media attention, i.e., the search frequency for that topic in Google Trends. Our model can also predict the number of publications related to a given topic.
First, we study correlation rather than causality between topic trends and social media. As a result, the relationships may not be robust, and the model cannot make long-term predictions. Second, we cannot identify the reasons or conditions that drive obesity topics to show such tendencies and seasonal patterns, which may require “field” studies in the future. Third, the efficiency of our model needs to be improved with more efficient variable selection methods, because stepwise regression is time consuming, especially for a large number of variables.
This paper analyzes publication topic trends from three perspectives: tendency, seasonality, and correlation with social media attention, providing a new perspective for identifying and understanding topical themes in academic publications.
To the best of our knowledge, this is the first study to apply a state-space model to the relationship between healthcare-related publications and social media, that is, between the evolution of a topic and people’s search behavior. This paper thus provides a new viewpoint in correlation analysis and demonstrates the value of considering social media attention when analyzing publication topic trends.
Guo-liang Zhang, Jian-bo Ding, Shuang-jie Li, Xi Zhang, Yi Xu, Hua-sheng Yang, Dan Wei, Qin Li, Qing-sheng Shi, Qing-xiong Zhu, Tong Yang, Zi-qiang Zhuo, Yi-mei Tian, Hao-jie Zheng, Liu-ping Tang, Xin-ying Zou, Tao Wen and Xiu-hui Li
Objective To evaluate the efficacy and safety of traditional Chinese medicine (TCM) combined with Western medicine in the treatment of patients with common hand, foot and mouth disease (HFMD) by conducting a prospective, controlled, and randomized trial.
Methods A total of 452 patients with common HFMD were randomly assigned to receive Western medicine alone (n = 220) or combined with TCM (Reduning or Xiyanping injections) (n = 232). The primary outcome was the rate of rash/herpes disappearance within 5 days; secondary outcomes included the rates of resolution of fever, cough, lethargy, agitation, and vomiting within 5 days.
Results The rash/herpes disappearance rate was 45.5% (100/220) in the Western medicine therapy group and 67.2% (156/232) in the combined TCM and Western medicine therapy group, a significant difference (P < 0.001). TCM also significantly increased the rate of secondary symptom resolution, which was 56.4% in the Western medicine therapy group and 71.4% in the combined therapy group (P = 0.001). No drug-related adverse events were observed.
Conclusions These results suggest that integrative TCM and Western medicine therapy achieves better therapeutic efficacy. TCM may become an important complementary therapy for relieving the symptoms of HFMD.
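The abstract does not name the test behind P < 0.001. Assuming a Pearson chi-square test without continuity correction on the 2×2 table of resolved vs. unresolved rash/herpes, the primary-outcome figures can be checked as below; the function name is hypothetical.

```python
import math


def two_proportion_chi2(success_a, n_a, success_b, n_b):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table."""
    a, b = success_a, n_a - success_a  # group A: resolved, not resolved
    c, d = success_b, n_b - success_b  # group B: resolved, not resolved
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Western medicine alone: 100/220; combined TCM and Western medicine: 156/232
chi2, p = two_proportion_chi2(100, 220, 156, 232)
print(f"rates: {100 / 220:.1%} vs {156 / 232:.1%}; chi2 = {chi2:.1f}")
```

These counts give a chi-square statistic of about 21.8 with 1 degree of freedom, consistent with the reported P < 0.001.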