We are currently witnessing the emergence of many different data repositories and archival systems for scientific data discovery, use, and analysis. Given this proliferation of data-sharing platforms, this study addresses how scientists working in natural resources and environmental sciences navigate these diverse data sources, what their concerns and value propositions are regarding multiple data discovery channels, and, most importantly, how they perceive the characteristics and compare the functionalities of different types of data repository systems. Through a user community study of domain scientists' data use dynamics and insights, this research provides strategies and discusses ideas on how to leverage these different platforms. Furthermore, it proposes a novel top-down approach to searching, browsing, and visualizing for the dynamic exploration of environmental data.
The identification and tracking of technology trends in an industry is crucial for effective information management, as well as for companies to maintain their competitive edge in a changing technological environment. A novel method that combines patentometrics, time series analysis, and social network analysis is proposed to capture the evolution of technology topics and to monitor the rise and fall of technology dominators. Taking patents in the solar cell field as an example, a total of 3,820 patents issued between 1997 and 2011 were collected from the United States Patent and Trademark Office database. We divided the examined time span into five 3-year periods and, for each period, identified the technology dominators, that is, the major contributors of patents in a technological field. These key assignees were also classified as stable, appearing, or exiting based on their transition patterns from one time period to the next. Results show that solar cell patents can be grouped into eight major technology communities, and that the frequency of change in technology dominators across the years varied for each community. We further examined the relationship between a technology dominator’s transition pattern and the changes in its patent characteristics. The appearing technology dominators were found to have increased values for several patent characteristics, including science linkage, pendency period, originality index, and endogeneity index, while their technology cycle time decreased; the stable technology dominators exhibited decreasing science linkage and originality index values; and the exiting technology dominators showed trends in patent characteristics that were opposite to those of the appearing technology dominators. By using the methodology proposed in this study, companies can gain critical insights into the major trends of a technological field, which would be invaluable to the planning and assessment of a company’s research-and-development strategies.
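The stable/appearing/exiting classification described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say how dominators are selected, so the top-k-by-patent-count rule, the function names `top_dominators` and `classify_transitions`, and the toy assignee labels are all assumptions made for illustration, not the authors' procedure.

```python
from collections import Counter


def top_dominators(assignees, k):
    """Return the k assignees with the most patents in one period.

    `assignees` holds one entry per patent. Selecting the top k by count
    is an assumed rule; the abstract does not give the actual criterion.
    """
    counts = Counter(assignees)
    # Sort by descending count, then by name, so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name for name, _ in ranked[:k]}


def classify_transitions(previous, current):
    """Label each dominator by its transition between two adjacent periods."""
    labels = {}
    for name in previous | current:
        if name in previous and name in current:
            labels[name] = "stable"      # dominator in both periods
        elif name in current:
            labels[name] = "appearing"   # dominator only in the later period
        else:
            labels[name] = "exiting"     # dominator only in the earlier period
    return labels


# Toy data: one assignee label per patent (hypothetical, not from the study).
period_1 = ["A", "A", "A", "B", "B", "C"]
period_2 = ["A", "A", "D", "D", "D", "C"]

labels = classify_transitions(top_dominators(period_1, 2),
                              top_dominators(period_2, 2))
print(sorted(labels.items()))
```

Applying the same comparison to each pair of adjacent periods would give one transition label per dominator per period boundary.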
The SPARQL Protocol and RDF Query Language (SPARQL) allows users to issue structural queries over a Resource Description Framework (RDF) graph. However, the lack of a spatiotemporal query language limits the use of RDF data in spatiotemporal applications. As the amount of spatiotemporal information in RDF data continues to grow, it is necessary to design an effective and efficient spatiotemporal RDF data management system. In this paper, we formally define spatiotemporal information-integrated RDF data, introduce a spatiotemporal query language that extends SPARQL with spatiotemporal assertions to query such data, and design a novel index and the corresponding query algorithm. The experimental results on a large, real RDF graph integrating spatial and temporal information (> 180 million triples) confirm the superiority of our approach. The proposed system, gst-store, outperforms its competitors by more than 20%-30% in most cases.
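To make the idea of a structural query with spatiotemporal assertions concrete, here is a minimal Python sketch. It is not the query language, index, or algorithm from the paper: the `STTriple` type, the bounding-box and interval-overlap predicates, and the `ex:` identifiers are all hypothetical, and the query is answered by a plain linear scan rather than an index.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class STTriple:
    """An RDF-style triple annotated with a location and a validity interval."""
    subject: str
    predicate: str
    obj: str
    point: Tuple[float, float]   # (x, y) coordinates; hypothetical layout
    interval: Tuple[int, int]    # (start_year, end_year), inclusive


def in_box(point, box):
    """Spatial assertion: is the point inside (xmin, ymin, xmax, ymax)?"""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def overlaps(a, b):
    """Temporal assertion: do two inclusive intervals share any instant?"""
    return a[0] <= b[1] and b[0] <= a[1]


def query(triples: List[STTriple],
          predicate: Optional[str] = None,
          box: Optional[Tuple[float, float, float, float]] = None,
          during: Optional[Tuple[int, int]] = None) -> List[STTriple]:
    """Return triples matching the predicate and both assertions (linear scan)."""
    results = []
    for t in triples:
        if predicate is not None and t.predicate != predicate:
            continue
        if box is not None and not in_box(t.point, box):
            continue
        if during is not None and not overlaps(t.interval, during):
            continue
        results.append(t)
    return results


# Toy graph (made-up data for illustration only).
graph = [
    STTriple("ex:Station1", "ex:locatedIn", "ex:CityA", (10.0, 20.0), (2000, 2005)),
    STTriple("ex:Station2", "ex:locatedIn", "ex:CityB", (50.0, 60.0), (2003, 2010)),
    STTriple("ex:Station3", "ex:locatedIn", "ex:CityA", (12.0, 22.0), (2011, 2015)),
]
for t in query(graph, predicate="ex:locatedIn", box=(0, 0, 30, 30), during=(2004, 2008)):
    print(t.subject)
```

A linear scan like this touches every triple, which is exactly the cost a dedicated spatiotemporal index is meant to avoid on graphs with hundreds of millions of triples.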
Ahmed AlKalbani, Hepu Deng, Booi Kam and Xiaojuan Zhang
The increasing recognition of the importance of information security has created institutional pressures on organizations to comply with information security standards and policies for protecting their information. How such pressures influence information security compliance in organizations, however, is unclear. This paper presents an empirical study investigating the impact of institutional pressures on information security compliance in organizations. Using structural equation modelling to analyse data collected through an online survey, the study shows that coercive, normative, and mimetic pressures positively influence information security compliance in organizations. It reveals that the benefits of information security compliance motivate management to strengthen their commitment to information security compliance. Furthermore, the study finds that social pressures do not have a significant impact on management commitment to information security compliance. Theoretically, this study contributes to information security research by providing a better understanding of how institutional pressures can be used to enhance information security compliance in organizations. Practically, this study informs information security policy makers of the major institutional drivers of information security compliance.
Drawing upon the resource-based and relational views, this study examines how three types of IT competencies (i.e., IT objects, IT operations, and IT knowledge) differentially affect firm performance and how these effects are moderated by interorganizational communication (IOC). We test our hypotheses with data collected from 258 firms in China. The results of hierarchical regression analysis reveal that IT operations and IT knowledge significantly improve firm performance, whereas the effect of IT objects is not significant. In addition, the moderating effect of IOC varies across the different types of IT competencies. Specifically, IOC positively moderates the relationships of both IT operations and IT knowledge with firm performance. However, the moderating effect of IOC on the relationship between IT objects and firm performance is not significant.
Our motivation for conducting this research is the lack of studies focusing on the acknowledgments sections of published papers. A further motivation is the lack of a study examining the countries and organizations mentioned in acknowledgments and their influence, something that cannot be analyzed using citation or co-authorship relationships. Qualitative analysis of acknowledgments has been limited because of the atypical pattern of the acknowledgment section. Our research aims to identify useful information hidden within the acknowledgment sections of the articles stored in the PubMed Central database and to map influence via a country-acknowledgment network. To this end, we use topic modeling to analyze the topics of acknowledgments and conduct a basic network analysis to find the differences between the co-country network and the acknowledgment network. A word-embedding model is used to compare the semantic similarity between the authors and countries extracted from our original dataset. The results of topic modeling suggest that funding has become a critical topic in acknowledgments. The results of network analysis indicate that some large countries act as hubs both implicitly and explicitly, while some countries, such as China, do not frequently work with other countries. The word-embedding model built from acknowledgments suggests that the authors frequently referenced in acknowledgments are also likely to be referred to in similar contexts. It also implies that the publishing country of a paper has little effect on whether it receives an acknowledgment from any other specific country. From these results, we conclude that the content of acknowledgments extracted from the papers can be divided into two categories: funding and appreciation.
We also find that there is no clear relationship between the publication country and the countries mentioned in the acknowledgment section.
With the vast amount of biomedical literature available online, doctors have the benefit of consulting the literature before making clinical decisions, but they face the daunting task of finding needles in haystacks. In this situation, it would be of great use to doctors if an effective clinical decision support system were available to generate accurate queries and return a manageable number of highly useful articles. Existing studies have shown the usefulness of patients’ diagnosis information in supporting effective retrieval of relevant literature, but such diagnosis information is often missing. Furthermore, existing diagnosis prediction systems mainly focus on predicting a small range of diseases from well-formatted features, and it remains a great challenge to perform large-scale automatic diagnosis prediction based on a patient's noisy medical records. In this paper, we propose automatic diagnosis prediction methods for enhancing retrieval in a clinical decision support system, where the prediction is based on evidence automatically collected from publicly accessible online knowledge bases such as Wikipedia and the Semantic MEDLINE Database (SemMedDB). The assumption is that relevant diseases and their corresponding symptoms co-occur more frequently in these knowledge bases. Our methods use a Markov random field (MRF) model to identify diagnosis candidates in the knowledge bases, and their performance was evaluated using test collections from the Clinical Decision Support (CDS) track in TREC 2014, 2015, and 2016. The results show that our methods can automatically predict diagnoses with about 75% accuracy, and that such predictions can significantly improve the retrieval of related biomedical literature. Our methods generate retrieval results comparable to those of state-of-the-art approaches, which rely on much more complicated techniques and some manually crafted medical knowledge.
One possible direction for future work is to apply these methods in collaboration with practicing doctors.
Notes: a portion of this work was published in iConference 2017 as a poster, which won the best poster award. This paper greatly expands the research scope over that poster.
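The co-occurrence assumption stated in the abstract above (relevant diseases and their symptoms appear together more often in a knowledge base) can be sketched in a few lines of Python. This is a deliberately simplified counting heuristic, not the Markov random field model the authors use; the function name `rank_diagnoses`, the substring matching, and the toy documents are assumptions for illustration.

```python
from collections import Counter


def rank_diagnoses(symptoms, documents, diseases):
    """Rank candidate diseases by how often they co-occur with the symptoms.

    A document adds to a disease's score the number of distinct query
    symptoms it mentions, provided it also mentions that disease.
    Matching is naive case-insensitive substring search.
    """
    wanted = {s.lower() for s in symptoms}
    scores = Counter()
    for text in documents:
        lowered = text.lower()
        hits = sum(1 for s in wanted if s in lowered)
        if hits == 0:
            continue  # document mentions none of the symptoms
        for disease in diseases:
            if disease.lower() in lowered:
                scores[disease] += hits
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy knowledge base (made-up sentences, not drawn from Wikipedia or SemMedDB).
docs = [
    "Influenza commonly presents with fever and cough.",
    "Migraine is associated with headache and nausea.",
    "Fever may accompany influenza.",
    "Migraine patients rarely report fever.",
]
print(rank_diagnoses(["fever", "cough"], docs, ["influenza", "migraine"]))
```

The top-ranked candidates could then be appended to a retrieval query as expansion terms, which is the role predicted diagnoses play in the abstract.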
Book search is far from a solved problem. Complex information needs often go beyond bibliographic facts and cover a combination of different aspects, such as specific genres or plot elements, engagement, or novelty. Conventional book metadata may not be sufficient to address these kinds of information needs. In this paper, we present a large-scale empirical comparison of the effectiveness of book metadata elements for addressing complex information needs. Using a test collection of over 2 million book records and over 330 real-world book search requests, we perform a highly controlled and in-depth analysis of topical metadata, comparing controlled vocabularies with social tags. Tags perform better overall in this setting, but controlled vocabulary terms provide complementary information that can improve search. We analyze potential underlying factors that contribute to search performance, such as the relevance aspect(s) mentioned in a request or the type of book. In addition, we investigate the possible causes of search failure. We conclude that neither tags nor controlled vocabularies are wholly suited to handling the complex information needs in book search, which means that different approaches to describing topical information in books are needed.