Open access

Chuanming Yu, Xingyu Zhu, Bolin Feng, Lin Cai and Lu An

Abstract

Purpose

Online reviews of tourist attractions are an important reference for potential tourists choosing a destination. The main goal of this study is to conduct sentiment analysis that helps users make sense of a large volume of reviews, using comments about Chinese attractions from the Japanese tourism website 4Travel.

Design/methodology/approach

Different statistics-based and rule-based methods are used to analyze the sentiment of the reviews. Three groups of novel statistics-based methods, which combine feature selection functions with the traditional term frequency-inverse document frequency (TF-IDF) method, are proposed. We also construct seven groups of rule-based methods. The macro-average and micro-average values of the best classification results are calculated for each method, and the performance of the methods is reported.
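To make the two averaging schemes concrete, the following is a minimal sketch of macro- and micro-averaged F1 for a three-class sentiment task. The class labels (`pos`, `neg`, `neu`), the function names, and the toy predictions are illustrative assumptions and are not taken from the study.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} for each class label."""
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        counts[label] = (tp, fp, fn)
    return counts


def f1(tp, fp, fn):
    """F1 written directly in terms of counts; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores, so every class weighs the same."""
    counts = per_class_counts(y_true, y_pred, labels)
    return sum(f1(*c) for c in counts.values()) / len(labels)


def micro_f1(y_true, y_pred, labels):
    """Pool the counts over all classes first, then compute a single F1."""
    counts = per_class_counts(y_true, y_pred, labels)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical toy data, for illustration only.
LABELS = ["pos", "neg", "neu"]
Y_TRUE = ["pos", "pos", "neg", "neu"]
Y_PRED = ["pos", "neg", "neg", "neu"]
print(macro_f1(Y_TRUE, Y_PRED, LABELS), micro_f1(Y_TRUE, Y_PRED, LABELS))
```

Macro-averaging gives rare classes the same weight as frequent ones, while micro-averaging is dominated by the frequent classes, which is why the two are usually reported together.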

Findings

We compare the statistics-based and rule-based methods separately and then compare the overall performance of the two kinds of method. The results show that combining feature selection functions with weightings can strongly improve overall performance. The emotional vocabulary in the field of tourism (EVT), kaomojis, and negative and transitional words can notably improve performance in all three categories. The rule-based methods outperform the statistics-based ones by a narrow margin.

Research limitations

Two limitations should be noted: 1) the empirical studies verifying the validity of the proposed methods were conducted only on Japanese-language reviews; and 2) deep learning has not been incorporated into the methods.

Practical implications

The results help to elucidate the intrinsic characteristics of the Japanese language and their influence on sentiment analysis. These findings also provide practical usage guidelines for sentiment analysis of Japanese online tourism reviews.

Originality/value

Our research has practical value. Currently, no studies focus on the sentiment analysis of Japanese reviews about Chinese attractions.

Open access

Yaoyao Song, Torben Schubert, Huihui Liu and Guoliang Yang

Abstract

Purpose

This paper aims to investigate the scientific productivity of China’s science system.

Design/methodology/approach

This paper employs the Malmquist productivity index (MPI) based on Data Envelopment Analysis (DEA).
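For reference, a commonly used form of the MPI and its standard decomposition is sketched below. The abstract does not state which variant or orientation the paper uses, so the notation is an assumption: $D^{t}(x, y)$ denotes the DEA distance function measured against the period-$t$ frontier, and $(x^{t}, y^{t})$ are the inputs and outputs of a unit in period $t$.

```latex
% Malmquist productivity index between periods t and t+1
% (geometric mean of the indices measured against each period's frontier)
M\left(x^{t+1}, y^{t+1}, x^{t}, y^{t}\right)
  = \left[
      \frac{D^{t}\left(x^{t+1}, y^{t+1}\right)}{D^{t}\left(x^{t}, y^{t}\right)}
      \cdot
      \frac{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}{D^{t+1}\left(x^{t}, y^{t}\right)}
    \right]^{1/2}

% Decomposition into efficiency change (EC) and technological change (TC),
% with M = EC \cdot TC
EC = \frac{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}{D^{t}\left(x^{t}, y^{t}\right)},
\qquad
TC = \left[
      \frac{D^{t}\left(x^{t+1}, y^{t+1}\right)}{D^{t+1}\left(x^{t+1}, y^{t+1}\right)}
      \cdot
      \frac{D^{t}\left(x^{t}, y^{t}\right)}{D^{t+1}\left(x^{t}, y^{t}\right)}
    \right]^{1/2}
```

Under this decomposition, a shift of the frontier itself shows up in the TC term, while units catching up to the frontier show up in the EC term.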

Findings

The results reveal that the overall efficiency of Chinese universities increased significantly from 2009 to 2016, which is mainly driven by technological progress. From the perspective of the functions of higher education, research and transfer activities perform better than the teaching activities.

Research limitations

The indicator selection mechanism, the investigation period, and the MPI model can be further extended in future research.

Practical implications

The results indicate that Chinese education administrative departments should take action to guide and promote teaching activities and formulate reasonable resource allocation regulations to achieve balanced development in Chinese universities.

Originality/value

This paper selects 58 Chinese universities and quantitatively measures their productivity over the period 2009–2016. Three main functional activities of universities (i.e. teaching, research, and application) are categorized into different schemes in a novel way, and the performance of each is calculated separately.

Open access

Haiyun Xu, Chao Wang, Kun Dong and Zenghui Yue

Abstract

Purpose

Formal concept analysis (FCA) and concept lattice theory (CLT) are introduced for constructing a network of IDR topics and for evaluating their effectiveness for knowledge structure exploration.

Design/methodology/approach

We introduced the theory and applications of FCA and CLT, and then proposed a method for interdisciplinary knowledge discovery based on CLT. For the empirical analysis, we used interdisciplinary research (IDR) topics in Information & Library Science (LIS) and Medical Informatics, and in LIS and Geography-Physical. Subsequently, we carried out a comparative analysis with two other IDR topic recognition methods.

Findings

The CLT approach is suitable for IDR topic identification and predictions.

Research limitations

IDR topic recognition based on the CLT is not sensitive to the interdisciplinarity of topic terms, since the data can only reflect whether there is a relationship between the discipline and the topic terms. Moreover, the CLT cannot clearly represent a large number of concepts.

Practical implications

A deeper understanding of the IDR topics was obtained as the structural and hierarchical relationships between them were identified, which can support more precise identification and prediction of IDR topics.

Originality/value

IDR topic identification based on CLT performed well, and this theory has several advantages for identifying and predicting IDR topics. First, in a concept lattice, there is a partial order relation between interconnected nodes, and consequently, a complete concept lattice can present hierarchical properties. Second, clustering analysis of IDR topics based on concept lattices can yield clusters that highlight the essential knowledge features and help display the semantic relationship between different IDR topics. Furthermore, the Hasse diagram automatically displays all the IDR topics associated with the different disciplines, thus forming clusters of specific concepts and visually retaining and presenting the associations of IDR topics through multiple inheritance relationships between the concepts.
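As a rough illustration of the underlying formal concept analysis, the sketch below enumerates the formal concepts of a tiny binary context that maps topics to disciplines. The topic and discipline names are hypothetical, and the brute-force enumeration over attribute subsets is a teaching device that only suits very small contexts; it is not the authors' procedure.

```python
from itertools import combinations


def extent(context, attrs):
    """All objects (topics) that have every attribute (discipline) in attrs."""
    return frozenset(obj for obj, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """All attributes shared by every object in objs."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            objs = extent(context, frozenset(combo))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts


def is_subconcept(a, b):
    """Partial order of the lattice: a <= b when a's extent is within b's."""
    return a[0] <= b[0]


# Hypothetical context: topic -> set of disciplines in which it appears.
CONTEXT = {
    "text mining": frozenset({"LIS", "Medical Informatics"}),
    "GIS": frozenset({"LIS", "Geography-Physical"}),
    "bibliometrics": frozenset({"LIS"}),
}
CONCEPTS = formal_concepts(CONTEXT)
for objs, attrs in sorted(CONCEPTS, key=lambda c: len(c[0])):
    print(sorted(objs), sorted(attrs))
```

Each concept pairs a set of topics with exactly the disciplines they share, and ordering the concepts by extent inclusion gives the hierarchy that a Hasse diagram draws.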

Open access

Leo Egghe, Yves Fassin and Ronald Rousseau

Abstract

Purpose

To show for which publication-citation arrays h-type indices are equal and to reconsider rational h-type indices. Results for these research questions fill some gaps in existing basic knowledge about h-type indices.

Design/methodology/approach

The results and introduction of new indicators are based on well-known definitions.

Findings

The research purpose has been reached: answers to the first questions are obtained and new indicators are defined.

Research limitations

h-type indices do not meet the Bouyssou-Marchant independence requirement.

Practical implications

On the one hand, more insight has been obtained for well-known indices such as the h- and the g-index and on the other hand, simple extensions of existing indicators have been added to the bibliometric toolbox. Relative rational h-type indices are more useful for individuals than the existing absolute ones.

Originality/value

Answers to basic questions such as “when are the values of two h-type indices equal” are provided. A new rational h-index is introduced.
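For readers less familiar with the two indices named above, here is a minimal sketch of the standard h-index and g-index computed from a list of citation counts. It assumes the g-index is capped at the number of publications (no fictitious zero-citation papers are appended), and it does not implement the rational variants the paper discusses. The function names are illustrative.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g publications together have >= g*g citations.

    Assumption: g cannot exceed the number of publications.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


# Two small arrays: in the first the indices differ, in the second they coincide.
print(h_index([10, 8, 5, 4, 3]), g_index([10, 8, 5, 4, 3]))
print(h_index([3, 3, 3]), g_index([3, 3, 3]))
```

The second array, where every publication has exactly as many citations as there are publications, is one simple case in which the two indices take the same value.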

Open access

Dag W. Aksnes and Gunnar Sivertsen

Abstract

Purpose

The purpose of this study is to assess the coverage of the scientific literature in Scopus and Web of Science from the perspective of research evaluation.

Design/methodology/approach

The academic communities of Norway have agreed on certain criteria for what should be included as original research publications in research evaluation and funding contexts. These criteria have been applied since 2004 in a comprehensive bibliographic database called the Norwegian Science Index (NSI). The relative coverages of Scopus and Web of Science are compared with regard to publication type, field of research and language.

Findings

Our results show that Scopus covers 72 percent of the total Norwegian scientific and scholarly publication output in 2015 and 2016, while the corresponding figure for Web of Science Core Collection is 69 percent. The coverages are most comprehensive in medicine and health (89 and 87 percent) and in the natural sciences and technology (85 and 84 percent). The social sciences (48 percent in Scopus and 40 percent in Web of Science Core Collection) and particularly the humanities (27 and 23 percent) are much less covered in the two international data sources.

Research limitations

Comparing with data from only one country is a limitation of the study, but the criteria used to define a country’s scientific output as well as the identification of patterns of field-dependent partial representations in Scopus and Web of Science should be recognizable and useful also for other countries.

Originality/value

The novelty of this study is the criteria-based approach to studying coverage problems in the two data sources.

Open access

Liying Yang

Open access

Ruben Recabarren and Bogdan Carbunar

Abstract

Providing reliable and surreptitious communications is difficult in the presence of adaptive and resourceful state-level censors. In this paper we introduce Tithonus, a framework that builds on the Bitcoin blockchain and network to provide censorship-resistant communication mechanisms. In contrast to previous approaches, we do not rely solely on the slow and expensive blockchain consensus mechanism but instead fully exploit Bitcoin’s peer-to-peer gossip protocol. We develop adaptive, fast and cost-effective data communication solutions that camouflage client requests into inconspicuous Bitcoin transactions. We propose solutions to securely request and transfer content, with unobservability and censorship resistance, and free, pay-per-access and subscription-based payment options. When compared to state-of-the-art Bitcoin writing solutions, Tithonus reduces the cost of transferring data to censored clients by 2 orders of magnitude and increases the goodput by 3 to 5 orders of magnitude. We show that Tithonus client-initiated transactions are hard to detect, while server-initiated transactions cannot be censored without creating split-world problems in the Bitcoin blockchain.

Open access

Alexandros Mittos, Bradley Malin and Emiliano De Cristofaro

Abstract

Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy experts.

Open access

Thee Chanyaswad, Changchang Liu and Prateek Mittal

Abstract

A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.

Open access

Anselme Tueno, Florian Kerschbaum and Stefan Katzenbeisser

Abstract

Decision trees are widespread machine learning models used for data classification and have many applications in areas such as healthcare, remote diagnostics, and spam filtering. In this paper, we address the problem of privately evaluating a decision tree on private data. In this scenario, the server holds a private decision tree model and the client wants to classify its private attribute vector using the server’s private model. The goal is to obtain the classification while preserving the privacy of both the decision tree and the client input. After the computation, only the classification result is revealed to the client, while nothing is revealed to the server. Many existing protocols require a constant number of rounds. However, some of these protocols perform as many comparisons as there are decision nodes in the entire tree and others transform the whole plaintext decision tree into an oblivious program, resulting in higher communication costs. The main idea of our novel solution is to represent the tree as an array. Then we execute only d – the depth of the tree – comparisons. Each comparison is performed using a small garbled circuit, which outputs secret shares of the index of the next node. We get the inputs to the comparison by obliviously indexing the tree and the attribute vector. We implement oblivious array indexing using either garbled circuits, Oblivious Transfer or Oblivious RAM (ORAM). Using ORAM, this results in the first protocol with sub-linear cost in the size of the tree. We implemented and evaluated our solution using the different array indexing procedures mentioned above. As a result, we are not only able to provide the first protocol with sublinear cost for large trees, but also reduce the communication cost for the large real-world data set “Spambase” from 18 MB to 1.2 MB and the computation time from 17 seconds to less than 1 second in a LAN setting, compared to the best related work.
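The array representation can be illustrated in plaintext, without any cryptography. The sketch below stores the tree as a flat array and always performs exactly d comparisons, one per level. The node layout, the `<` comparison direction, and the trick of making leaves point to themselves so that shallow leaves still take d steps are all assumptions made for illustration; in the actual protocol each comparison and each array lookup would be done obliviously (garbled circuits, Oblivious Transfer or ORAM), which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    attr: int       # index into the client's attribute vector
    threshold: int  # decision threshold for this node
    left: int       # array index of the next node if x[attr] < threshold
    right: int      # array index of the next node otherwise
    label: int      # classification label (meaningful for leaves only)


def leaf(index, label):
    """Hypothetical helper: a leaf points to itself, so extra steps are no-ops."""
    return Node(attr=0, threshold=0, left=index, right=index, label=label)


def evaluate(tree, depth, x):
    """Walk the array for exactly `depth` comparisons and return the label."""
    index = 0
    for _ in range(depth):
        node = tree[index]
        go_left = x[node.attr] < node.threshold
        index = node.left if go_left else node.right
    return tree[index].label


# Hypothetical tree of depth 2 with one shallow leaf (index 1).
TREE = [
    Node(attr=0, threshold=5, left=1, right=2, label=-1),
    leaf(1, label=0),
    Node(attr=1, threshold=3, left=3, right=4, label=-1),
    leaf(3, label=1),
    leaf(4, label=2),
]
print(evaluate(TREE, 2, [2, 9]), evaluate(TREE, 2, [7, 1]), evaluate(TREE, 2, [7, 4]))
```

Because the loop length depends only on the depth and not on the path taken, the number of comparisons reveals nothing about which branch was followed, which is the property the array layout is meant to support.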