The persona is a common human-computer interaction technique for increasing stakeholders’ understanding of audiences, customers, or users. Applied in many domains, such as e-commerce, health, marketing, software development, and system design, personas have remained relatively unchanged for several decades. However, with the increasing popularity of digital user data and data science algorithms, there are new opportunities to progressively shift personas from general representations of user segments to precise interactive tools for decision-making. In this vision, the persona profile functions as an interface to a fully functional analytics system. With this research, we conceptually investigate how data-driven personas can be leveraged as analytics tools for understanding users. We present a conceptual framework consisting of (a) persona benefits, (b) analytics benefits, and (c) decision-making outcomes. We apply this framework to an analysis of digital marketing use cases to demonstrate how data-driven personas can be leveraged in practical situations. We then present a functional overview of an actual data-driven persona system that relies on the concept of data aggregation, in which the fundamental question defines the unit of analysis for decision-making. The system provides several functionalities for stakeholders within organizations to address this question.
With the rapid growth of the smartphone and tablet market, the mobile application (App) industry, which provides a variety of functions for these devices, is also growing at a striking speed. Product life cycle (PLC) theory, which has a long history, has been applied to a great number of industries and products and is widely used in the management domain. In this study, we apply classical PLC theory to mobile Apps on Apple smartphone and tablet devices (Apple App Store). Instead of trying to utilize often-unavailable sales or download volume data, we use open-access App daily download rankings as an indicator to characterize the normalized dynamic market popularity of an App. We also use this ranking information to generate an App life cycle model. By using this model, we compare paid and free Apps from 20 different categories. Our results show that Apps across various categories have different kinds of life cycles and exhibit various unique and unpredictable characteristics. Furthermore, as large-scale heterogeneous data (e.g., user App ratings, App hardware/software requirements, or App version updates) become available and are attached to each target App, an important contribution of this paper is that we perform in-depth studies to explore how such data correlate with and affect the App life cycle. Using different regression techniques (i.e., logistic, ordinary least squares, and partial least squares), we build different models to investigate these relationships. The results indicate that some explicit and latent independent variables are more important than others for the characterization of the App life cycle. In addition, we find that life cycle analysis for different App categories requires different tailored regression models, confirming that App life cycles within a category are more predictable and comparable than App life cycles across different categories.
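As a rough illustration of how daily chart rankings could serve as a normalized popularity indicator, the sketch below maps a rank to a score in [0, 1]. The linear mapping, the function name `normalized_popularity`, and the chart length of 200 are assumptions made for this example; the abstract does not state the normalization the study actually uses.

```python
from typing import List, Optional


def normalized_popularity(daily_ranks: List[Optional[int]],
                          list_size: int = 200) -> List[float]:
    """Map daily chart ranks to popularity scores in [0, 1].

    Rank 1 maps to 1.0 and rank `list_size` maps to 1 / list_size.
    A day with no rank (None) or a rank beyond the chart maps to 0.0.
    The linear form and the default list_size are illustrative assumptions.
    """
    scores = []
    for rank in daily_ranks:
        if rank is None or rank < 1 or rank > list_size:
            scores.append(0.0)  # App did not appear on the chart that day
        else:
            scores.append((list_size - rank + 1) / list_size)
    return scores


# Hypothetical rank history: climbs the chart, drops off, then reappears.
history = [150, 40, 1, None, 200]
print(normalized_popularity(history))
```

The resulting series can then be read as a life cycle curve over time, with unranked days treated as zero popularity.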
Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem for learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
Filtering out irrelevant documents and classifying the relevant ones into topical categories is a de facto task in many applications. However, supervised learning solutions require substantial human effort for document labeling. In this paper, we propose a novel seed-guided topic model for dataless short text classification and filtering, named SSCF. Without using any labeled documents, SSCF takes a few “seed words” for each category of interest, and conducts short text filtering and classification in a weakly supervised manner. To overcome the issues of data sparsity and imbalance, the short text collection is mapped to a collection of pseudo-documents, one for each word. SSCF infers two kinds of topics on pseudo-documents: category-topics and general-topics. Each category-topic is associated with one category of interest, covering the meaning of the latter. In SSCF, we devise a novel word relevance estimation process based on the seed words, for hidden topic inference. The dominating topic of a short text is identified through post inference and then used for filtering and classification. On two real-world datasets in two languages, experimental results show that our proposed SSCF consistently achieves better classification accuracy than state-of-the-art baselines. We also observe that SSCF can even outperform the supervised classifiers supervised latent Dirichlet allocation (sLDA) and support vector machine (SVM) on some testing tasks.
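The mapping from short texts to one pseudo-document per word can be sketched as follows. The abstract does not say how each pseudo-document is assembled, so this example assumes a common construction in which a word's pseudo-document collects the words that co-occur with it in the same short text. The function name `build_pseudo_documents` and the sample texts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_pseudo_documents(short_texts: List[List[str]]) -> Dict[str, List[str]]:
    """Build one pseudo-document per vocabulary word.

    Assumption for illustration: the pseudo-document of word w is the
    concatenation of all other tokens from every short text containing w.
    """
    pseudo: Dict[str, List[str]] = defaultdict(list)
    for tokens in short_texts:
        for i, word in enumerate(tokens):
            # Context = every token in this short text except position i.
            pseudo[word].extend(tokens[:i] + tokens[i + 1:])
    return dict(pseudo)


# Hypothetical tokenized short texts.
texts = [["flu", "vaccine", "clinic"], ["flu", "season"]]
docs = build_pseudo_documents(texts)
print(docs["flu"])  # context words gathered from both short texts
```

Because a frequent word gathers context from many short texts, its pseudo-document is far longer than any single short text, which is the property that eases the sparsity problem.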
A number of deep neural networks have been proposed to improve the performance of document ranking in information retrieval studies. However, training these models usually requires large amounts of labeled data, so data shortage has become a major hindrance to improving the performance of neural ranking models. Recently, several weakly supervised methods have been proposed to address this challenge, using heuristics or users’ interactions with Search Engine Result Pages (SERPs) to generate weak relevance labels. In this work, we adopt two kinds of weakly supervised relevance, BM25-based relevance and click model-based relevance, and investigate in depth how they differ in the training of neural ranking models. Experimental results show that BM25-based relevance helps models capture more exact matching signals, while click model-based relevance enhances the rankings of documents that may be preferred by users. We further propose a cascade ranking framework to combine the two kinds of weakly supervised relevance, which significantly improves the ranking performance of neural ranking models and outperforms the best result in the NTCIR-13 We Want Web (WWW) task. This work reveals the potential of constructing better document retrieval systems based on multiple kinds of weak relevance signals.
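To make the idea of BM25-based weak relevance concrete, the sketch below scores tokenized documents against a query with the Okapi BM25 formula. The resulting scores could then stand in for relevance labels when no human judgments are available. The parameter values `k1=1.2` and `b=0.75`, the non-negative IDF variant, and the function name `bm25_scores` are assumptions for this example and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25.

    Uses the IDF variant log(1 + (N - df + 0.5) / (df + 0.5)), which is
    never negative. Assumes `docs` is non-empty and not all documents are empty.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # absent terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


# Hypothetical toy collection; scores serve as weak relevance labels.
collection = [["neural", "ranking", "model"], ["click", "model"], ["weather", "report"]]
print(bm25_scores(["neural", "ranking"], collection))
```

Since BM25 only rewards query terms that literally appear in a document, labels derived this way emphasize exact matching, consistent with the finding reported above.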
The study aims to reveal the role of social media and its influence on information sharing within public organizations, with an emphasis on the distribution affordance that facilitates information processes. Existing literature has emphasized different aspects of social media in the public sector, for example, innovation, policies, openness, and communication, that promote the relationship between government and citizens or provide better public service. However, there is a wide gap in the literature regarding social media use and information sharing within public organizations. The current study addresses this gap by conducting semi-structured interviews with 15 employees in public organizations in Chaohu city, China, and applying content analysis to the interviews. Unlike the existing literature, this study divides the targeted group into three levels, (i) senior-level, (ii) middle-level, and (iii) junior-level employees, to get a better view of social media use. The study uses grounded theory for coding analysis. We provide an overview of social media use within Chinese public organizations and discuss five social media affordances involved in the public organizations. Finally, we provide the implications, limitations, recommendations, and directions for future research in this area.
The Danmu function, an augmented comment feature, has been adopted by almost all live streaming platforms in China to foster interaction between viewers and the streamer. However, few studies have been conducted to understand the determinants of users’ Danmu sending behavior on live streaming platforms. This study examines this phenomenon through the lens of effectance theory and the S-O-R framework. We propose that two effectances, Danmu effectance and live streaming effectance, play an essential role in active Danmu participation. In addition, we explore the effects of the time-enhanced (synchronicity) and space-enhanced (visibility) technical characteristics of Danmu on these two effectances. Data analysis of 877 participants from the Douyu platform in mainland China indicates that active Danmu participation is positively associated with Danmu effectance and live streaming effectance, which are in turn influenced by both the time-enhanced technical feature (synchronicity) and the space-enhanced technical feature (visibility). In addition, the study finds that demographic characteristics, namely education and income, also affect active Danmu participation.
This study aims to construct a microblog influence prediction model and to reveal how the user, time, and content features of microblog entries about public health emergencies affect the influence of those entries. Microblog entries about the Ebola outbreak are selected as data sets. The BM25 latent Dirichlet allocation model (LDA-BM25) is used to extract topics from the microblog entries. A microblog influence prediction model is proposed by using the random forest method. Results reveal that the proposed model can predict the influence of microblog entries about public health emergencies with a precision rate reaching 88.8%. The individual features that play a role in the influence of microblog entries, as well as their influence tendencies, are also analyzed. The proposed microblog influence prediction model consists of user, time, and content features. It addresses the shortcoming that content features are often ignored by other microblog influence prediction models. The roles of the three features in the influence of microblog entries are also discussed.