Personas are a common human-computer interaction technique for increasing stakeholders’ understanding of audiences, customers, or users. Applied in many domains, such as e-commerce, health, marketing, software development, and system design, personas have remained relatively unchanged for several decades. However, with the increasing availability of digital user data and data science algorithms, there are new opportunities to progressively shift personas from general representations of user segments to precise, interactive tools for decision-making. In this vision, the persona profile functions as an interface to a fully functional analytics system. In this research, we conceptually investigate how data-driven personas can be leveraged as analytics tools for understanding users. We present a conceptual framework consisting of (a) persona benefits, (b) analytics benefits, and (c) decision-making outcomes. We apply this framework to an analysis of digital marketing use cases to demonstrate how data-driven personas can be leveraged in practical situations. We then present a functional overview of an actual data-driven persona system that relies on the concept of data aggregation, in which the fundamental question defines the unit of analysis for decision-making. The system provides several functionalities for stakeholders within organizations to address this question.
With the rapid growth of the smartphone and tablet market, the mobile application (App) industry that serves these devices is also growing at a striking speed. Product life cycle (PLC) theory, which has a long history, has been applied to a great number of industries and products and is widely used in the management domain. In this study, we apply classical PLC theory to mobile Apps on Apple smartphone and tablet devices (the Apple App Store). Instead of trying to use often-unavailable sales or download volume data, we use open-access daily App download rankings as an indicator to characterize the normalized dynamic market popularity of an App. We also use this ranking information to generate an App life cycle model. Using this model, we compare paid and free Apps from 20 different categories. Our results show that Apps across various categories have different kinds of life cycles and exhibit various unique and unpredictable characteristics. Furthermore, as large-scale heterogeneous data (e.g., user App ratings, App hardware/software requirements, or App version updates) become available and are attached to each target App, an important contribution of this paper is an in-depth study of how such data correlate with and affect the App life cycle. Using different regression techniques (i.e., logistic, ordinary least squares, and partial least squares), we build different models to investigate these relationships. The results indicate that some explicit and latent independent variables are more important than others for characterizing the App life cycle. In addition, we find that life cycle analysis for different App categories requires different tailored regression models, confirming that intra-category App life cycles are more predictable and comparable than App life cycles across different categories.
Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, making it complex and challenging to teach. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. We therefore developed a problem-based learning course to teach NLP to both undergraduate and graduate students. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help them summarize two types of large collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem for learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them, along with NLP pipelines, in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching, and we hope that other researchers will use our data sets and synergistic solutions to approach the new and challenging tasks we addressed.
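Frequency-based extractive summarization is one of the simplest baselines a team could start from: score each sentence by the corpus frequency of its content words and keep the top-scoring sentences. The sketch below is a deliberately simplified stand-in, not any team's actual pipeline; the stop-word list and example text are invented.

```python
import re
from collections import Counter

STOP = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "that"}

def summarize(text, k=2):
    """Rank sentences by the summed frequency of their non-stop-words
    and return the top-k sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP)
    def score(s):
        return sum(freq[w] for w in re.findall(r"[a-z']+", s.lower()))
    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]

text = ("Solar power is growing fast. "
        "Solar panels convert sunlight into power. "
        "My cat sleeps all day.")
print(summarize(text))
# ['Solar power is growing fast.', 'Solar panels convert sunlight into power.']
```

The off-topic sentence is dropped because its words occur only once in the text; real pipelines would add stemming, sentence-position features, or abstractive models on top of such a baseline.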
The paper describes the problem of conversion of heights to the European Vertical Reference Frame 2007 for Poland (PL-EVRF2007-NH). The subject of the study is height data, and especially the detailed vertical reference network. The aim of the article is to present an alternative method of conversion to the one recommended by the Polish Head Office of Geodesy and Cartography. The proposed approach is characterised by a low implementation cost while maintaining the required accuracy.
The publication is illustrated by the case of Kętrzyn district (in the north-east part of Poland). The local reference network was converted from Kronstad’60 to PL-EVRF2007-NH in 2017.
In today’s data-centric economy, data flows are increasingly diverse and complex. This is best exemplified by mobile apps, which are given access to an increasing number of sensitive APIs. Mobile operating systems have attempted to balance the introduction of sensitive APIs with a growing collection of permission settings, which users can grant or deny. The challenge is that the number of settings has become unmanageable. Yet research also shows that existing settings continue to fall short when it comes to accurately capturing people’s privacy preferences. An example is the inability to control mobile app permissions based on the purpose for which an app is requesting access to sensitive data. In short, while users are already overwhelmed, accurately capturing their privacy preferences would require the introduction of an even greater number of settings. A promising approach to mitigating this trade-off lies in using machine learning to generate setting recommendations or bundle some settings. This article is the first of its kind to offer a quantitative assessment of how machine learning can help mitigate this trade-off, focusing on mobile app permissions. Results suggest that it is indeed possible to more accurately capture people’s privacy preferences while also reducing user burden.
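One simple way to generate such setting recommendations is collaborative filtering over other users' decisions: predict a missing allow/deny choice by majority vote among the most similar profiles. The sketch below is an illustrative assumption, not the article's actual model; profile keys of the form "app:permission" and the agreement-based similarity are invented.

```python
from collections import Counter

def similarity(a, b):
    """Fraction of settings both users have decided on which they agree."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[s] == b[s] for s in shared) / len(shared)

def recommend(target, others, setting, k=2):
    """Predict target's allow/deny choice for `setting` by majority vote
    among the k most similar users who already decided that setting."""
    voters = [u for u in others if setting in u]
    voters.sort(key=lambda u: similarity(target, u), reverse=True)
    votes = Counter(u[setting] for u in voters[:k])
    return votes.most_common(1)[0][0]

# Hypothetical profiles (invented names and settings).
bob    = {"maps:location": "allow", "game:location": "deny", "social:contacts": "deny"}
dave   = {"maps:location": "allow", "social:contacts": "deny"}
carol  = {"maps:location": "deny",  "game:location": "deny", "social:contacts": "allow"}
target = {"maps:location": "allow", "game:location": "deny"}

print(recommend(target, [bob, dave, carol], "social:contacts"))  # deny
```

Because the target agrees most with bob and dave, their "deny" choices outvote carol's "allow"; the same idea underlies recommending bundles of settings to like-minded users.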
Black-box accumulation (BBA) is a building block that enables a privacy-preserving implementation of point collection and redemption, a functionality required in a variety of user-centric applications including loyalty programs, incentive systems, and mobile payments. By definition, BBA+ schemes (Hartung et al., CCS ’17) offer strong privacy and security guarantees, such as unlinkability of transactions and correctness of the balance flows of all (even malicious) users. Unfortunately, the instantiation of BBA+ presented at CCS ’17 is only just fast enough for comfortable use on modern smartphones; it is too slow for wearables, let alone smart cards. Moreover, it lacks a crucial property: for the sake of efficiency, the user’s balance is revealed in the clear when points are deducted. This may allow observers to track users simply by watching the revealed balances, even though privacy is otherwise guaranteed. The authors intentionally forgo costly range proofs, which would remedy this problem.
We present an instantiation of BBA+, with some extensions, that follows a different technical approach and significantly improves efficiency. To this end, we eliminate pairing groups and instead rely on different zero-knowledge proofs, fast range proofs, and a slightly modified version of Baldimtsi-Lysyanskaya blind signatures (CCS ’13). Our prototype implementation with range proofs (for 16-bit balances) outperforms BBA+ without range proofs by a factor of 2.5. Moreover, we give estimates showing that smart-card implementations are within reach.
Encrypting data before sending it to the cloud protects it against attackers, but requires the cloud to compute on encrypted data. Trusted modules, such as SGX enclaves, promise to provide a secure environment in which data can be decrypted and then processed. However, vulnerabilities in the executed program, which becomes part of the trusted code base (TCB), give attackers ample opportunity to execute arbitrary code inside the enclave. This code can modify the dataflow of the program and leak secrets via SGX side-channels. Since any larger code base is rife with vulnerabilities, it is not a good idea to outsource entire programs to SGX enclaves. A secure alternative relying solely on cryptography would be fully homomorphic encryption. However, due to its high computational complexity, it is unlikely to be adopted in the near future. Researchers have made several proposals for transforming programs to perform encrypted computations on less powerful encryption schemes. Yet current approaches do not support programs making control-flow decisions based on encrypted data.
We introduce the concept of dataflow authentication (DFAuth) to enable such programs. DFAuth prevents an adversary from arbitrarily deviating from the dataflow of a program. Our technique hence offers protection against the side-channel attacks described above. We implemented DFAuth using a novel authenticated homomorphic encryption scheme, a Java bytecode-to-bytecode compiler producing fully executable programs, and an SGX enclave running a small and program-independent TCB. We applied DFAuth to an existing neural network that performs machine learning on sensitive medical data. The transformation yields a neural network with encrypted weights, which can be evaluated on encrypted inputs in 0.86 s.
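The "less powerful" encryption schemes mentioned above are typically partially homomorphic: they support one operation on ciphertexts. A toy Paillier instance illustrates additive homomorphism, where multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The parameters below are deliberately tiny and insecure, and this is a generic textbook scheme, not the authenticated scheme used by DFAuth.

```python
import math
import random

# Toy Paillier parameters (illustrative only; real use needs ~2048-bit primes).
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
mu = pow(lam, -1, n)           # modular inverse (Python 3.8+)

def encrypt(m):
    """Paillier encryption: c = g^m * r^n mod n^2, with random unit r."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """Paillier decryption: m = L(c^lam mod n^2) * mu mod n, L(x) = (x-1)/n."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
total = decrypt(encrypt(12) * encrypt(30) % n2)  # 42
```

Such a scheme can add encrypted values but cannot branch on them, which is exactly the control-flow limitation the abstract identifies.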
Apple Continuity protocols are the underlying network component of Apple Continuity services, which enable seamless nearby interactions such as activity and file transfer, device pairing, and network connection sharing. These protocols rely on Bluetooth Low Energy (BLE) to exchange information between devices: Apple Continuity messages are embedded in the payload of BLE advertisement packets that are periodically broadcast by devices. Recently, Martin et al. identified a number of privacy issues associated with Apple Continuity protocols; we show that this was just the tip of the iceberg and that these protocols leak a wide range of personal information.
In this work, we present a thorough reverse engineering of Apple Continuity protocols, which we use to uncover a collection of privacy leaks. We introduce new artifacts, including identifiers, counters, and battery levels, that can be used for passive tracking, and we describe a novel active tracking attack based on Handoff messages. Beyond tracking issues, we shed light on severe privacy flaws. First, in addition to the trivial exposure of device characteristics and status, we found that HomeKit accessories betray human activities in a smart home. Then, we demonstrate that the AirDrop and Nearby Action protocols can be leveraged by passive observers to recover e-mail addresses and phone numbers of users. Finally, we exploit passive observations of the advertising traffic to infer Siri voice commands of a user.
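A passive observer needs little more than the following to extract Continuity messages from captured advertisements: a BLE advertisement payload is a sequence of length-prefixed AD structures, Apple's messages travel in the manufacturer-specific structure (AD type 0xFF) under company ID 0x004C, and the Apple payload is itself a type-length-value sequence. The message-type names below follow those reported in the reverse-engineering literature and should be treated as assumptions; the example packet is synthetic.

```python
APPLE_COMPANY_ID = 0x004C  # appears little-endian (0x4C 0x00) on the wire

# Continuity type codes as reported in reverse-engineering work (assumed).
CONTINUITY_TYPES = {0x05: "AirDrop", 0x06: "HomeKit", 0x07: "Proximity Pairing",
                    0x0C: "Handoff", 0x0F: "Nearby Action", 0x10: "Nearby Info"}

def parse_ad_structures(payload):
    """Split a BLE advertisement payload into (ad_type, data) pairs."""
    i = 0
    while i < len(payload):
        length = payload[i]          # covers the type byte plus the data
        if length == 0:
            break
        yield payload[i + 1], payload[i + 2 : i + 1 + length]
        i += 1 + length

def parse_continuity(payload):
    """Extract Apple Continuity TLV messages from an advertisement payload."""
    messages = []
    for ad_type, data in parse_ad_structures(payload):
        if ad_type != 0xFF or len(data) < 2:
            continue  # not manufacturer-specific data
        if int.from_bytes(data[:2], "little") != APPLE_COMPANY_ID:
            continue  # manufacturer-specific data from another vendor
        tlv = data[2:]
        j = 0
        while j + 2 <= len(tlv):
            msg_type, msg_len = tlv[j], tlv[j + 1]
            value = tlv[j + 2 : j + 2 + msg_len]
            messages.append((CONTINUITY_TYPES.get(msg_type, hex(msg_type)), value))
            j += 2 + msg_len
    return messages

# Example: a synthetic advertisement carrying one Nearby Info message.
pkt = bytes([0x0A, 0xFF, 0x4C, 0x00,      # AD: len=10, type=0xFF, Apple ID
             0x10, 0x05, 1, 2, 3, 4, 5])  # TLV: type=0x10, len=5, value
print(parse_continuity(pkt))  # [('Nearby Info', b'\x01\x02\x03\x04\x05')]
```

Because these messages are broadcast unencrypted, anyone within radio range can run such a parser, which is what makes the artifacts described in the abstract usable for tracking.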