Search Results

Rediscovering Don Swanson: The Past, Present and Future of Literature-based Discovery

1 Introduction

Don R. Swanson (1924–2012) was well appreciated during his lifetime as Dean of the Graduate Library School at the University of Chicago, as winner of the American Society for Information Science Award of Merit for 2000, and as author of many seminal articles (Figure 1). Don became Emeritus in 1996, but did not truly retire until around 2007, when he suffered a series of strokes. Around 10 years ago, Tanja Bekhuis (2006) wrote a review article that discussed Don’s contributions and their subsequent influence on bioinformatics and text mining

Open access
Topics of Controversy: An Empirical Analysis of Web Censorship Lists


Studies of Internet censorship rely on an experimental technique called probing. From a client within each country under investigation, the experimenter attempts to access network resources that are suspected to be censored, and records what happens. The set of resources to be probed is a crucial, but often neglected, element of the experimental design.

We analyze the content and longevity of 758,191 webpages drawn from 22 different probe lists, of which 15 are alleged to be actual blacklists of censored webpages in particular countries, three were compiled using a priori criteria for selecting pages with an elevated chance of being censored, and four are controls. We find that the lists have very little overlap in terms of specific pages. Mechanically assigning a topic to each page, however, reveals common themes, and suggests that hand-curated probe lists may be neglecting certain frequently censored topics. We also find that pages on controversial topics tend to have much shorter lifetimes than pages on uncontroversial topics. Hence, probe lists need to be continuously updated to be useful.

To carry out this analysis, we have developed automated infrastructure for collecting snapshots of webpages, weeding out irrelevant material (e.g. site “boilerplate” and parked domains), translating text, assigning topics, and detecting topic changes. The system scales to hundreds of thousands of pages collected.
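As an illustration of the list-overlap comparison described above, the following is a minimal sketch that measures how many specific pages two probe lists share, using the Jaccard index. The metric, the function names, and the toy URLs are assumptions chosen for illustration; the abstract does not state how the authors quantified overlap.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(lists):
    """Map each pair of list names to the Jaccard index of their URL sets."""
    sets = {name: set(urls) for name, urls in lists.items()}
    return {
        (x, y): jaccard(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


# Hypothetical toy data, not taken from the paper.
probe_lists = {
    "list_a": ["http://example.com/1", "http://example.com/2"],
    "list_b": ["http://example.com/2", "http://example.com/3"],
    "control": ["http://example.org/x"],
}

overlap = pairwise_overlap(probe_lists)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

A score near zero for most pairs would correspond to the "very little overlap" finding. A real comparison would first need to canonicalize URLs, which this sketch does not attempt.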

Open access
ErasuCrypto: A Light-weight Secure Data Deletion Scheme for Solid State Drives


Securely deleting invalid data from secondary storage is critical to protecting users’ data privacy against unauthorized access. However, secure deletion is very costly for solid state drives (SSDs), which, unlike hard disks, do not support in-place updates. When applied to SSDs, both erasure-based and cryptography-based secure deletion methods inevitably incur a large number of valid-data migrations and/or block erasures, which not only introduce extra latency and energy consumption, but also shorten SSD lifetime.

This paper proposes ErasuCrypto, a light-weight secure deletion framework with low block erasure and data migration overhead. ErasuCrypto integrates both erasure-based and encryption-based data deletion methods and flexibly selects the more cost-effective one to securely delete invalid data. We formulate a deletion cost minimization problem and give a greedy heuristic as the starting point. We further show that the problem can be reduced to a maximum-edge biclique finding problem, which can be effectively solved with existing heuristics. Experiments on real-world benchmarks show that ErasuCrypto can reduce the secure deletion cost of the erasure-based scheme by 71% and the cost of the cryptography-based scheme by 37%, while guaranteeing 100% security by deleting all the invalid data.

Open access
Power to peep-all: Inference Attacks by Malicious Batteries on Mobile Devices


Mobile devices are equipped with increasingly smart batteries designed to provide responsiveness and extended lifetime. However, such smart batteries may present a threat to users’ privacy. We demonstrate that the phone’s power trace sampled from the battery at 1 kHz holds enough information to recover a variety of sensitive information.

We show techniques to infer characters typed on a touchscreen; to accurately recover browsing history in an open-world setup; and to reliably detect incoming calls and photo shots, including their lighting conditions. Combined with a novel exfiltration technique that establishes a covert channel from the battery to a remote server via a web browser, these attacks turn the malicious battery into a stealthy surveillance device.

We deconstruct the attack by analyzing its robustness to sampling rate and execution conditions. To find mitigations, we identify the sources of the information leakage exploited by the attack. We discover that the GPU or DRAM power traces alone are sufficient to distinguish between different websites. However, the CPU and power-hungry peripherals such as a touchscreen are the primary sources of fine-grained information leakage. We consider and evaluate possible mitigation mechanisms, highlighting the challenges of defending against the attacks.

In summary, our work shows the feasibility of the malicious battery and motivates further research into system and application-level defenses to fully mitigate this emerging threat.
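One simple way to probe the sampling-rate robustness mentioned above is to reduce a 1 kHz power trace to lower effective rates and rerun the inference at each rate. The sketch below is only an illustration under stated assumptions: block averaging is one possible way to lower the rate, and the function name and the toy trace are hypothetical. The abstract does not describe how the authors varied the sampling rate.

```python
def downsample(trace, factor):
    """Average consecutive blocks of `factor` samples.

    A trace sampled at 1 kHz and downsampled with factor=10 approximates
    a 100 Hz trace. Trailing samples that do not fill a block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(trace) // factor
    return [
        sum(trace[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


# Hypothetical toy trace (arbitrary power units), not measured data.
trace_1khz = [1.0, 3.0, 5.0, 7.0, 9.0]
trace_500hz = downsample(trace_1khz, 2)
print(trace_500hz)
```

Averaging smooths out short power spikes, so an attack that depends on fine-grained features would be expected to degrade as the factor grows.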

Open access
#DontTweetThis: Scoring Private Information in Social Networks


Open access