ACM Transactions on

Database Systems (TODS)

Latest Articles

Dependencies for Graphs

This article proposes a class of dependencies for graphs, referred to as graph entity dependencies (GEDs). A GED is defined as a combination of a graph pattern and an attribute dependency. In a uniform format, GEDs can express graph functional dependencies with constant literals to catch inconsistencies, and keys carrying id literals to identify... (more)

Output-Optimal Massively Parallel Algorithms for Similarity Joins

Parallel join algorithms have received much attention in recent years due to the rapid development of massively parallel systems such as MapReduce and... (more)

Inferring Insertion Times and Optimizing Error Penalties in Time-decaying Bloom Filters

Current Bloom Filters tend to ignore Bayesian priors as well as a great deal of useful information they hold, compromising the accuracy of their... (more)

A Survey of Spatial Crowdsourcing

Widespread use of advanced mobile devices has led to the emergence of a new class of crowdsourcing called spatial crowdsourcing. Spatial crowdsourcing advances the potential of a crowd to perform tasks related to real-world scenarios involving physical locations, which were not feasible with conventional crowdsourcing methods. The main feature of... (more)


Updates to the TODS Editorial Board

As of January 1, 2018, five Associate Editors—Walid Aref, Graham Cormode, Gautam Das, Sabrina De Capitani di Vimercati, and Dirk Van Gucht—ended their terms, each having served on the editorial board for six years. Walid, Graham, Gautam, Sabrina, and Dirk have provided very substantial, high-caliber service to the journal and the database community. Also five new Associate Editors have joined the editorial board: Angela Bonifati, Université Claude Bernard Lyon 1, Wolfgang Lehner, TU Dresden, Dan Olteanu, University of Oxford, Evaggelia Pitoura, University of Ioannina, and Bernhard Seeger, University of Marburg. All five are highly regarded scholars in database systems.

Please read more here.

Updates to the TODS Editorial Board

As of January 1, 2017, three Associate Editors, Paolo Ciaccia, Divyakant Agrawal, and Sihem Amer-Yahia, ended their terms, each having served on the editorial board for some six years. In addition, they will stay on until they complete their current loads. We are fortunate that they have donated their time and world-class expertise during these years. Also three new Associate Editors have joined the editorial board: Feifei Li, University of Utah, Kian-Lee Tan, National University of Singapore, and Jeffrey Xu Yu, Chinese University of Hong Kong. All three are highly regarded scholars in database systems. 

Please read more here.

Forthcoming Articles
Verification of Hierarchical Artifact Systems

Data-driven workflows, of which IBM?s Business Artifacts are a prime exponent, have been successfully deployed in practice, adopted in industrial standards, and have spawned a rich body of research in academia, focused primarily on static analysis. The present work represents a significant advance on the problem of artifact verification, by considering a much richer and more realistic model than in previous work, incorporating core elements of IBM?s successful Guard-Stage-Milestone model. In particular, the model features task hierarchy, concurrency, and richer artifact data. It also allows database key and foreign key dependencies, as well as arithmetic constraints. The results show decidability of verification and establish its complexity, making use of novel techniques including a hierarchy of Vector Addition Systems and a variant of quantifier elimination tailored to our context.

Interactive Mapping Specification with Exemplar Tuples

While schema mapping specification is a cumbersome task for data curation specialists, it becomes unfeasible for non-expert users, who are unacquainted with the semantics and languages of the involved transformations. In this paper, we present an interactive framework for schema mapping specification suited for non-expert users. The underlying key intuition is to leverage a few exemplar tuples to infer the underlying mappings and iterate the inference process via simple user interactions under the form of boolean queries on the validity of the initial exemplar tuples. The approaches available so far are mainly assuming pairs of complete universal data examples, which can be solely provided by data curation experts, or are limited to poorly expressive mappings. We present a quasi-lattice based exploration of the space of all possible mappings that satisfy arbitrary user exemplar tuples. Along the exploration, we challenge the user to retain the mappings that fit the users requirements at best and to dynamically prune the exploration space, thus reducing the number of user interactions. We prove that after the refinement process, the obtained mappings are correct and complete. We present an extensive experimental analysis devoted to measure the feasibility of our interactive mapping strategies and the inherent quality of the obtained mappings.

From a Comprehensive Experimental Survey to a Cost-based Selection Strategy for Lightweight Integer Compression Algorithms

Lightweight data compression algorithms are frequently applied in in-memory database systems to tackle the growing gap between processor speed and main memory bandwidth. In recent years, the vectorization of basic techniques such as delta coding and null suppression has considerably enlarged the corpus of available algorithms. As a result, today there is a large number of algorithms to choose from, while different algorithms are tailored to different data characteristics. However, a comparative evaluation of these algorithms under different data and hardware characteristics has never been sufficiently conducted in the literature. To close this gap, we conducted an exhaustive experimental survey by evaluating several state-of-the-art compression algorithms as well as cascades of basic techniques. We systematically investigated the influence of data as well as hardware properties on the performance and the compression rates. The evaluated algorithms are based on publicly available implementations as well as our own vectorized reimplementations. We summarize our experimental findings leading to several new insights and to the conclusion, that there is no single-best algorithm. Generally, the decision not only depends on data but also on hardware properties.

A Unified Framework for Frequent Sequence Mining with Subsequence Constraints

Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this paper, we show that many subsequence constraints---including and beyond those considered in the literature---can be unified in a single framework. A unified treatment allows researchers to study jointly many types of subsequence constraints (instead of each one individually) and helps to improve usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive "pattern expressions" to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite state transducers, which we use as computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms---although more general---are efficient and, when used for sequence mining with prior constraints studied in literature, competitive to (and in some cases superior to) state-of-the-art specialized methods.

All ACM Journals | See Full Journal Index

Search TODS
enter search term and/or author name