Efficient, Adaptable, and Scalable Synopses for Data-Intensive Systems
Licentiate thesis in Computer Science and Engineering at Chalmers University of Technology. This page presents the research scope, included publications, and other information. The complete thesis is available to download here.
Research Overview
Data-intensive systems generate data at rates and volumes that demand timely single-pass analysis with bounded memory. Computing exact statistics in a single pass often requires memory that grows with the data; this is manageable for small windows but becomes infeasible as the scope or number of computations grows. Data summarization via synopses addresses this by capturing statistics in sublinear space with bounded error, trading a controlled loss in accuracy for lower memory use and higher throughput.
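For intuition, consider frequency estimation, one of the two primitives studied in this thesis. A hash table of exact counters grows with the number of distinct keys, whereas a Count-Min sketch (a classic synopsis, used here purely as illustration) answers the same point queries in fixed space with a bounded, one-sided overestimate. The minimal Python sketch below is an illustration under assumptions of this page, not code from the thesis; the parameter choices and `blake2b`-based hashing are this example's own.

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min sketch: a fixed-size 2D counter array.

    Estimates never undercount, and overcount by at most eps * N
    (stream length N) with probability 1 - delta, using width
    w = ceil(e / eps) and depth d = ceil(ln(1 / delta)).
    """

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent-ish hash per row via a row-salted digest.
        h = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def update(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item: str) -> int:
        # Minimum over rows: the least collision-inflated counter.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

# Usage: memory stays constant no matter how many distinct keys arrive.
cms = CountMinSketch(width=272, depth=5)  # roughly eps=0.01, delta=0.01
for key in ["a", "b", "a", "c", "a"]:
    cms.update(key)
print(cms.query("a"))  # >= 3; typically exactly 3 at this load
```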
Although synopses are well established algorithmically, their practical use depends on hardware and deployment details. Most synopsis algorithms are sequential, yet production data rates often exceed a single core's capacity, and throughput and latency depend on cache behavior and access patterns, not just asymptotic complexity. Furthermore, synopses are typically initialized with fixed settings and approximation bounds, so they cannot adapt to changing budgets and workloads. Multi-core and distributed deployments can absorb higher rates, but they introduce contention and cache-coherence costs, require state migration when scaling, and call for merging differently-sized summaries.
Targeting these challenges, this thesis studies two core summarization primitives, heavy-hitter detection and frequency estimation, along three complementary dimensions:
- Efficient — maximizing throughput, accuracy, and functionality per unit of resource through hardware-aware, cache-friendly designs and reduced per-item instruction cost.
- Adaptable — adjusting memory at runtime, adapting to changing budgets and workloads without restarting, and tracking how approximation bounds evolve through operations.
- Scalable — leveraging multi-core concurrency for single-node performance, and enabling elastic scaling in distributed settings through state partitioning and heterogeneous merging.
[CuckooHeavyKeeper] Efficient + Scalable — Accurate, Space-Efficient, and Parallel Heavy-Hitter Detection
This paper analyzes the trade-offs among throughput, memory usage, and accuracy in heavy-hitter detection algorithms. The insights led to two main contributions:
1. Cuckoo Heavy Keeper (CHK) — A heavy-hitter detection algorithm whose mechanism for distinguishing frequent from infrequent items unlocks optimizations inaccessible to conventional approaches, such as reduced per-item instruction cost and improved cache behavior, and thereby delivers orders-of-magnitude gains in throughput and accuracy over state-of-the-art methods (a toy illustration of probabilistic-decay counting in this spirit follows the list).
2. multi-CHK (mCHK) Framework — A categorization of parallelization approaches and a flexible framework that can parallelize any sequential heavy-hitter algorithm, with support for concurrent updates and queries. The framework offers two optimization variants (insertion-optimized mCHK-I and query-optimized mCHK-Q), achieves near-linear scaling with thread count, and processes billions of updates per second with query latency under 150 μs at a modest 2.1 GHz clock rate (a simplified sharding sketch also follows the list).
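Neither CHK's cuckoo-hashing layout nor its exact update rule is reproduced here. As background intuition only, the toy Python sketch below shows the probabilistic-decay counting idea of the HeavyKeeper family, which the algorithm's name alludes to; the single-row layout, decay base `B = 1.08`, and `blake2b` fingerprinting are assumptions of this example, not details from the paper.

```python
import hashlib
import random

B = 1.08  # decay base; larger B makes entrenched items harder to evict

class HeavyKeeperRow:
    """Toy single-row HeavyKeeper-style structure (illustrative only).

    Each bucket stores (fingerprint, count). A colliding new item does
    not simply evict the incumbent: it decays the incumbent's count
    with probability B**(-count), so an item must "out-arrive" an
    incumbent many times before replacing it. Infrequent items thus
    rarely displace true heavy hitters.
    """

    def __init__(self, width: int):
        self.width = width
        self.buckets = [(None, 0)] * width  # (fingerprint, count)

    def _hash(self, item: str) -> tuple[int, int]:
        d = hashlib.blake2b(item.encode(), digest_size=8).digest()
        v = int.from_bytes(d, "big")
        return v % self.width, v >> 32  # (bucket index, fingerprint)

    def update(self, item: str) -> None:
        idx, fp = self._hash(item)
        cur_fp, count = self.buckets[idx]
        if cur_fp == fp:
            self.buckets[idx] = (fp, count + 1)
        elif cur_fp is None or count == 0:
            self.buckets[idx] = (fp, 1)      # claim an empty/exhausted bucket
        elif random.random() < B ** (-count):
            self.buckets[idx] = (cur_fp, count - 1)  # probabilistic decay

    def query(self, item: str) -> int:
        idx, fp = self._hash(item)
        cur_fp, count = self.buckets[idx]
        return count if cur_fp == fp else 0

# Usage: the frequent key keeps its counter despite colliding traffic.
row = HeavyKeeperRow(width=64)
for x in ["hot"] * 100 + ["x", "y", "z"] * 5:
    row.update(x)
print(row.query("hot"))  # close to 100, barring an unlucky collision
```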
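Similarly, mCHK's internals are not reproduced here. The sketch below only illustrates the generic idea of parallelizing a sequential heavy-hitter structure by key-based sharding: each worker thread exclusively owns one shard, so the update path needs no locks, and a query drains in-flight work and merges per-shard candidates. The thread-per-shard layout, the plain `Counter` stand-in for the sequential structure, and all names are assumptions of this illustration, not the paper's design.

```python
import queue
import threading
from collections import Counter

class ShardedHeavyHitters:
    """Illustrative key-sharded parallel wrapper around a sequential
    heavy-hitter structure (a plain Counter as a stand-in here).

    Keys are partitioned across shards, so each key lives in exactly
    one shard and per-shard top-k candidates contain the global top-k.
    This toy version assumes no updates arrive during a query.
    """

    def __init__(self, num_shards: int, make_struct=Counter):
        self.queues = [queue.Queue() for _ in range(num_shards)]
        self.shards = [make_struct() for _ in range(num_shards)]
        for i in range(num_shards):
            threading.Thread(target=self._run, args=(i,), daemon=True).start()

    def _run(self, i: int) -> None:
        q, shard = self.queues[i], self.shards[i]
        while True:
            item = q.get()
            if item is None:       # shutdown signal
                q.task_done()
                return
            shard[item] += 1       # exclusive owner: no locking needed
            q.task_done()

    def update(self, item: str) -> None:
        # The same key always lands in the same shard.
        self.queues[hash(item) % len(self.queues)].put(item)

    def top_k(self, k: int) -> list[tuple[str, int]]:
        for q in self.queues:
            q.join()               # wait for in-flight updates to drain
        merged = Counter()
        for shard in self.shards:
            merged.update(dict(shard.most_common(k)))
        return merged.most_common(k)

# Usage:
hh = ShardedHeavyHitters(num_shards=4)
for x in ["a"] * 50 + ["b"] * 30 + list("cdefg"):
    hh.update(x)
print(hh.top_k(2))  # [('a', 50), ('b', 30)]
```

Key sharding keeps updates lock-free but makes queries touch every shard; weighing update cost against query cost in this way is the kind of trade-off reflected in the paper's insertion-optimized and query-optimized variants.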
[ReSketch] Efficient + Adaptable + Scalable — Resizable, Mergeable, and Partitionable Frequency Estimation
This paper identifies three properties that target the above challenges: resizability (adjusting memory at runtime), enhanced mergeability (combining differently-sized summaries), and partitionability (splitting state for elastic scaling and load rebalancing). Building on these, it introduces two main contributions:
1. ReSketch — A frequency estimation sketch design that achieves all three properties while maintaining a beneficial memory-to-accuracy ratio:
- Resizability — dynamically expand or shrink the sketch size at fine granularity while maintaining accuracy and throughput comparable to a sketch initialized at the target size.
- Enhanced Mergeability — combine sketches of different sizes and control the output size to preserve as much precision as the input sketches collectively offer.
- Partitionability — partition the sketch state itself (not just the input) to enable both parallel processing on multi-core systems and dynamic redistribution of monitoring responsibilities in distributed settings, while preserving historical information.
2. Instance Provenance DAG — A formal framework that tracks how approximation bounds evolve through arbitrary sequences of resize, merge, and partition operations; an illustrative fold-based example of such operations and their evolving bounds follows.
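ReSketch's actual construction and the Instance Provenance DAG are not reproduced here. As a flavor of what resizing and heterogeneous merging can look like, the toy Python snippet below folds a power-of-two-width Count-Min row in half (adding its two halves index-wise), which halves memory while keeping estimates one-sided and roughly doubling the error bound, and merges rows of different widths by folding the wider one down first. The `fold`/`merge` names and the `eps ~ e/width` bookkeeping are assumptions of this example.

```python
import math

def fold(row: list[int]) -> list[int]:
    """Halve a power-of-two-width counter row by adding its two halves.

    If an item indexed at h(x) % (2 * w) before, it indexes at h(x) % w
    after, which is exactly the folded cell that absorbed its old
    counter; estimates remain one-sided overestimates, while the error
    bound eps ~ e/width roughly doubles.
    """
    w = len(row) // 2
    return [row[i] + row[i + w] for i in range(w)]

def merge(row_a: list[int], row_b: list[int]) -> list[int]:
    """Merge two rows of (possibly different) power-of-two widths by
    folding the wider one down, then adding counters index-wise."""
    while len(row_a) > len(row_b):
        row_a = fold(row_a)
    while len(row_b) > len(row_a):
        row_b = fold(row_b)
    return [a + b for a, b in zip(row_a, row_b)]

def eps(width: int) -> float:
    # Toy bound bookkeeping: each fold doubles eps, and a merge
    # inherits the (worse) eps of its narrower input.
    return math.e / width

wide, narrow = [0] * 16, [0] * 8
merged = merge(wide, narrow)
print(len(merged), eps(len(merged)))  # 8 cells, eps of the narrower row
```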
Included Publications
Cuckoo Heavy Keeper and the Balancing Act of Maintaining Heavy Hitters in Stream Processing
Ngo, V.Q., Papatriantafilou, M.
- Reduced per-item instruction cost and improved cache behavior
- Billion-scale updates per second on commodity multi-core hardware
- Near-linear scaling with thread count
- Sub-150 μs query latency
ReSketch: A Mergeable, Partitionable, and Resizable Sketch
Ngo, V.Q., Hilgendorf, M., Papatriantafilou, M.
- First sketch supporting all three: resize, merge, and partition
- Fine-granularity dynamic resizing with accuracy comparable to static initialization
- Merge sketches of different sizes with controllable output precision
- Partition sketch state itself to support live load rebalancing
Other Publications
CLUE — Clustering-Based Load Understanding and Exploration
Magnusson, L., Thorsson, R., Ngo, V., Papatriantafilou, M., van Rooij, J., Chigrichenko, M.
STERR-GAN: Spatio-temporal Re-rendering for Facial Video Restoration
Ngo-Huu, M.K., Ngo, V., Luu, D.T., Pham, B.N., Nguyen, V.T.