Vinh Ngo - PhD Portfolio

#	Item	True	Est.	Error
Insert items to see comparison

How It Works

Use the interactive visualization above to explore the ReSketch algorithm hands-on. Insert individual items or use Batch Mode to generate data streams. Query items to see frequency estimates via the median-of-rows approach. Use the Expand and Shrink buttons to dynamically resize the sketch and watch how items are remapped across buckets using consistent hashing.

Algorithm Breakdown

ReSketch is a frequency estimation data structure designed for distributed and parallel environments. Unlike traditional sketches (e.g., Count-Min Sketch), ReSketch supports structural operations, i.e., expand, shrink, merge, and split, that allow the sketch to adapt its memory footprint at runtime without losing accuracy.

Consistent Hashing Ring

Each row of the sketch uses a consistent hashing ring, a sorted list of random hash points mapped to bucket IDs. When an item is inserted, its hash is placed on the ring and assigned to the nearest bucket. This design enables seamless resizing: adding or removing ring points only redistributes a fraction of items.

KLL Quantile Sketch

Each bucket contains a KLL sketch (a compactor-based quantile summary). Instead of storing raw counts, the KLL tracks the distribution of hash values that land in the bucket. To estimate an item's frequency, we query the KLL for how many times that specific hash value appears.

Multi-Row Median Estimation

The sketch maintains multiple independent rows (depth), each with its own hash function and ring. The final frequency estimate is the median across all rows, which provides robustness against hash collisions, which is similar to how Count-Min Sketch uses the minimum across rows.

Expand & Shrink

Expand adds new hash points to the ring, creating new buckets. Items from existing buckets are redistributed to the new buckets using the KLL's range-query capability. Shrink removes hash points and merges the affected items back into neighboring buckets. Both operations preserve the statistical properties of the KLL sketches.

Structural Operations

Expand

When expanding from width w to w':

New random hash points are added to each ring
For each new bucket boundary, the KLL sketch of the overlapping old bucket is split
Items in the split range are redistributed to the new bucket
The ring is re-sorted to maintain the consistent hashing invariant

Shrink

When shrinking from width w to w':

Hash points are randomly selected for removal from each ring
Items from removed buckets are merged into their neighboring buckets
KLL sketches are merged to combine the frequency information
Bucket IDs are reassigned sequentially

Implementation Notes

This visualization uses MurmurHash3 (32-bit) for all hashing. The consistent hashing ring operates in the 32-bit hash space. The KLL sketch uses a compaction factor of c = 2/3 and supports weighted item insertion for accurate redistribution during structural operations.

ReSketch

Technical Implementation

Core Logic