Abstract:
A batch is an important pattern in data streams: a group of identical items that arrive close together in time. We find that certain batches that arrive periodically are of great value. In this paper, we formally define a new pattern, namely periodic batches. A group of periodic batches refers to several batches of the same item that arrive periodically. Studying periodic batches is important in many applications, such as caches, financial markets, online advertisements, and networks. This paper proposes a unified framework, namely the HyperCalm sketch, to detect batches and periodic batches in data streams. The HyperCalm sketch detects periodic batches in two phases. In phase 1, we propose a time-aware Bloom filter, called HyperBloomFilter (HyperBF), to detect batches. In phase 2, we propose an enhanced top-k algorithm, called Calm Space-Saving (CalmSS), to report top-k periodic batches. Extensive experiments show that HyperCalm outperforms the strawman solutions by 4× in terms of average relative error and by 98.1× in terms of speed. All related code is open-sourced.
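To make the two definitions concrete, the following is a minimal exact baseline in Python. It is not the HyperBF or CalmSS algorithm: it keeps exact per-item state in dictionaries purely to illustrate what a batch and a periodic batch are. The function names, the `gap` threshold that decides when two arrivals belong to the same batch, and the `tolerance` used to bucket intervals are all assumptions introduced for illustration and do not come from the paper.

```python
from collections import Counter


def find_batches(stream, gap):
    """Yield (item, start_time) for every batch start in the stream.

    stream: iterable of (timestamp, item) with non-decreasing timestamps.
    gap:    hypothetical threshold; an arrival starts a new batch when the
            item is new or its previous arrival is more than `gap` ago.
    """
    last_seen = {}  # item -> timestamp of its most recent arrival
    for ts, item in stream:
        prev = last_seen.get(item)
        if prev is None or ts - prev > gap:
            yield item, ts
        last_seen[item] = ts


def top_periodic_batches(stream, gap, k, tolerance=1):
    """Return the k most frequent (item, interval) pairs with their counts.

    The interval is the time between consecutive batch starts of the same
    item, rounded down to a multiple of `tolerance` (an assumed parameter).
    """
    last_start = {}     # item -> start time of its previous batch
    counts = Counter()  # (item, interval) -> number of occurrences
    for item, start in find_batches(stream, gap):
        prev = last_start.get(item)
        if prev is not None:
            interval = (start - prev) // tolerance * tolerance
            counts[(item, interval)] += 1
        last_start[item] = start
    return counts.most_common(k)
```

For example, with `gap=3`, an item whose batches start at times 0, 10, 20, and 30 contributes three occurrences of the pair (item, 10), which this baseline would rank as a frequent periodic batch.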
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 36, Issue: 11, November 2024)