Adaptive Watermarks: A Concept Drift-based Approach for Predicting Event-Time Progress in Data Streams
Date
2019
Publisher
OpenProceedings.org
Abstract
Event-time based stream processing is concerned with analyzing
data with respect to its generation time. In most cases, data
gets delayed during its journey from the source(s) to the stream
processing engine. This is known as late data arrival. Among
the different approaches for out-of-order stream processing, low
watermarks inject special records, i.e., watermarks, into data
streams. A watermark is a timestamp which
indicates that no data with a timestamp older than the watermark
should be observed later on. Any such element is considered
a late arrival. Watermark generation is usually periodic and
heuristic-based. The limitation of such a watermark generation
strategy is its rigidity regarding the frequency of data arrival
as well as the delay that data may encounter. In this paper, we
propose an adaptive watermark generation strategy. Our strategy
decides adaptively when to generate watermarks and with
what timestamp without a priori adjustment. We treat changes
in data arrival frequency and changes in delays as concept drifts
in stream data mining. We use an Adaptive Window (ADWIN)
as our concept drift sensor for the change in the distribution of
arrival rate and delay. We have implemented our approach on top
of Apache Flink. We compare our approach with periodic watermark
generation using two real-life data sets. Our results show
that adaptive watermarks achieve a lower average latency by
triggering windows earlier and a lower rate of dropped elements
by delaying watermarks when out-of-order data is expected.
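
To make the watermark semantics in the abstract concrete, the following is a minimal Python sketch of the periodic, heuristic-based baseline that the paper compares against. It is not the paper's adaptive algorithm and does not use ADWIN or Apache Flink. The class name PeriodicWatermarker, the parameters max_delay and period, and the sample stream are all hypothetical and chosen only for illustration. The sketch assumes the watermark is the largest event time seen so far minus a fixed delay bound, emitted every fixed number of elements, and that an element is late when its timestamp is older than the current watermark.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class PeriodicWatermarker:
    """Illustrative baseline: watermark = max event time seen - max_delay.

    max_delay and period are hypothetical knobs that must be set a priori,
    which is the rigidity the abstract refers to.
    """
    max_delay: int                 # assumed bound on out-of-orderness
    period: int                    # emit a watermark every `period` accepted elements
    max_event_time: int = -1
    watermark: int = -1
    accepted: int = 0
    late: List[int] = field(default_factory=list)

    def on_element(self, event_time: int) -> bool:
        """Process one element; return True if it is a late arrival."""
        if event_time < self.watermark:
            # Timestamp is older than the watermark: late arrival.
            self.late.append(event_time)
            return True
        self.max_event_time = max(self.max_event_time, event_time)
        self.accepted += 1
        if self.accepted % self.period == 0:
            # A watermark never moves backwards.
            candidate = self.max_event_time - self.max_delay
            self.watermark = max(self.watermark, candidate)
        return False


def run(event_times: Iterable[int], max_delay: int, period: int) -> Tuple[int, List[int]]:
    """Feed a stream of event times; return (final watermark, late elements)."""
    wm = PeriodicWatermarker(max_delay=max_delay, period=period)
    for t in event_times:
        wm.on_element(t)
    return wm.watermark, wm.late


# Hypothetical out-of-order stream of event timestamps (arrival order).
stream = [1, 2, 5, 3, 9, 10, 4, 12]

# A tight delay bound advances the watermark quickly but drops element 4.
print(run(stream, max_delay=2, period=2))   # (8, [4])

# A loose delay bound drops nothing but keeps the watermark further behind.
print(run(stream, max_delay=6, period=2))   # (6, [])
```

The two runs show the trade-off the abstract describes: with a fixed configuration, either the watermark advances early and late elements are dropped, or it lags and results are delayed. An adaptive strategy would adjust these choices as arrival rate and delay change.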
Citation
Awad A. et al. (2019) “Adaptive watermarks: A concept drift-based approach for predicting event-time progress in data streams,” Advances in Database Technology - EDBT, 2019-March, pp. 622–625.