Adaptive Watermarks: A Concept Drift-based Approach for Predicting Event-Time Progress in Data Streams

dc.contributor.author: Awad, Ahmed
dc.contributor.author: Traub, Jonas
dc.contributor.author: Sakr, Sherif
dc.date.accessioned: 2025-05-06T08:21:33Z
dc.date.available: 2025-05-06T08:21:33Z
dc.date.issued: 2019
dc.description.abstract: Event-time stream processing analyzes data with respect to the time at which it was generated. In most cases, data is delayed on its way from the source(s) to the stream processing engine, a phenomenon known as late data arrival. Among the approaches to out-of-order stream processing, low watermarks inject special records, called watermarks, into data streams. A watermark is a timestamp indicating that no element with a timestamp older than the watermark should be observed afterwards; any such element is considered a late arrival. Watermark generation is usually periodic and heuristic-based. The limitation of such a strategy is its rigidity with respect to both the frequency of data arrival and the delay that data may encounter. In this paper, we propose an adaptive watermark generation strategy that decides adaptively when to generate watermarks and with which timestamp, without a priori tuning. We treat changes in data arrival frequency and changes in delays as concept drifts in stream data mining. We use an Adaptive Window (ADWIN) as our concept drift sensor to detect changes in the distributions of arrival rate and delay. We have implemented our approach on top of Apache Flink and compared it with periodic watermark generation using two real-life data sets. Our results show that adaptive watermarks achieve a lower average latency by triggering windows earlier, and a lower rate of dropped elements by delaying watermarks when out-of-order data is expected.
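To make the idea of a drift-triggered watermark concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' Apache Flink implementation. The abstract says ADWIN monitors arrival rate and delay; this sketch simplifies that to a single 0/1 signal per element (1 if the element arrives out of order). `SimpleAdwin` applies the ADWIN cut test, `eps_cut = sqrt(ln(4/delta') / (2m))` with `m = 1 / (1/|W0| + 1/|W1|)` and `delta' = delta / n`, but checks every split naively instead of using exponential histograms. `AdaptiveWatermarker`, its emission policy (emit only when a change is detected, with timestamp = largest event time seen minus the largest delay still inside the ADWIN window), and the demo stream are all hypothetical.

```python
import math
from collections import deque
from typing import Optional


class SimpleAdwin:
    """Naive ADWIN-style change detector for a stream of values in [0, 1].

    Every split of the window into an older part W0 and a newer part W1 is
    tested; the real ADWIN uses exponential histograms to avoid this O(n) scan.
    """

    def __init__(self, delta: float = 0.002):
        self.delta = delta          # confidence parameter
        self.window = deque()       # values currently in the adaptive window

    def _cut_found(self) -> bool:
        n = len(self.window)
        if n < 2:
            return False
        delta_prime = self.delta / n
        total = sum(self.window)
        head_sum = 0.0
        for n0, x in enumerate(self.window, start=1):
            if n0 == n:
                break               # W1 must be non-empty
            head_sum += x
            n1 = n - n0
            mean0, mean1 = head_sum / n0, (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # m as defined in the ADWIN cut test
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) >= eps_cut:
                return True
        return False

    def add(self, x: float) -> bool:
        """Append x; return True if a distribution change was detected."""
        self.window.append(x)
        changed = False
        while self._cut_found():
            self.window.popleft()   # drop the oldest element, then re-test
            changed = True
        return changed


class AdaptiveWatermarker:
    """Hypothetical generator: emits a watermark when ADWIN detects a change
    in the out-of-order rate (a simplification, not the paper's exact policy)."""

    def __init__(self, delta: float = 0.002):
        self.detector = SimpleAdwin(delta)
        self.delays = deque()       # delay of each element in the ADWIN window
        self.max_ts: Optional[int] = None
        self.watermark: Optional[int] = None
        self.late = 0               # elements older than the current watermark

    def on_element(self, ts: int) -> Optional[int]:
        """Process one event timestamp; return a new watermark or None."""
        if self.watermark is not None and ts < self.watermark:
            self.late += 1
        if self.max_ts is None or ts >= self.max_ts:
            self.max_ts, delay = ts, 0
        else:
            delay = self.max_ts - ts
        self.delays.append(delay)
        changed = self.detector.add(1.0 if delay > 0 else 0.0)
        while len(self.delays) > len(self.detector.window):
            self.delays.popleft()   # keep delays aligned with the ADWIN window
        if changed:
            # Leave room for the largest delay still inside the window.
            candidate = self.max_ts - max(self.delays)
            if self.watermark is None or candidate > self.watermark:
                self.watermark = candidate
                return candidate
        return None


if __name__ == "__main__":
    # Hypothetical stream: 200 in-order events, then every odd-positioned
    # event carries a timestamp 5 units behind its position.
    stream = list(range(200)) + [i - 5 if i % 2 else i for i in range(200, 400)]
    gen = AdaptiveWatermarker()
    for pos, ts in enumerate(stream):
        wm = gen.on_element(ts)
        if wm is not None:
            print(f"position {pos}: watermark {wm}")
    print("late elements:", gen.late)
```

Because this sketch reacts only to detected changes, a stream whose out-of-order rate never changes would receive no watermarks at all; a practical generator would pair the detector with a fallback such as periodic emission. Bounding the watermark by the largest delay inside the adaptive window is one way to delay watermarks when out-of-order data is expected, in the spirit of the abstract.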
dc.identifier.citation: Awad A. et al. (2019) “Adaptive watermarks: A concept drift-based approach for predicting event-time progress in data streams,” Advances in Database Technology - EDBT, 2019-March, pp. 622–625.
dc.identifier.doi: https://doi.org/10.5441/002/edbt.2019.71
dc.identifier.issn: 2367-2005
dc.identifier.uri: https://bspace.buid.ac.ae/handle/1234/2924
dc.language.iso: en
dc.publisher: Open proceedings org
dc.relation.ispartofseries: Advances in Database Technology - EDBT v2019-March (2019): 622-625
dc.title: Adaptive Watermarks: A Concept Drift-based Approach for Predicting Event-Time Progress in Data Streams
dc.type: Article