Out-of-order event arrival is common in data stream processing applications. Disorder among the tuples of a stream is caused by network latency, operator parallelization, the merging of asynchronous streams, and similar factors [1]. There are four main classes of disorder-handling techniques: Buffer-based, Punctuation-based, Speculation-based, and Approximation-based.
Buffer-based techniques use a buffer to sort tuples from the input stream before presenting them to the query operator. K-slack [2] and AQ-K-slack [1] are two examples of Buffer-based disorder handling.
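To make the buffering idea concrete, here is a minimal sketch of a fixed-K slack buffer. It assumes numeric timestamps and a constant K chosen by the user; the class name KSlackBuffer and its methods are hypothetical, and the sketch does not attempt the adaptive buffer sizing of AQ-K-slack [1].

```python
import heapq


class KSlackBuffer:
    """Hold each tuple until the largest timestamp seen so far
    is at least K ahead of it, then release in timestamp order."""

    def __init__(self, k):
        self.k = k
        self.heap = []      # min-heap of (timestamp, arrival sequence, value)
        self.max_ts = None  # largest timestamp observed so far
        self.seq = 0        # tie-breaker so values are never compared

    def push(self, ts, value):
        """Insert one tuple and return the tuples that can now be released."""
        heapq.heappush(self.heap, (ts, self.seq, value))
        self.seq += 1
        if self.max_ts is None or ts > self.max_ts:
            self.max_ts = ts
        released = []
        # A tuple delayed by more than K passes this check immediately,
        # so it is still emitted out of order; K bounds what can be repaired.
        while self.heap and self.heap[0][0] <= self.max_ts - self.k:
            t, _, v = heapq.heappop(self.heap)
            released.append((t, v))
        return released

    def flush(self):
        """Release everything that is still buffered (end of stream)."""
        released = []
        while self.heap:
            t, _, v = heapq.heappop(self.heap)
            released.append((t, v))
        return released


buf = KSlackBuffer(k=2)
ordered = []
for ts in [1, 3, 2, 5, 4, 6]:
    ordered.extend(buf.push(ts, f"e{ts}"))
ordered.extend(buf.flush())
print([t for t, _ in ordered])
```

A larger K repairs more disorder but delays every result by at least K time units, which is the latency cost that Buffer-based techniques pay.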
Punctuation-based techniques [3] depend on special tuples, called punctuations, that are sent along with the data stream. Punctuations explicitly inform a query operator when to return results for windows. Hence, unlike Buffer-based techniques, the query operator can consume out-of-order input directly.
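As a rough illustration, the sketch below shows a tumbling-window sum that accepts tuples in any order and emits a window only when a punctuation says the window is complete. It assumes a punctuation with timestamp p promises that no later tuple has a timestamp below p; the class PunctuatedWindowSum and its method names are hypothetical and are not taken from [3].

```python
from collections import defaultdict


class PunctuatedWindowSum:
    """Tumbling-window sum driven by punctuations instead of a sort buffer."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.sums = defaultdict(int)  # window start -> running sum

    def on_tuple(self, ts, value):
        """Fold a tuple into its window, regardless of arrival order."""
        start = (ts // self.window_size) * self.window_size
        self.sums[start] += value

    def on_punctuation(self, ts):
        """Assumed meaning: no future tuple has a timestamp below ts.
        Emit and discard every window that ends at or before ts."""
        closed = sorted(s for s in self.sums if s + self.window_size <= ts)
        return [(s, self.sums.pop(s)) for s in closed]


op = PunctuatedWindowSum(window_size=10)
op.on_tuple(12, 1)               # belongs to window [10, 20)
op.on_tuple(3, 2)                # arrives out of order, window [0, 10)
op.on_tuple(7, 3)
print(op.on_punctuation(10))     # window [0, 10) is now complete
```

The operator never sorts its input; correctness rests entirely on the source emitting punctuations that are actually true.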
Speculation-based techniques assume in-order arrival of tuples and produce the result of a window as soon as the window closes. When a late arrival e is detected, previously emitted results that are affected by e are invalidated, and revised results are produced that take e into account.
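The emit-then-revise behaviour can be sketched as follows. This is a simplified illustration under stated assumptions: tumbling windows, non-negative timestamps, a window treated as closed once a tuple at or past its end has been seen, and window state kept indefinitely so that revisions are always possible. The class SpeculativeWindowSum and the "result" and "revision" labels are hypothetical.

```python
from collections import defaultdict


class SpeculativeWindowSum:
    """Tumbling-window sum that emits eagerly and revises on late arrivals."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.sums = defaultdict(int)  # window start -> running sum
        self.emitted = set()          # windows whose result was already emitted
        self.max_ts = -1              # largest timestamp seen so far

    def on_tuple(self, ts, value):
        """Return a list of (kind, window_start, sum) outputs for this tuple."""
        w = self.window_size
        start = (ts // w) * w
        self.sums[start] += value
        out = []
        if start in self.emitted:
            # Late arrival: the earlier result is invalid, emit a revision.
            out.append(("revision", start, self.sums[start]))
        self.max_ts = max(self.max_ts, ts)
        # Emit every window that has closed and not been emitted yet.
        for s in sorted(self.sums):
            if s not in self.emitted and s + w <= self.max_ts:
                self.emitted.add(s)
                out.append(("result", s, self.sums[s]))
        return out


op = SpeculativeWindowSum(window_size=10)
op.on_tuple(1, 5)
op.on_tuple(4, 2)
print(op.on_tuple(11, 1))   # window [0, 10) closes and is emitted
print(op.on_tuple(6, 3))    # late tuple for [0, 10) triggers a revision
```

Results arrive with low latency, but downstream consumers must be able to handle revisions, and a real system would need a policy for when old window state can be dropped.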
Approximation-based techniques [4] summarize the raw data stream with a compact data structure (for example, a histogram or a q-digest) and produce approximate aggregate results from these summaries.
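The following sketch illustrates the summary idea with a plain equi-width histogram over timestamps. It is not the q-digest and not the time-decaying algorithm of [4]; it only shows why a summary is insensitive to arrival order. The class TimeHistogram is hypothetical, and the range query assumes values are spread uniformly inside each bucket.

```python
from collections import defaultdict


class TimeHistogram:
    """Equi-width histogram over timestamps; each bucket stores a sum of values."""

    def __init__(self, bucket_width):
        self.bucket_width = bucket_width
        self.sums = defaultdict(float)  # bucket index -> sum of values

    def add(self, ts, value):
        """Arrival order is irrelevant: a tuple only updates its own bucket."""
        self.sums[ts // self.bucket_width] += value

    def approx_sum(self, start, end):
        """Approximate the sum of values with start <= ts < end.
        Partially covered buckets contribute in proportion to the overlap
        (uniformity assumption), which is where the error comes from."""
        b = self.bucket_width
        total = 0.0
        for idx, s in self.sums.items():
            lo, hi = idx * b, (idx + 1) * b
            overlap = max(0, min(end, hi) - max(start, lo))
            total += s * overlap / b
        return total


h = TimeHistogram(bucket_width=10)
for ts, v in [(12, 4.0), (3, 2.0), (17, 6.0), (8, 2.0)]:  # out of order
    h.add(ts, v)
print(h.approx_sum(0, 20))   # range aligned with bucket boundaries
print(h.approx_sum(5, 15))   # partial buckets, so only an estimate
```

Queries aligned with bucket boundaries are exact, while the second query returns an estimate that differs from the true sum; narrower buckets reduce that error at the cost of more memory.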
References
[1] Yuanzhen Ji, Hongjin Zhou, Zbigniew Jerzak, Anisoara Nica, Gregor Hackenbroich, and Christof Fetzer. 2015. Quality-driven processing of sliding window aggregates over out-of-order data streams. In Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems (DEBS '15). ACM, New York, NY, USA, 68-79.
[2] Christopher Mutschler and Michael Philippsen. 2014. Adaptive Speculative Processing of Out-of-Order Event Streams. ACM Trans. Internet Techn. 14(1): 4:1-4:24.
[3] Utkarsh Srivastava and Jennifer Widom. 2004. Flexible time management in data stream systems. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '04). ACM, New York, NY, USA, 263-274.
[4] Graham Cormode, Flip Korn, and Srikanta Tirthapura. 2008. Time-decaying aggregates in out-of-order streams. In Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '08). ACM, New York, NY, USA, 89-98.