Thursday, October 1, 2015

An overview of Out-of-Order Event Processing

Out-of-order event arrival is common in data stream processing applications. Disorder among the tuples of a stream is caused by network latency, operator parallelization, merging of asynchronous streams, etc. [1]. There are four main categories of disorder handling techniques: Buffer-based, Punctuation-based, Speculation-based, and Approximation-based.

Buffer-based techniques use a buffer to sort tuples from the input stream before presenting them to the query operator, so the operator sees an ordered stream at the cost of added latency. K-slack [2] and AQ-K-slack [1] are two examples of Buffer-based disorder handling.
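
To make the buffering idea concrete, here is a minimal Python sketch of a K-slack style buffer. It is an illustration under my own assumptions, not the exact algorithm of [1] or [2]: the function name k_slack, the (timestamp, value) tuple layout, and the fixed slack k are all hypothetical choices. A tuple is held back until the largest timestamp seen so far is at least k time units ahead of it.

```python
import heapq


def k_slack(stream, k):
    """Yield (timestamp, value) tuples sorted by timestamp.

    A tuple is released once the largest timestamp seen so far is at
    least k time units ahead of it. Tuples delayed by more than k can
    still come out late; a fixed slack cannot repair those.
    """
    buffer = []            # min-heap ordered by timestamp
    max_ts = float("-inf")
    seq = 0                # tie-breaker so values are never compared
    for ts, value in stream:
        max_ts = max(max_ts, ts)
        heapq.heappush(buffer, (ts, seq, value))
        seq += 1
        # Release everything that is at least k behind the newest timestamp.
        while buffer and buffer[0][0] <= max_ts - k:
            out_ts, _, out_value = heapq.heappop(buffer)
            yield out_ts, out_value
    # End of stream: flush whatever is still buffered, in order.
    while buffer:
        out_ts, _, out_value = heapq.heappop(buffer)
        yield out_ts, out_value


events = [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e"), (7, "f")]
ordered = list(k_slack(events, k=2))
```

A larger k tolerates more disorder but delays every result by the same amount, which is the trade-off that adaptive variants try to tune.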

Punctuation-based techniques [3] depend on special tuples, called punctuations, that are sent along with the data stream. Punctuations explicitly inform a query operator when to return results for windows. Hence, unlike with Buffer-based techniques, the query operator can consume out-of-order input directly.
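
The following Python sketch shows one way an operator might use punctuations. It is a simplified illustration, not the mechanism described in [3]: the ("data", ts, value) and ("punct", t) encodings, the tumbling window, and the function name windowed_sum are my assumptions. A punctuation ("punct", t) is taken to promise that no later tuple has a timestamp below t.

```python
def windowed_sum(stream, size):
    """Sum values per tumbling window [start, start + size).

    Data tuples may arrive in any order. A window is emitted only when
    a punctuation guarantees that it can receive no further tuples.
    """
    open_windows = {}  # window start -> running sum
    for item in stream:
        if item[0] == "data":
            _, ts, value = item
            start = (ts // size) * size
            open_windows[start] = open_windows.get(start, 0) + value
        else:
            # ("punct", bound): no future tuple has timestamp < bound.
            _, bound = item
            for start in sorted(open_windows):
                if start + size <= bound:
                    yield start, open_windows.pop(start)


stream = [
    ("data", 1, 10), ("data", 6, 1), ("data", 3, 5), ("punct", 5),
    ("data", 7, 2), ("data", 12, 4), ("punct", 10), ("punct", 15),
]
results = list(windowed_sum(stream, size=5))
```

Note that the tuple with timestamp 3 arrives after the one with timestamp 6 and is still counted in the first window, because nothing is emitted before the punctuation for that window arrives.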

Speculation-based techniques assume in-order arrival of tuples and produce the results of a window immediately when the window is closed. When a late arrival e is detected, previously emitted results that are affected by e are invalidated, and new revisions of these results are produced by taking e into account.
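
A rough Python sketch of this emit-then-revise behavior is given below. The names (speculative_sum, the "result" and "revision" labels) and the tumbling-window sum are hypothetical choices for illustration and do not come from a particular system. A window is treated as closed as soon as a tuple from a later window arrives.

```python
def speculative_sum(stream, size):
    """Sum values per tumbling window, emitting results optimistically.

    A window's result is emitted as soon as a tuple from a later window
    arrives. If a late tuple then lands in an already closed window, a
    revision carrying the corrected sum is emitted.
    """
    sums = {}       # window start -> running sum, kept so it can be revised
    current = None  # start of the newest window seen so far
    for ts, value in stream:
        start = (ts // size) * size
        if current is None:
            current = start
        elif start > current:
            # Assume in-order arrival: the current window is complete.
            yield ("result", current, sums[current])
            current = start
        sums[start] = sums.get(start, 0) + value
        if start < current:
            # Late arrival: the earlier result for this window is stale.
            yield ("revision", start, sums[start])
    if current is not None:
        yield ("result", current, sums[current])


stream = [(1, 10), (3, 5), (6, 1), (2, 4), (12, 7)]
output = list(speculative_sum(stream, size=5))
```

This sketch keeps every window's state forever so that any result can be revised. A practical system would bound how far back revisions may reach.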

Approximation-based techniques [4] summarize the raw data stream with a special data structure (e.g., histograms or q-digests) and produce approximate aggregate results based on these summaries.
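
As a toy illustration of the summary idea, the Python sketch below keeps one running sum per fixed-width time bucket and answers range queries from those buckets alone. This is not the structure used in [4]. The class name BucketSummary, the bucket width, and the assumption that values are spread uniformly inside a bucket are all mine. Because each tuple only updates the bucket its timestamp falls into, arrival order does not matter.

```python
class BucketSummary:
    """Per-time-bucket sums; raw tuples are discarded after add()."""

    def __init__(self, bucket_width):
        self.bucket_width = bucket_width
        self.sums = {}  # bucket index -> sum of values in that bucket

    def add(self, ts, value):
        idx = ts // self.bucket_width
        self.sums[idx] = self.sums.get(idx, 0.0) + value

    def approx_sum(self, lo, hi):
        """Approximate the sum of values with lo <= timestamp < hi.

        A partially covered bucket contributes in proportion to the
        covered fraction, assuming values are uniform within a bucket.
        """
        total = 0.0
        for idx, bucket_sum in self.sums.items():
            b_lo = idx * self.bucket_width
            b_hi = b_lo + self.bucket_width
            overlap = max(0, min(hi, b_hi) - max(lo, b_lo))
            total += bucket_sum * overlap / self.bucket_width
        return total


summary = BucketSummary(bucket_width=10)
for ts, value in [(3, 4.0), (25, 2.0), (12, 6.0), (7, 1.0), (18, 3.0)]:
    summary.add(ts, value)  # timestamps arrive out of order

aligned = summary.approx_sum(0, 20)   # query aligned with bucket borders
partial = summary.approx_sum(5, 20)   # query cutting through a bucket
```

Queries aligned with bucket borders are exact. Queries that cut through a bucket are only estimates: here the true sum over [5, 20) is 10.0, while the summary reports a somewhat larger value.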



References

[1] Yuanzhen Ji, Hongjin Zhou, Zbigniew Jerzak, Anisoara Nica, Gregor Hackenbroich, and Christof Fetzer. 2015. Quality-driven processing of sliding window aggregates over out-of-order data streams. In Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems (DEBS '15). ACM, New York, NY, USA, 68-79.

[2] Christopher Mutschler and Michael Philippsen. 2014. Adaptive Speculative Processing of Out-of-Order Event Streams. ACM Trans. Internet Techn. 14, 1 (2014), 4:1-4:24.

[3] Utkarsh Srivastava and Jennifer Widom. 2004. Flexible time management in data stream systems. In Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '04). ACM, New York, NY, USA, 263-274.

[4] Graham Cormode, Flip Korn, and Srikanta Tirthapura. 2008. Time-decaying aggregates in out-of-order streams. In Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '08). ACM, New York, NY, USA, 89-98.










Recent Complex Event Processing Trends

This article is the first in a series I will be writing on recent advancements in Complex Event Processing (CEP).

A complex event is an event derived from a group of events using either aggregation or derivation functions [1]. It is an event that summarizes, represents, or denotes a set of other events. CEP can be defined as computing that performs operations on complex events, including reading, creating, transforming, abstracting, or discarding them [2].


Figure 1: An abstract view of CEP.

A high-level view of a complex event processing application is shown in Figure 1 [3]. Events of different formats are gathered from different event producers. The event producers can be of many types, including financial feeds, news feeds, weather sensors, application logs, video streams collected from surveillance cameras, etc. The CEP engine is the brain of the application: it performs multiple types of processing on event streams based on predefined rules. The processing ranges from simple operations such as counting, averaging, and median calculation to more complex ones such as pattern matching and event prediction (forecasting). Event consumers are parties who are interested in mining valuable information from the event streams. Examples of consumers include software agents, users of web/mobile applications, etc.

One way of categorizing the entire field of CEP is into three areas: CEP use cases, CEP system architectures, and CEP open research topics. Multiple new use cases have appeared, such as Streaming Machine Learning (Streaming ML), the Internet of Things (IoT), and applications with complex data types such as text, videos, and graphs. There are also a significant number of open research topics and challenges to be conquered in CEP, such as methods for querying streaming data, CEP benchmarking, and out-of-order event processing, to name a few.

References

[1] Opher Etzion. 2009. Complex Event. In Encyclopedia of Database Systems, Ling Liu and M. Tamer Özsu (Eds.). Springer US, 411–412.

[2] W. R. Schulte. 2015. CEP Technology: EPPs, DSCPs and other Product Categories. URL: http://www.complexevents.com/2015/07/03/cep-technology-epps-dscps-and-other-product-categories/. (2015).

[3] Gianpaolo Cugola and Alessandro Margara. 2012. Complex Event Processing with T-REX. J. Syst. Softw. 85, 8 (Aug. 2012), 1709–1728.